How to use sscanf correctly and safely - c

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers
for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If the erroneous input is supplied, the program must ask the user to input again until
the correct input is given, but the previous correct input must be kept (only incorrect
input must be cleared from the input stream).
Everything must conform to C99 standard.

The scanf family of function cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors ?

Well, let sscanf accept all inputs as %s (i.e. strings) and then program analyze them

If you must use scanf to accept the input, I think you start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf tells you how many format codes it parsed successfully. So,
int matches = scanf("%d", &array[i]);
if (matches == 0) {
/* no integer in the input stream */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input12a3. The first time through the loop, scanf would parse '12as an integer 12, leaving the remaininga3` for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than maxint, eg, "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.

The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.

scanf("%s", string) into long int_string = strtol(string, &end_pointer, base:10)

Related

why does this part of the program skip the second input?

On entering the first inputs this part of my program terminates.
float p_x,p_y,q_x,q_y,r_x,r_y;
printf("Note :- coordinates are entered as follows (x,y) \n");
printf("Enter the coordinates of 'p' :");
scanf("(%f,%f)",&p_x,&p_y);
printf("Enter the coordinates of 'q' :");
scanf("(%f,%f)",&q_x,&q_y);
Short answer: Change the second scanf call to
scanf(" (%f,%f)", &q_x, &q_y);
Note the extra space in the format statement.
Longer explanation: One of scanf's quirks is that it only reads what it needs, and leaves everything else sitting on the input stream for next time. So when you read p_x and p_y, scanf reads the (, the value for p_x, the ,, the value for p_y, and the ), but it doesn't do anything with the invisible final \n on the line, the "newline" character that's there as a result of the Enter key you typed.
So, later, when you try to read q_x and q_y, the first thing scanf wants to see is the ( you said should be there, but the first thing scanf actually sees is that \n character left over from last time. So the second scanf call fails, reading nothing.
In a scanf format statement, a space character means "read and ignore whitespace". So if you change the second format string to " (%f,%f)", scanf will read and ignore the stray newline, and then it will be able to read the input you asked for.
The reason you don't have this problem all the time is that most (though not all) of the scanf format characters automatically skip leading whitespace as a part of doing their work. If you had, say, a simpler call to
scanf("%f", &q_x);
followed by a later call to
scanf("%f", &q_y);
it would work just fine, without any extra explicit space characters in the format strings.
General advice: You may be thinking that this is a pretty lame situation. You may have gotten the impression that scanf was supposed to be a nice, simple way to do input in your C programs, and you may be thinking, "But this is not simple! How was I supposed to know to write " (%f,%f)"?" And I would absolutely agree with you: This is a pretty lame situation!
The fact is that scanf only seems to be a nice, simple way to do input. It's actually a terribly complicated, unforgiving, nearly useless mess. It's only simple if you're reading very simple input, such as single numbers or simple strings (not containing spaces), or maybe single characters. For anything more complicated than that, there are so many quirks and foibles and exceptions and special cases that it almost never works the first time, and it's often more trouble than it's worth to even try to get it working.
So the general advice is to only try to use scanf for very simple input, during the first few weeks of your C programming career. You can read what I mean by "very simple input" in this answer to a previous question. Then, once you've gotten a few skills under your belt, or when you need to do something a little more complicated, its time to learn how to do input using better techniques than scanf.
Check return value to identify problems.
// scanf("(%f,%f)",&q_x,&q_y);
if (scanf("(%f,%f)",&q_x,&q_y) != 2) {
puts("Bad input");
}
Tolerate and consume spaces, new lines, tabs, etc. between portion of input. OP did not do this. The '\n' of the first entry was not consumed and caused the 2nd scanf() to fail. "%f" already allows optional leading white-spaces. Add a " " before the fixed formats characters: (,) so optional white-spaces will get consumed there too.
// v---v---v add spaces to format to allow any white-space in input.
if (scanf(" (%f ,%f )",&q_x,&q_y) != 2) {
puts("Bad input");
}

What does scanf("%f%c", ...) do against input `100e`?

Consider the following C code (online available io.c):
#include <stdio.h>
int main () {
float f;
char c;
scanf ("%f%c", &f, &c);
printf ("%f \t %c", f, c);
return 0;
}
When the input is 100f, it outputs 100.000000 f.
However, when the input is 100e, it outputs only 100.000000, without e followed. What is going on here? Isn't 100e an invalid floating-point number?
This is (arguably) a glibc bug.
This behaviour clearly goes against the standard. However it is exhibited by other implementations. Some people consider it a bug in the standard instead.
Per the standard, An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. So 100e is an input item because it is a prefix of a matching input sequence, say, 100e1, but any longer sequence of characters from the input isn't. Further, If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. 100e is not a matching sequence so the standard requires the directive to fail.
The standard cannot tell scanf to accept 100 and continue scanning from e, as some people would expect, because stdio has a limited push-back of just one character. So having read 100e, the implementation would have to read at least one more character, say a newline to be specific, and then push back both newline and e, which it cannot always do.
I'd say this is pretty clearly a pretty unclear, gray area.
If you're an implementor of a C library (or a member of the X3J11 committee), you have to worry about this sort of thing — sometimes a lot. You have to worry about the edge cases, and sometimes the edge cases can be particularly edgy.
However, you did not tag your question with the "language lawyer" tag, so perhaps you're not worried about a scrupulously correct, official interpretation.
If you're not an implementor of a C library or a member of the X3J11 committee, I'd say: don't worry what the "right" answer here is! You don't have to worry, because you don't care, because you'd be crazy to write code which is sensitive to this question — precisely because it's such an obvious gray area. (Even if you do figure out what the right behavior here is, do you trust every implementor of every C library in the world to always implement that behavior?)
I'd say there are three things you can do in the category of "not worrying", and not writing code which is sensitive to this question.
Don't use scanf at all (for anything). It's an odious, imprecise, imperfect function, that's not good for anything except — perhaps — getting numbers into the first few programs you ever write while you're first learning C. After that, scanf has no use in any serious program.
Don't arrange your code and data such that it has to confront ambiguous input like "100e" in the first place. Where is it coming from, anyway? Is it input the user might type? Data being read in from a data file? Is it expected or unexpected, correct or incorrect input? If you're reading a data file, do you have control over the code that writes the data file? Can you guarantee that floating-point fields will always be delimited appropriately, will not occasionally have random alphabetic characters appended?
If you do have to parse input that might contain a valid floating-point number, might have random alphabetic characters appended, and might therefore be ambiguous like this, I'd encourage you to use strtod instead, which is likely to be both better-defined and better-implemented.
Give a space between "%f %c" like that and also when you are going to enter input make sure to have a space between two inputs.
I am assuming you just want to print a character.
From the C Standard (6.4.4.2 Floating constants)
decimal-floating-constant:
fractional-constant exponent-partopt floating-suffixopt
digit-sequence exponent-part floating-suffixopt
and
exponent-part:
e signopt digit-sequence
E signopt digit-sequence
If you will change the call of printf the following way
printf ("%e \t %d\n", f, c);
you will get the output
1.000000e+02 10
that is the variable c has gotten the new line character '\n'.
It seems that the implementation of scanf is made such a way that the symbol e is interpreted as a part of a floating number though there is no digit after the symbol.
According to the C Standard (7.21.6.2 The fscanf function)
9 An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the longest
sequence of input characters which does not exceed any specified
field width and which is, or is a prefix of, a matching input
sequence.278) The first character, if any, after the input item
remains unread.
So 100e is a matching input sequence of characters for a floating number.

scanf what remains on the input stream after failer

I learned C a few years ago using K&R 2nd edition ANSI C. I’ve been reviewing my notes, while I’m learning more modern C from 2 other books.
I notice that K&R never use scanf in the book, except on the one section where they introduce it. They mainly use a getline function which they write in the book, which they change later in the book, once they introduce pointers. There getline is different then gcc getline, which caused me some problems until i change the name of getline to ggetline.
Reviewing my notes i found this quote:
This simplification is convenient and superficially attractive, and it
works, as far as it goes. The problem is that scanf does not work well
in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline. Worse, it turns out that
scanf's error handling is inadequate for many purposes. It tells you
whether a conversion succeeded or not (more precisely, it tells you
how many conversions succeeded), but it doesn't tell you anything more
than that (unless you ask very carefully). Like atoi and atof, scanf
stops reading characters when it's processing a %d or %f input and it
finds a non-numeric character. Suppose you've prompted the user to
enter a number, and the user accidentally types the letter 'x'. scanf
might return 0, indicating that it couldn't convert a number, but the
unconvertable text (the 'x') remains on the input stream unless you
figure out some other way to remove it.
For these reasons (and several others, which I won't bother to
mention) it's generally recommended that scanf not be used for
unstructured input such as user prompts. It's much better to read
entire lines with something like getline (as we've been doing all
along) and then process the line somehow. If the line is supposed to
be a single number, you can use atoi or atof to convert it. If the
line has more complicated structure, you can use sscanf (which we'll
meet in a minute) to parse it. (It's better to use sscanf than scanf
because when sscanf fails, you have complete control over what you do
next. When scanf fails, on the other hand, you're at the mercy of
where in the input stream it has left you.)
At first i thought this quote was from K&R, but i cannot find it in the book. Then i realized thats it's from lecture notes i got online, for someone who taught a course years ago using K&R book.
lecture notes
I know that K&R book is 30 years old now, so it is dated is some ways.
This quote is very old, so i was wondering if scanf still has this behavior or has it changed?
Does scanf still leave stuff in the input stream when it fails? for example above:
Suppose you've prompted the user to enter a number, and the user
accidentally types the letter 'x'. scanf might return 0, indicating
that it couldn't convert a number, but the unconvertable text (the
'x') remains on the input stream.
Is the following still true?
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline.
Has scanf changed much since the quotes above were written? Or are they still true today?
The reason i ask, in the newer books i am reading, no one mentions these issues.
scanf() is evil - use fgets() and then parse.
The detail is not that scanf() is completely bad.
1) The format specifiers are often used in a weak manner
char buf[100];
scanf("%s", buf); // bad - no width limit
2) The return value is errantly not checked
scanf("%99[\n]", buf); // what if use entered `"\n"`?
puts(buf);
3) When input is not as expected, it is not clear what remains in stdin.
if (scanf("%d %d %d", &i, &j, &k) != 3) {
// OK, not what is in `stdin`?
}
you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Yes. Many scanf() calls leave a trailing '\n' in stdin that are then read as an empty line by getline(), fgets(). scanf() is not for reading lines. getline() and fgets() are much better suited to read a line.
Has scanf changed much since the quotes above were written?
Only so much change can happen without messing up the code base. #Jonathan Leffler
scanf() remains troublesome. scanf() is unable to accept an argument (after the format) to indicate how many characters to accept into a char * destination.
Some systems have added additional format options to help.
A fundamental issues is this:
User input is evil. It is more robust to get the text input as one step, qualify input, then parse and assess its success than trying to do all this in one function.
Security
The weakness of scanf() and coder tendency to code scanf() poorly has been a gold mine for hackers.
IMO, C lacks a robust user input function set.

fscanf usage in C

I have a file like this:
10 15
something
I want to read this into tree variables, let's say number1, number2, and mystring. I have doubts about what kind of pattern to give to fscanf. I am thinking something like this;
fscanf(fp,"%i %i\n%s",number1,number2,mystring);
Should this work, and also, is this the correct way of reading this file? If not, what would you suggest?
fscanf(fp,"%i %i\n%s",&number1,&number2,mystring);
fscanf takes pointers.
Read each line with fgets (or getline if you have it), split up the line with strsep (better, if available) or strtok_r (more awkward API but more portable), and then use strtoul to convert strings to numbers as necessary.
*scanf should never be used, because:
Some format strings (e.g. a bare "%s") are just as eager to overflow your buffers as gets is.
Behavior on integer overflow is undefined -- invalid input can potentially crash your program.
They do not report the character position of the first scan error, making it nigh-impossible to recover from a parse error. (This can be somewhat mitigated by using fgets and then sscanf instead of fscanf.)
Besides the problem with pointers, generally using spaces in the scanf format is a mistake -- in most cases scanf skips whitespace automatically. So I would use something like:
int number1, number2;
char mystring[32];
fscanf("%i%i%31s", &number1, &number2, &mystring)
This will read two numbers followed by a string of up to 31 non-whitespace characters, all separated by any whitespace. Note that "whitespace" includes spaces, tabs, and newlines, so it doesn't matter if its all on one line, or spread out over 3 lines or anything in between.
Note also using a limit on the size of the string -- without that, the input might overflow any fixed size buffer you provide (and there's no way to provide a variable sized buffer with scanf)

Using scanf in a while loop

Probably an extremely simple answer to this extremely simple question:
I'm reading "C Primer Plus" by Pratta and he keeps using the example
while (scanf("%d", &num) == 1)...
Is the == 1 really necessary? It seems like one could just write:
while (scanf("%d", &num))
It seems like the equality test is unnecessary since scanf returns the number of objects read and 1 would make the while loop true. Is the reason to make sure that the number of elements read is exactly 1 or is this totally superfluous?
In C, 0 is evaluated to false and everything else to true. Thus, if scanf returned EOF, which is a negative value, the loop would evaluate to true, which is not what you'd want.
Since scanf returns the value EOF (which is -1) on end of file, the loop as written is correct. It runs as long as the input contains text that matches %d, and stops either at the first non-match or end of file.
It would have been clearer at a glance if scanf were expecting more than one input....
while (scanf("%d %d", &x, &y)==2) { ... }
would exit the loop when the first time it was unable to match two values, either due to end of file end of file (scanf returns EOF (which is -1)) or on input matching error (e.g. the input xyzzy 42 does not match %d %d so scanf stops on the first failure and returns 0 without writing to either x or y) when it returns some value less than 2.
Of course, scanf is not your friend when parsing real input from normal humans. There are many pitfalls in its handling of error cases.
Edit: Corrected an error: scanf returns EOF on end of file, or a non-negative integer counting the number of variables it successfully set.
The key point is that since any non-zero value is TRUE in C, failing to test the return value correctly in a loop like this can easily lead to unexpected behavior. In particular, while(scanf(...)) is an infinite loop unless it encounters input text that cannot be converted according to its format.
And I cannot emphasize strongly enough that scanf is not your friend. A combination of fgets and sscanf might be enough for some simple parsing, but even then it is easily overwhelmed by edge cases and errors.
You understood the C code correctly.
Sometimes the reason for testing the number of items read is that someone wants to make sure that all items were read instead of scanf quitting early when it the input didn't match the expected type. In this particular case it didn't matter.
Usually scanf is a poor choice of functions because it doesn't meet the needs of interactive input from a human user. Usually a combination of fgets and sscanf yield better results. In this particular case it didn't matter.
If later chapters explain why some kinds of coding practices are better than this trivial example, good. But if not, you should dump the book you're reading.
On the other hand, your substitute code isn't exactly a substitute. If scanf returns -1 then your while loop will execute.
While you are correct it is not strictly necessary, some people prefer it for several reasons.
First, by comparing to 1 it becomes an explicit boolean value (true or false). Without the comparison, you are testing on an integer, which is valid in C, but not in later languages (like C#).
Secondly, some people would read the second version in terms of while([function]), instead of while([return value]), and be momentarily confused by testing a function, when what is clearly meant is testing the return value.
This can be completely a matter of personal preference, and as far as I'm concerned, both are valid.
One probably could write it without an explicit comparison (see the JRL's answer though), but why would one? I'd say that comparison-less conditions should only be used with values that have explicitly boolean semantics (like an isdigit() call, for example). Everything else should use an explicit comparison. In this case (the result of scanf) the semantics is pronouncedly non-boolean, so the explicit comparison is in order.
Also, the comparison one can usually omit is normally a comparison with zero. When you feel the urge to omit the comparison with something else (like 1 in this case) it is better to think twice and make sure you know what your are doing (see the JRL's answer again).
In any case, when the comparison can be safely omitted and you actually omit it, the actual semantical meaning of the condition remains the same. It has absolutely no impact on the efficiency of the resultant code, if that's something you are worrying about.

Resources