What does scanf("%f%c", ...) do against input `100e`? - c

Consider the following C code (online available io.c):
#include <stdio.h>
int main () {
float f;
char c;
scanf ("%f%c", &f, &c);
printf ("%f \t %c", f, c);
return 0;
}
When the input is 100f, it outputs 100.000000 f.
However, when the input is 100e, it outputs only 100.000000, without e followed. What is going on here? Isn't 100e an invalid floating-point number?

This is (arguably) a glibc bug.
This behaviour clearly goes against the standard. However it is exhibited by other implementations. Some people consider it a bug in the standard instead.
Per the standard, An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence. So 100e is an input item because it is a prefix of a matching input sequence, say, 100e1, but any longer sequence of characters from the input isn't. Further, If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. 100e is not a matching sequence so the standard requires the directive to fail.
The standard cannot tell scanf to accept 100 and continue scanning from e, as some people would expect, because stdio has a limited push-back of just one character. So having read 100e, the implementation would have to read at least one more character, say a newline to be specific, and then push back both newline and e, which it cannot always do.

I'd say this is pretty clearly a pretty unclear, gray area.
If you're an implementor of a C library (or a member of the X3J11 committee), you have to worry about this sort of thing — sometimes a lot. You have to worry about the edge cases, and sometimes the edge cases can be particularly edgy.
However, you did not tag your question with the "language lawyer" tag, so perhaps you're not worried about a scrupulously correct, official interpretation.
If you're not an implementor of a C library or a member of the X3J11 committee, I'd say: don't worry what the "right" answer here is! You don't have to worry, because you don't care, because you'd be crazy to write code which is sensitive to this question — precisely because it's such an obvious gray area. (Even if you do figure out what the right behavior here is, do you trust every implementor of every C library in the world to always implement that behavior?)
I'd say there are three things you can do in the category of "not worrying", and not writing code which is sensitive to this question.
Don't use scanf at all (for anything). It's an odious, imprecise, imperfect function, that's not good for anything except — perhaps — getting numbers into the first few programs you ever write while you're first learning C. After that, scanf has no use in any serious program.
Don't arrange your code and data such that it has to confront ambiguous input like "100e" in the first place. Where is it coming from, anyway? Is it input the user might type? Data being read in from a data file? Is it expected or unexpected, correct or incorrect input? If you're reading a data file, do you have control over the code that writes the data file? Can you guarantee that floating-point fields will always be delimited appropriately, will not occasionally have random alphabetic characters appended?
If you do have to parse input that might contain a valid floating-point number, might have random alphabetic characters appended, and might therefore be ambiguous like this, I'd encourage you to use strtod instead, which is likely to be both better-defined and better-implemented.

Give a space between "%f %c" like that and also when you are going to enter input make sure to have a space between two inputs.
I am assuming you just want to print a character.

From the C Standard (6.4.4.2 Floating constants)
decimal-floating-constant:
fractional-constant exponent-partopt floating-suffixopt
digit-sequence exponent-part floating-suffixopt
and
exponent-part:
e signopt digit-sequence
E signopt digit-sequence
If you will change the call of printf the following way
printf ("%e \t %d\n", f, c);
you will get the output
1.000000e+02 10
that is the variable c has gotten the new line character '\n'.
It seems that the implementation of scanf is made such a way that the symbol e is interpreted as a part of a floating number though there is no digit after the symbol.
According to the C Standard (7.21.6.2 The fscanf function)
9 An input item is read from the stream, unless the specification
includes an n specifier. An input item is defined as the longest
sequence of input characters which does not exceed any specified
field width and which is, or is a prefix of, a matching input
sequence.278) The first character, if any, after the input item
remains unread.
So 100e is a matching input sequence of characters for a floating number.

Related

scanf what remains on the input stream after failer

I learned C a few years ago using K&R 2nd edition ANSI C. I’ve been reviewing my notes, while I’m learning more modern C from 2 other books.
I notice that K&R never use scanf in the book, except on the one section where they introduce it. They mainly use a getline function which they write in the book, which they change later in the book, once they introduce pointers. There getline is different then gcc getline, which caused me some problems until i change the name of getline to ggetline.
Reviewing my notes i found this quote:
This simplification is convenient and superficially attractive, and it
works, as far as it goes. The problem is that scanf does not work well
in more complicated situations. In section 7.1, we said that calls to
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline. Worse, it turns out that
scanf's error handling is inadequate for many purposes. It tells you
whether a conversion succeeded or not (more precisely, it tells you
how many conversions succeeded), but it doesn't tell you anything more
than that (unless you ask very carefully). Like atoi and atof, scanf
stops reading characters when it's processing a %d or %f input and it
finds a non-numeric character. Suppose you've prompted the user to
enter a number, and the user accidentally types the letter 'x'. scanf
might return 0, indicating that it couldn't convert a number, but the
unconvertable text (the 'x') remains on the input stream unless you
figure out some other way to remove it.
For these reasons (and several others, which I won't bother to
mention) it's generally recommended that scanf not be used for
unstructured input such as user prompts. It's much better to read
entire lines with something like getline (as we've been doing all
along) and then process the line somehow. If the line is supposed to
be a single number, you can use atoi or atof to convert it. If the
line has more complicated structure, you can use sscanf (which we'll
meet in a minute) to parse it. (It's better to use sscanf than scanf
because when sscanf fails, you have complete control over what you do
next. When scanf fails, on the other hand, you're at the mercy of
where in the input stream it has left you.)
At first i thought this quote was from K&R, but i cannot find it in the book. Then i realized thats it's from lecture notes i got online, for someone who taught a course years ago using K&R book.
lecture notes
I know that K&R book is 30 years old now, so it is dated is some ways.
This quote is very old, so i was wondering if scanf still has this behavior or has it changed?
Does scanf still leave stuff in the input stream when it fails? for example above:
Suppose you've prompted the user to enter a number, and the user
accidentally types the letter 'x'. scanf might return 0, indicating
that it couldn't convert a number, but the unconvertable text (the
'x') remains on the input stream.
Is the following still true?
putchar and printf could be interleaved. The same is not always true
of scanf: you can have baffling problems if you try to intermix calls
to scanf with calls to getchar or getline.
Has scanf changed much since the quotes above were written? Or are they still true today?
The reason i ask, in the newer books i am reading, no one mentions these issues.
scanf() is evil - use fgets() and then parse.
The detail is not that scanf() is completely bad.
1) The format specifiers are often used in a weak manner
char buf[100];
scanf("%s", buf); // bad - no width limit
2) The return value is errantly not checked
scanf("%99[\n]", buf); // what if use entered `"\n"`?
puts(buf);
3) When input is not as expected, it is not clear what remains in stdin.
if (scanf("%d %d %d", &i, &j, &k) != 3) {
// OK, not what is in `stdin`?
}
you can have baffling problems if you try to intermix calls to scanf with calls to getchar or getline.
Yes. Many scanf() calls leave a trailing '\n' in stdin that are then read as an empty line by getline(), fgets(). scanf() is not for reading lines. getline() and fgets() are much better suited to read a line.
Has scanf changed much since the quotes above were written?
Only so much change can happen without messing up the code base. #Jonathan Leffler
scanf() remains troublesome. scanf() is unable to accept an argument (after the format) to indicate how many characters to accept into a char * destination.
Some systems have added additional format options to help.
A fundamental issues is this:
User input is evil. It is more robust to get the text input as one step, qualify input, then parse and assess its success than trying to do all this in one function.
Security
The weakness of scanf() and coder tendency to code scanf() poorly has been a gold mine for hackers.
IMO, C lacks a robust user input function set.

What is `scanf` supposed to do with incomplete exponent-part?

Take for example rc = scanf("%f", &flt); with the input 42ex. An implementation of scanf would read 42e thinking it would encounter a digit or sign after that and realize first when reading x that it didn't get that. Should it at this point push back both x and e? Or should it only push back the x.
The reason I ask is that GNU's libc will on a subsequent call to gets return ex indicating they've pushed back both x and e, but the standard says:
An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence[245] The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
I interpret this as since 42e is a prefix of a matching input sequence (since for example 42e1 would be a matching input sequence), which should mean that it would consider 42e as a input item that should be read leaving only x unread. That would also be more convenient to implement if the stream only supports single character push back.
Your interpretation of the standard is correct. There's even an example further down in the C standard which says that 100ergs of energy shouldn't match %f%20s of %20s because 100e fails to match %f.
But most C libraries seem to implement this differently, probably due to historical reasons. I just checked the C library on macOS and it behaves like glibc. The corresponding glibc bug was closed as WONTFIX with the following explanation from Ulrich Drepper:
This is stupidity on the ISO C committee side which goes against existing
practice. Any change can break existing code.

Why does this scanf() conversion actually work?

Ah, the age old tale of a programmer incrementally writing some code that they aren't expecting to do anything more than expected, but the code unexpectedly does everything, and correctly, too.
I'm working on some C programming practice problems, and one was to redirect stdin to a text file that had some lines of code in it, then print it to the console with scanf() and printf(). I was having trouble getting the newline characters to print as well (since scanf typically eats up whitespace characters) and had typed up a jumbled mess of code involving multiple conditionals and flags when I decided to start over and ended up typing this:
(where c is a character buffer large enough to hold the entirety of the text file's contents)
scanf("%[a-zA-Z -[\n]]", c);
printf("%s", c);
And, voila, this worked perfectly. I tried to figure out why by creating variations on the character class (between the outside brackets), such as:
[\w\W -[\n]]
[\w\d -[\n]]
[. -[\n]]
[.* -[\n]]
[^\n]
but none of those worked. They all ended up reading either just one character or producing a jumbled mess of random characters. '[^\n]' doesn't work because the text file contains newline characters, so it only prints out a single line.
Since I still haven't figured it out, I'm hoping someone out there would know the answer to these two questions:
Why does "[a-zA-Z -[\nn]]" work as expected?
The text file contains letters, numbers, and symbols (':', '-', '>', maybe some others); if 'a-z' is supposed to mean "all characters from unicode 'a' to unicode 'z'", how does 'a-zA-Z' also include numbers?
It seems like the syntax for what you can enter inside the brackets is a lot like regex (which I'm familiar with from Python), but not exactly. I've read up on what can be used from trying to figure out this problem, but I haven't been able to find any info comparing whatever this syntax is to regex. So: how are they similar and different?
I know this probably isn't a good usage for scanf, but since it comes from a practice problem, real world convention has to be temporarily ignored for this usage.
Thanks!
You are picking up numbers because you have " -[" in your character set. This means all characters from space (32) to open-bracket (91), which includes numbers in ASCII (48-57).
Your other examples include this as well, but they are missing the "a-zA-Z", which lets you pick up the lower-case letters (97-122). Sequences like '\w' are treated as unknown escape sequences in the string itself, so \w just becomes a single w. . and * are taken literally. They don't have a special meaning like in a regular expression.
If you include - inside the [ (other than at the beginning or end) then the behaviour is implementation-defined.
This means that your compiler documentation must describe the behaviour, so you should consult that documentation to see what the defined behaviour is, which would explain why some of your code worked and some didn't.
If you want to write portable code then you can't use - as anything other than matching a hyphen.

scanf with precision specifier parameter

I just started learning C, and because I mixed scanf and printf, I accidentally wrote code like this:
float loan = 0.0f; // loan
float rate = 0.0f; // monthly interest rate
float mpay = 0.0f; // monthly pay
printf("Enter amount of loan: ");
scanf("%.2f",&loan); // I am supposed to write scanf("%f",&loan), I was trying to take a float input with two digits after point
printf("Enter interest rate: ");
scanf("%.2f",&rate); // similar here
printf("Enter monthly payment: ");
scanf("%.2f",&mpay); // similar here
I expected that this code would show three input prompts, each of which would hang and wait there until I type some input and hit enter. But what happened is, only the first prompt was waiting for my input, then when I hit enter, second and third prompts shows up quickly without waiting for my input, as if the code got my input from somewhere else. And later when I examined value of three variables that were supposed to hold three inputs, none of them actually stored input values.
I went back to my C book and realized that I mixed printf's minimum field and precision parameter concept with scanf's, so I guess there is just no such syntax for scanf? But if so, then why does the compiler compile, doesn't that mean there IS such syntax for scanf as well, or am I wrong about this? And how should I explain this weird behavior of not waiting for my second and third inputs, and not storing inputs into three variables?
Yes, the scanf() family as specified by the C standards doesn't support this. It's explicitly undefined behavior, which means anything is allowed to happen.
But some compilers do implement a warning that checks the validity of format strings at compile-time. GCC (and Clang) has -Wformat (enabled by -Wall which one should be using anyway). If you're using a different compiler, you have to read its documentation or use static analysis tools.
From the ISO C11 Committee Draft (N1570), 7.21.6.2 paragraph 13:
If a conversion specification is invalid, the behavior is undefined. 287)
The footnote refers to 7.31.11 par. 1:
Lowercase letters may be added to the conversion specifiers and length modifiers in fprintf and fscanf. Other characters may be used in extensions.
So . could be supported by a scanf() implementation as an extension, although I'm not aware of one that does.
run this code on TC, and it executed as you said...
First of all, you should not include the field format specifiers in scanf.
In your case I think,the numbers you provide in addition to %f (like .2 in your case)..because of that the scanf behaves abnormally..it may happen that the values you provide as input are stored at different address locations(something related to.2)..and so the variables rate,loan etc are not showing the values because nothing is stored in it.
You cannot convert the precision in a scanf. If you compile it using gcc, you should get a unknown conversion type character ‘.’ in format warning.
One way to check your program is to print out the return value of scanf and you will find all these 3 scanf give you 0, which means none of the 3 variables has been scanned any value in.

How to use sscanf correctly and safely

First of all, other questions about usage of sscanf do not answer my question because the common answer is to not use sscanf at all and use fgets or getch instead, which is impossible in my case.
The problem is my C professor wants me to use scanf in a program. It's a requirement.
However the program also must handle all the incorrect input.
The program must read an array of integers. It doesn't matter in what format the integers
for the array are supplied. To make the task easier, the program might first read the size of the array and then the integers each in a new line.
The program must handle the inputs like these (and report errors appropriately):
999999999999999...9 (numbers larger than integer)
12a3 (don't read this as an integer 12)
a...z (strings)
11 aa 22 33\n all in one line (this might be handled by discarding everything after 11)
inputs larger than the input array
There might be more incorrect cases, these are the only few I could think of.
If the erroneous input is supplied, the program must ask the user to input again until
the correct input is given, but the previous correct input must be kept (only incorrect
input must be cleared from the input stream).
Everything must conform to C99 standard.
The scanf family of function cannot be used safely, especially when dealing with integers. The first case you mentioned is particularly troublesome. The standard says this:
If this object does not have an appropriate type, or if the result of
the conversion cannot be represented in the object, the behavior is
undefined.
Plain and simple. You might think of %5d tricks and such but you'll find they're not reliable. Or maybe someone will think of errno. The scanf functions aren't required to set errno.
Follow this fun little page: they end up ditching scanf altogether.
So go back to your C professor and ask them: how exactly does C99 mandate that sscanf will report errors ?
Well, let sscanf accept all inputs as %s (i.e. strings) and then program analyze them
If you must use scanf to accept the input, I think you start with something a bit like the following.
int array[MAX];
int i, n;
scanf("%d", &n);
for (i = 0; i < n && !feof(stdin); i++) {
scanf("%d", &array[i]);
}
This will handle (more or less) the free-format input problem since scanf will automatically skip leading whitespace when matching a %d format.
The key observation for many of the rest of your concerns is that scanf tells you how many format codes it parsed successfully. So,
int matches = scanf("%d", &array[i]);
if (matches == 0) {
/* no integer in the input stream */
}
I think this handles directly concerns (3) and (4)
By itself, this doesn't quite handle the case of the input12a3. The first time through the loop, scanf would parse '12as an integer 12, leaving the remaininga3` for the next loop. You would get an error the next time round, though. Is that good enough for your professor's purposes?
For integers larger than maxint, eg, "999999.......999", I'm not sure what you can do with straight scanf.
For inputs larger than the input array, this isn't a scanf problem per se. You just need to count how many integers you've parsed so far.
If you're allowed to use sscanf to decode strings after they've been extracted from the input stream by something like scanf("%s") you could also do something like this:
while (...) {
scanf("%s", buf);
/* use strtol or sscanf if you really have to */
}
This works for any sequence of white-space separated words, and lets you separate scanning the input for words, and then seeing if those words look like numbers or not. And, if you have to, you can use scanf variants for each part.
The problem is my C professor wants me to use scanf in a program.
It's a requirement.
However the program also must handle all the incorrect input.
This is an old question, so the OP is not in that professor's class any more (and hopefully the professor is retired), but for the record, this is a fundamentally misguided and basically impossible requirement.
Experience has shown that when it comes to interactive user input, scanf is suitable only for quick-and-dirty situations when the input can be assumed to correct.
If you want to read an integer (or a floating-point number, or a simple string) quickly and easily, then scanf is a nice tool for the job. However, its ability to gracefully handle incorrect input is basically nonexistent.
If you want to read input robustly, reliably detecting incorrect input, and perhaps warning the user and asking them to try again, scanf is simply not the right tool for the job. It's like trying to drive screws with a hammer.
See this answer for some guidelines for using scanf safely in those quick-and-dirty situations. See this question for suggestions on how to do robust input using something other than scanf.
scanf("%s", string) into long int_string = strtol(string, &end_pointer, base:10)

Resources