#include<stdio.h>
void main()
{
char a,b;
printf("enter a,b\n");
scanf("%c %c",&a,&b);
printf("a is %c,b is %c,a,b");
}
1.what does the whitespace in between the two format specifiers tell the computer to do?
2.do format specifiers like %d other than %c clean input buffer before they read from there?
1.what does the whitespace in between the two format specifiers tell the computer to do?
Whitespace in the format string tells scanf to read (and discard) whitespace characters up to the first non-whitespace character (which remains unread)1. So
scanf("%c %c",&a,&b);
reads a single character into a (whitespace or not), then skips over any whitespace and reads the next non-whitespace character into b.
2.do format specifiers like %d other than %c clean input buffer before they read from there?
Not sure quite what you mean here - d will skip over any leading whitespace and start reading from the first non-whitespace character, c will read the next character whether it's whitespace or not. Neither will flush the input stream, nor will they write to the target variable if the directive fails (for example, if the next non-whitespace character in the input stream isn't a digit, the d directive fails, and the argument corresponding to that directive will not be updated).
N1570, §7.21.6.2, para 5:
"A directive composed of white-space character(s) is executed by reading input up to the
first non-white-space character (which remains unread), or until no more characters can
be read. The directive never fails."
Wikipedia says
whitespace: Any whitespace characters trigger a scan for zero or more
whitespace characters. The number and type of whitespace characters do
not need to match in either direction.
"%d" will skip whitespace until it finds an integer.
"%c" reads a single character (and space is a character, so it doesn't skip).
Related
For each of the following pairs of scanf format strings, indicate whether or not the two strings are equivalent. If they're not, show how they can be distinguished:
(b) "%d-%d-%d" versus "%d -%d -%d"
So in this case, my answer was that they were not equivalent. Because non-white-space characters except conversion specifier which start with %, cannot be preceded by spaces, it will not match with the non-white-space character. So in the first case, no spaces will be allowed after the first and second integer, while in the second case, any number of spaces will be allowed after the first 2 integers.
But I saw that the book had a different answer. It said that they were both equivalent to each other.
Is this the mistake of the book? Or am I just wrong with the concept of format string in the scanf function?
The book is wrong. As per the specification of the scanf():
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
So in first case when scanf arrives to the %d and gets the input, next is the - which means that scanf will expect next in the stream to see the non-whitespae character - and not any other whitespace character. So the legal input is 1- 2, but not 1 -2
In the second case, after first %d, scanf will allow the whitespace and than will arrive to non-whitespace, so it will allow the input 1 - 2 by the above definitions.
"%d-%d-%d" differs from "%d -%d -%d" and the difference has nothing to do with "%d".
Format "-" scans over input "-" and stops on the first space of input " -".
Format " -" scans over inputs "-" and " -" as the " " in the format matches 0 or more white-space characters in the input.
A directive composed of white-space character(s) is executed by reading input up to the first nonwhite-space character (which remains unread), or until no more characters can be read. The directive never fails. C17dr § 7.21.6.2 5
Had the question been: "%d-%d-%d" versus "%d- %d- %d",
These 2 are functionally identical.
We would need to dive input arcane stdin input errors to divine a potential difference.
Why some writers use %[^\n]s specifier instead of %[^\n]? Which one is correct?
If you want to read in a sequence of non-newline characters, %[^\n] is correct.
The %[ format specifier to scanf will accept any following characters until a ] is encountered. If the first given character is ^, then it accepts characters not in this list. The characters that are read are then placed in the given char * parameter.
%[^\n]s is the above format specifier followed by a literal 's'. The s is not part of the %[ format specifier. So this will read characters until it encounters a newline and put those characters in the given char *, then it will attempt to read an s character which it doesn't find because a newline is next.
The below is what I understood so far. Please confirm, add, correct as the case may be -
scanf (" %c %d %s", &a, &b, c);
In the above - the first space before %c makes sure the buffer is cleared before scanf starts accepting new string into it from stdin for this particular function scanf call. This clears any of the delimiters from any previous input function calls..
The remaining two spaces before %d and %s allow any number of spaces or tabs but not enter key press between the user's entry of a, b and b,c , respectively.
Even with the above, none of the inputs can contain a space in it i.e. space is the delimiter for each of the three inputs. To add space it has to be specified in the string control braces [] like "%[a-z A-Z_0-9]" can contain any upper or lower case alphabets, digits 0-9, a space and an underscore - but will treat all other characters as invalid - The invalid character will go to the next input in the format string, if any, so if %[___] above was followed by %c and an astrick is pressed, the astrick is put into the character corresponding to %c.
Please confirm, correct, add. Thanks.
From cppreference.com:
Any single whitespace character in the format string consumes all available consecutive whitespace characters from the input
So all the spaces in the format string just mean to skip over any whitespace in the input.
There's no difference between the spaces before %c and the other spaces. The initial spaces don't clear the buffer, it just skips over any initial whitespace in the input. This ensures that %c will read the first non-whitespace character in the input.
Whitespace includes space, TAB, and newline characters. So you can put spaces or press enter between each input.
You don't actually need the spaces before %d or %s. These formats don't read anything that contains whitespace, and they automatically skip over any whitespace before the object they read. The spaces in the format string are redundant and do no harm, and may make it easier to read.
All conversion specifiers other than [, c, and n consume and discard all leading whitespace characters (determined as if by calling isspace) before attempting to parse the input.
...
The conversion specifiers that do not consume leading whitespace, such as %c, can be made to do so by using a whitespace character in the format string
the first space before %c makes sure the buffer is cleared before scanf starts accepting new string into it from stdin for this particular function scanf call. This clears any of the delimiters from any previous input function calls.
No. The buffer is not cleared and there are no delimiters.
The remaining two spaces before %d and %s allow any number of spaces or tabs but not enter key press between the user's entry of a, b and b,c , respectively.
No. The enter key produces a newline character, which is whitespace.
Even with the above, none of the inputs can contain a space in it i.e. space is the delimiter for each of the three inputs.
No.
The invalid character will go to the next input in the format string
Yes. This isn't limited to %[ ]: If e.g. %d sees 12foo in the input stream, it will consume 12 and leave foo to be read by the rest of the format string (however, if there are no leading digits at all, %d will fail and abort processing).
Any whitespace character in the format string reads and consumes all available whitespace characters at this point in the input stream, including spaces, tabs, and newlines. It doesn't matter whether the space appears before %c or %d or %s: All whitespace (including newlines) in the input is skipped.
%c accepts spaces just fine. Space is not a delimiter because %c has no delimiters; it always reads a single character. The only reason it can't read a space in your code is that it is preceded by in the format string, which will have skipped over all available whitespace.
As for %d and %s, they implicitly skip leading whitespace. That is, " %d" is equivalent to "%d" and " %s" is equivalent to "%s".
The documentation on scanf says that any "Non-whitespace character" in the format, causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
However, if I run:
int x;
while(scanf("\n%d",&x)==1) printf("%d\n",x);
with the following input:
1 2
It prints:
1
2
Given that there's no '\n' preceding any of the two numbers, why does scanf read them? Isn't that against the docs?
At the same page that you link to and just before the paragraph you quoted, I see:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
A \n is a whitespace character.
Hence, the call
scanf("\n%d",&x)
will extract and discard any number of whitespace characters from stdio before reading data into &x.
\n is a whitespace character. See isspace()
Here is my c code:
int main()
{
int a;
for (int i = 0; i < 3; i++)
scanf("%d ", &a);
return 0;
}
When I input things like 1 2 3, it will ask me to input more, and I need to input something not ' '.
However, when I change it to (or other thing not ' ')
scanf("%d !", &a);
and input 1 ! 2! 3!, it will not ask more input.
The final space in scanf("%d ", &a); instructs scanf to consume all white space following the number. It will keep reading from stdin until you type something that is not white space. Simplify the format this way:
scanf("%d", &a);
scanf will still ignore white space before the numbers.
Conversely, the format "%d !" consumes any white space following the number and a single !. It stops scanning when it gets this character, or another non space character which it leaves in the input stream. You cannot tell from the return value whether it matched the ! or not.
scanf is very clunky, it is very difficult to use it correctly. It is often better to read a line of input with fgets() and parse that with sscanf() or even simpler functions such as strtol(), strspn() or strcspn().
scanf("%d", &a);
This should do the job.
Basically, scanf() consumes stdin input as much as its pattern matches. If you pass "%d" as the pattern, it will stop reading input after a integer is found. However, if you feed it with "%dx" for example, it matches with all integers followed by a character 'x'.
More Details:
Your pattern string could have the following characters:
Whitespace character: the function will read and ignore any whitespace
characters encountered before the next non-whitespace character
(whitespace characters include spaces, newline and tab characters --
see isspace). A single whitespace in the format string validates any
quantity of whitespace characters extracted from the stream (including
none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or
tab) or part of a format specifier (which begin with a % character)
causes the function to read the next character from the stream,
compare it to this non-whitespace character and if it matches, it is
discarded and the function continues with the next character of
format. If the character does not match, the function fails, returning
and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type
and format of the data to be retrieved from the stream and stored into
the locations pointed by the additional arguments.
Source: http://www.cplusplus.com/reference/cstdio/scanf/