Why doesn't scanf follow the docs with non-whitespace chars? - c

The documentation on scanf says that any "Non-whitespace character" in the format causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
However, if I run:
int x;
while(scanf("\n%d",&x)==1) printf("%d\n",x);
with the following input:
1 2
It prints:
1
2
Given that there's no '\n' preceding any of the two numbers, why does scanf read them? Isn't that against the docs?

At the same page that you link to and just before the paragraph you quoted, I see:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
A \n is a whitespace character.
Hence, the call
scanf("\n%d",&x)
will extract and discard any number of whitespace characters from stdin before reading data into x.

\n is a whitespace character. See isspace()

Related

Question about format string in scanf function

For each of the following pairs of scanf format strings, indicate whether or not the two strings are equivalent. If they're not, show how they can be distinguished:
(b) "%d-%d-%d" versus "%d -%d -%d"
So in this case, my answer was that they are not equivalent. A non-white-space character in the format (other than a conversion specifier, which starts with %) must match the next input character exactly, so it cannot be preceded by spaces in the input. In the first case, no spaces are allowed after the first and second integers, while in the second case any number of spaces is allowed after the first two integers.
But I saw that the book had a different answer. It said that they were both equivalent to each other.
Is this the mistake of the book? Or am I just wrong with the concept of format string in the scanf function?
The book is wrong. As per the specification of the scanf():
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
So in the first case, once scanf has handled the %d and read the number, the next format character is the -, which means scanf expects the next character in the stream to be the non-whitespace character - and not whitespace. So 1- 2 is legal input, but 1 -2 is not.
In the second case, after the first %d, scanf allows whitespace and then arrives at the non-whitespace character, so it accepts the input 1 - 2 by the above definitions.
"%d-%d-%d" differs from "%d -%d -%d" and the difference has nothing to do with "%d".
Format "-" scans over input "-" and stops on the first space of input " -".
Format " -" scans over inputs "-" and " -" as the " " in the format matches 0 or more white-space characters in the input.
A directive composed of white-space character(s) is executed by reading input up to the first nonwhite-space character (which remains unread), or until no more characters can be read. The directive never fails. C17dr § 7.21.6.2 5
Had the question been: "%d-%d-%d" versus "%d- %d- %d",
These 2 are functionally identical.
We would need to dive into arcane stdin input errors to divine a potential difference.

what do modifiers like whitespace do in scanf?

#include <stdio.h>
int main(void)
{
char a,b;
printf("enter a,b\n");
scanf("%c %c",&a,&b);
printf("a is %c,b is %c\n",a,b);
return 0;
}
1.what does the whitespace in between the two format specifiers tell the computer to do?
2.do format specifiers like %d other than %c clean input buffer before they read from there?
1.what does the whitespace in between the two format specifiers tell the computer to do?
Whitespace in the format string tells scanf to read (and discard) whitespace characters up to the first non-whitespace character, which remains unread (see the N1570 quotation below). So
scanf("%c %c",&a,&b);
reads a single character into a (whitespace or not), then skips over any whitespace and reads the next non-whitespace character into b.
2.do format specifiers like %d other than %c clean input buffer before they read from there?
Not sure quite what you mean here. %d will skip over any leading whitespace and start reading from the first non-whitespace character; %c will read the next character whether it's whitespace or not. Neither will flush the input stream, nor will they write to the target variable if the directive fails (for example, if the next non-whitespace character in the input stream isn't a digit or a sign, the %d directive fails, and the argument corresponding to that directive will not be updated).
N1570, §7.21.6.2, para 5:
"A directive composed of white-space character(s) is executed by reading input up to the
first non-white-space character (which remains unread), or until no more characters can
be read. The directive never fails."
Wikipedia says
whitespace: Any whitespace characters trigger a scan for zero or more
whitespace characters. The number and type of whitespace characters do
not need to match in either direction.
"%d" will skip whitespace until it finds an integer.
"%c" reads a single character (and space is a character, so it doesn't skip).

What is the difference between these two scanf statements?

I am having some doubt. The doubt is
What is the difference between the following two scanf statements.
scanf("%s",buf);
scanf("%[^\n]", buf);
If I put the second scanf in a while loop, it loops forever, because the \n is still in stdin.
The first statement also reads only up to the \n and does not consume it.
Yet the first statement does not loop forever. Why?
Regarding the properties of the %s format specifier, quoting the C11 standard, chapter §7.21.6.2, fscanf():
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so a newline on its own is not a match for %s.
So if a newline is left in the buffer, %s does not scan the newline alone; it skips it and waits for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should skip any leading whitespace, then read characters from stdin until it encounters the next whitespace character, and stop there. That whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
In the second case there is no mention of stopping.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and, last but not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's more clearly documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars, basically a "word", is stored in the buffer provided. For example, scanning a string from "   a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of consecutive newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control over the input conversion. One of them is "%c", which just reads the next character. The other one is the "%[" spec, which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (as I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, dutifully puts the offending character back in the input for somebody else to consume and returns 0, indicating the failure to perform even the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
The first, scanf("%s",buf);, scans only a single word, but the second, scanf("%[^\n]", buf);, reads a string up to the newline character the user enters.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%19s", sentence); /* width limit: at most 19 chars plus the terminator */
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello,
#include <stdio.h>
int main(void){
char sentence[64] = {'\0'}; /* the sample input is 26 characters long */
scanf("%63[^\n]", sentence); /* width limit keeps the read inside the buffer */
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is an inverted group scan and this is how I personally use it, as it allows me to input a sentence with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file is detected before saving anything. Both return EOF if an input error is detected. Both may save embedded '\0' characters in buf.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume or discard leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO: code should never use either:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group of non-whitespace characters.

Using scanf in for loop

Here is my c code:
#include <stdio.h>
int main(void)
{
int a;
for (int i = 0; i < 3; i++)
scanf("%d ", &a);
return 0;
}
When I input something like 1 2 3, it keeps waiting for more input until I type something that is not whitespace.
However, when I change the format to end with some other character instead of ' ', for example
scanf("%d !", &a);
and input 1 ! 2! 3!, it does not ask for more input.
The final space in scanf("%d ", &a); instructs scanf to consume all white space following the number. It will keep reading from stdin until you type something that is not white space. Simplify the format this way:
scanf("%d", &a);
scanf will still ignore white space before the numbers.
Conversely, the format "%d !" consumes any white space following the number and a single !. It stops scanning when it gets this character, or another non space character which it leaves in the input stream. You cannot tell from the return value whether it matched the ! or not.
scanf is very clunky, and it is very difficult to use correctly. It is often better to read a line of input with fgets() and parse that with sscanf() or even simpler functions such as strtol(), strspn() or strcspn().
scanf("%d", &a);
This should do the job.
Basically, scanf() consumes as much stdin input as matches its pattern. If you pass "%d" as the pattern, it will stop reading input after an integer is found. However, if you feed it "%dx", for example, it matches an integer followed by the character 'x'.
More Details:
Your pattern string could have the following characters:
Whitespace character: the function will read and ignore any whitespace
characters encountered before the next non-whitespace character
(whitespace characters include spaces, newline and tab characters --
see isspace). A single whitespace in the format string validates any
quantity of whitespace characters extracted from the stream (including
none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or
tab) or part of a format specifier (which begin with a % character)
causes the function to read the next character from the stream,
compare it to this non-whitespace character and if it matches, it is
discarded and the function continues with the next character of
format. If the character does not match, the function fails, returning
and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type
and format of the data to be retrieved from the stream and stored into
the locations pointed by the additional arguments.
Source: http://www.cplusplus.com/reference/cstdio/scanf/

Reading tabs in C

I am trying to sscanf from a file. The pattern I am trying to match is the following
"%s\t%s\t%s\t%f"
Thing is that I am surprised because for an input like following:
Hello Hola Hallo 5.344434
it is reading all of the data properly...
Do you know why?
I was expecting it to be finding tabs like |---|---|---|---| not that only one space was matching.
Thanks
The standard reads:
A directive composed of white-space character(s) is executed by
reading input up to the first non-white-space character (which remains
unread), or until no more characters can be read.
In other words, a sequence of white-space characters (space, tab, newline, etc.; as defined by isspace()) in the format string matches any amount of white space in the input.
No way: scanf treats all white-space characters identically. They are used as delimiters and otherwise ignored. So if you really want to do something with tab characters specifically, you have to parse the input yourself.
To do that, read the whole line without any parsing (unlike scanf), which means using fgets.
FILE *fp = /* init.. */;
char buf[1024];
fgets(buf, 1024, fp);
// parse yourself!
If you take a look at the documentation for scanf:
C string that contains a sequence of characters that control how characters extracted from the stream are treated:
Whitespace character: the function will read and ignore any whitespace characters
encountered before the next non-whitespace character (whitespace characters include
spaces, newline and tab characters -- see isspace). A single whitespace in the format
string validates any quantity of whitespace characters extracted from the stream
(including none).
Non-whitespace character, except format specifier (%): Any character that is not
either a whitespace character (blank, newline or tab) or part of a format specifier
(which begin with a % character) causes the function to read the next character
from the stream, compare it to this non-whitespace character and if it matches,
it is discarded and the function continues with the next character of format. If the
character does not match, the function fails, returning and leaving subsequent
characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a
format specifier, which is used to specify the type and format of the data to be
retrieved from the stream and stored into the locations pointed by the additional
arguments.
You will notice that the whitespace characters get ignored.
Did you carefully read scanf(3) documentation? You need to read the entire line using getline(3) then parse that line "manually"!
