Question about format string in scanf function - c

For each of the following pairs of scanf format strings, indicate whether or not the two strings are equivalent. If they're not, show how they can be distinguished:
(b) "%d-%d-%d" versus "%d -%d -%d"
So in this case, my answer was that they were not equivalent. Because non-white-space characters except conversion specifier which start with %, cannot be preceded by spaces, it will not match with the non-white-space character. So in the first case, no spaces will be allowed after the first and second integer, while in the second case, any number of spaces will be allowed after the first 2 integers.
But I saw that the book had a different answer. It said that they were both equivalent to each other.
Is this the mistake of the book? Or am I just wrong with the concept of format string in the scanf function?

The book is wrong. As per the specification of the scanf():
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or tab) or part of a format specifier (which begin with a % character) causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
So in first case when scanf arrives to the %d and gets the input, next is the - which means that scanf will expect next in the stream to see the non-whitespae character - and not any other whitespace character. So the legal input is 1- 2, but not 1 -2
In the second case, after first %d, scanf will allow the whitespace and than will arrive to non-whitespace, so it will allow the input 1 - 2 by the above definitions.

"%d-%d-%d" differs from "%d -%d -%d" and the difference has nothing to do with "%d".
Format "-" scans over input "-" and stops on the first space of input " -".
Format " -" scans over inputs "-" and " -" as the " " in the format matches 0 or more white-space characters in the input.
A directive composed of white-space character(s) is executed by reading input up to the first nonwhite-space character (which remains unread), or until no more characters can be read. The directive never fails. C17dr § 7.21.6.2 5
Had the question been: "%d-%d-%d" versus "%d- %d- %d",
These 2 are functionally identical.
We would need to dive input arcane stdin input errors to divine a potential difference.

Related

what do modifiers like whitespace do in scanf?

#include<stdio.h>
void main()
{
char a,b;
printf("enter a,b\n");
scanf("%c %c",&a,&b);
printf("a is %c,b is %c,a,b");
}
1.what does the whitespace in between the two format specifiers tell the computer to do?
2.do format specifiers like %d other than %c clean input buffer before they read from there?
1.what does the whitespace in between the two format specifiers tell the computer to do?
Whitespace in the format string tells scanf to read (and discard) whitespace characters up to the first non-whitespace character (which remains unread)1. So
scanf("%c %c",&a,&b);
reads a single character into a (whitespace or not), then skips over any whitespace and reads the next non-whitespace character into b.
2.do format specifiers like %d other than %c clean input buffer before they read from there?
Not sure quite what you mean here - d will skip over any leading whitespace and start reading from the first non-whitespace character, c will read the next character whether it's whitespace or not. Neither will flush the input stream, nor will they write to the target variable if the directive fails (for example, if the next non-whitespace character in the input stream isn't a digit, the d directive fails, and the argument corresponding to that directive will not be updated).
N1570, §7.21.6.2, para 5:
"A directive composed of white-space character(s) is executed by reading input up to the
first non-white-space character (which remains unread), or until no more characters can
be read. The directive never fails."
Wikipedia says
whitespace: Any whitespace characters trigger a scan for zero or more
whitespace characters. The number and type of whitespace characters do
not need to match in either direction.
"%d" will skip whitespace until it finds an integer.
"%c" reads a single character (and space is a character, so it doesn't skip).

Why doesn't scanf follow the docs with non-whitespace chars?

The documentation on scanf says that any "Non-whitespace character" in the format, causes the function to read the next character from the stream, compare it to this non-whitespace character and if it matches, it is discarded and the function continues with the next character of format. If the character does not match, the function fails, returning and leaving subsequent characters of the stream unread.
However, if I run:
int x;
while(scanf("\n%d",&x)==1) printf("%d\n",x);
with the following input:
1 2
It prints:
1
2
Given that there's no '\n' preceding any of the two numbers, why does scanf read them? Isn't that against the docs?
At the same page that you link to and just before the paragraph you quoted, I see:
Whitespace character: the function will read and ignore any whitespace characters encountered before the next non-whitespace character (whitespace characters include spaces, newline and tab characters -- see isspace). A single whitespace in the format string validates any quantity of whitespace characters extracted from the stream (including none).
A \n is a whitespace character.
Hence, the call
scanf("\n%d",&x)
will extract and discard any number of whitespace characters from stdio before reading data into &x.
\n is a whitespace character. See isspace()

What is the difference between these two scanf statements?

I am having some doubt. The doubt is
What is the difference between the following two scanf statements.
scanf("%s",buf);
scanf("%[^\n]", buf);
If I am giving the second scanf in the while loop, it is going infinitely. Because the \n is in the stdin.
But in the first statement, reads up to before the \n. It also will not read the \n.
But The first statement does not go in infinitely. Why?
Regarding the properties of the %s format specifier, quoting C11 standrad, chapter §7.21.6.2, fscanf()
s Matches a sequence of non-white-space characters.
The newline is a whitespace character, so only a newlinew won't be a match for %s.
So, in case the newline is left in the buffer, it does not scan the newline alone, and wait for the next non-whitespace input to appear on stdin.
The %s format specifier specifies that scanf() should read all characters in the standard input buffer stdin until it encounters the first whitespace character, and then stop there. The whitespace ('\n') remains in the stdin buffer until consumed by another function, like getchar().
In the second case there is no mention of stopping.
You can think of scanf as extracting words separated by whitespace from a stream of characters. Imagine reading a file which contains a table of numbers, for example, without worrying about the exact number count per line or the exact space count and nature between numbers.
Whitespace, for the record, is horizontal and vertical (these exist) tabs, carriage returns, newlines, form feeds and last not least, actual spaces.
In order to free the user from details, scanf treats all whitespace the same: It normally skips it until it hits a non-whitespace and then tries to convert the character sequence starting there according to the specified input conversion. E.g. with "%d" it expects a sequence of digits, perhaps preceded by a minus sign.
The input conversion "%s" also starts with skipping whitespace (and that's clearer documented in the opengroup's man page than in the Linux one).
After skipping leading whitespace, "%s" accepts everything until another whitespace is read (and put back in the input, because it isn't made part of the "word" being read). That sequence of non-whitespace chars -- basically a "word" -- is stored in the buffer provided. For example, scanning a string from " a bc " results in skipping 3 spaces and storing "a" in the buffer. (The next scanf would skip the intervening space and put "bc" in the buffer. The next scanf after that would skip the remaining whitespace, encounter the end of file and return EOF.) So if a user is asked to enter three words they could give three words on one line or on three lines or on any number of lines preceded or separated by any number of empty lines, i.e. any number of subsequent newlines. Scanf couldn't care less.
There are a few exceptions to the "skip leading whitespace" strategy. Both concern conversions which usually indicate that the user wants to have more control about the input conversion. One of them is "%c" which just reads the next character. The other one is the "%[" spec which details exactly which characters are considered part of the next "word" to read. The conversion specification you use, "%[^\n]", reads everything except newline. Input from the keyboard is normally passed to a program line by line, and each line is by definition terminated by a newline. The newline of the first line passed to your program will be the first character from the input stream which does not match the conversion specification. Scanf will read it, inspect it and then put it back in the input stream (with ungetc()) for somebody else to consume. Unfortunately, it will itself be the next consumer, in another loop iteration (as I assume). Now the very first character it encounters (the newline) does not match the input conversion (which demands anything but the newline). Scanf therefore gives up immediately, puts the offending character dutifully back in the input for somebody else to consume and returns 0 indicating the failure to even perfom the very first conversion in the format string. But alas, it itself will be the next consumer. Yes, machines are stupid.
First scanf("%s",buf); scan only word or string, but second scanf("%[^\n]", buf); reads a string until a user inputs is new line character.
Let's take a look at these two code snippets :
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%s", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello
#include <stdio.h>
int main(void){
char sentence[20] = {'\0'};
scanf("%[^\n]", sentence);
printf("\n%s\n", sentence);
return 0;
}
Input : Hello, my name is Claudio.
Output : Hello, my name is Claudio.
%[^\n] is an inverted group scan and this is how I personally use it, as it allows me to input a sentece with blank spaces in it.
Common
Both expect buf to be a pointer to a character array. Both append a null character to that array if at least 1 character was saved. Both return 1 if something was saved. Both return EOF if end-of-file detected before saving anything. Both return EOF in input error is detected. Both may save buf with embedded '\0' characters in it.
scanf("%s",buf);
scanf("%[^\n]", buf);
Differences
"%s" 1) consumes and discards leading white-space including '\n', space, tab, etc. 2) then saves non-white-space to buf until 3) a white-space is detected (which is then put back into stdin). buf will not contain any white-space.
"%[^\n]" 1) does not consume and discards leading white-space. 2) it saves non-'\n' characters to buf until 3) a '\n' is detected (which is then put back into stdin). If the first character read is a '\n', then nothing is saved in buf and 0 is returned. The '\n' remains in stdin and explains OP's infinite loop.
Failure to test the return value of scanf() is a common code oversight. Better code checks the return value of scanf().
IMO: code should never use either:
Both fail to limit the number of characters read. Use fgets().
You can think of %s as %[^\n \t\f\r\v], that is, after skipping any leading whitespace, a group a non-whitespace characters.

Using scanf in for loop

Here is my c code:
int main()
{
int a;
for (int i = 0; i < 3; i++)
scanf("%d ", &a);
return 0;
}
When I input things like 1 2 3, it will ask me to input more, and I need to input something not ' '.
However, when I change it to (or other thing not ' ')
scanf("%d !", &a);
and input 1 ! 2! 3!, it will not ask more input.
The final space in scanf("%d ", &a); instructs scanf to consume all white space following the number. It will keep reading from stdin until you type something that is not white space. Simplify the format this way:
scanf("%d", &a);
scanf will still ignore white space before the numbers.
Conversely, the format "%d !" consumes any white space following the number and a single !. It stops scanning when it gets this character, or another non space character which it leaves in the input stream. You cannot tell from the return value whether it matched the ! or not.
scanf is very clunky, it is very difficult to use it correctly. It is often better to read a line of input with fgets() and parse that with sscanf() or even simpler functions such as strtol(), strspn() or strcspn().
scanf("%d", &a);
This should do the job.
Basically, scanf() consumes stdin input as much as its pattern matches. If you pass "%d" as the pattern, it will stop reading input after a integer is found. However, if you feed it with "%dx" for example, it matches with all integers followed by a character 'x'.
More Details:
Your pattern string could have the following characters:
Whitespace character: the function will read and ignore any whitespace
characters encountered before the next non-whitespace character
(whitespace characters include spaces, newline and tab characters --
see isspace). A single whitespace in the format string validates any
quantity of whitespace characters extracted from the stream (including
none).
Non-whitespace character, except format specifier (%): Any character that is not either a whitespace character (blank, newline or
tab) or part of a format specifier (which begin with a % character)
causes the function to read the next character from the stream,
compare it to this non-whitespace character and if it matches, it is
discarded and the function continues with the next character of
format. If the character does not match, the function fails, returning
and leaving subsequent characters of the stream unread.
Format specifiers: A sequence formed by an initial percentage sign (%) indicates a format specifier, which is used to specify the type
and format of the data to be retrieved from the stream and stored into
the locations pointed by the additional arguments.
Source: http://www.cplusplus.com/reference/cstdio/scanf/

The scanf function, the specifer %s and the new line

I read into C11 standard this:
Input white-space characters (as specified by the isspace function) are
skipped, unless the specification includes a [, c, or n specifier.
so I understand that if I use that specifiers the next scanf can contains for example a new line.
But if I write this:
char buff[5 + 1];
printf("Input: ");
scanf("%10s", buff);
printf("Input: ");
char buff_2[5 + 1];
scanf("%[abcde]", buff_2);
and then I input, i.e., RR and then Return,
the next scanf fails because of \n.
So also %s doesn't discard a new line?
So also %s doesn't discard a new line?
%s tells scanf to discard any leading whitespace, including newlines. It will then read any non-whitespace characters, leaving any trailing whitespace in the input buffer.
So assuming your input stream looks like "\n\ntest\n", scanf("%s", buf) will discard the two leading newlines, consume the string "test", and leave the trailing newline in the input stream, so after the call the input stream looks like "\n".
Edit
Responding to xdevel2000's comment here.
Let's talk about how conversion specifiers work. Here are some relevant paragraphs from the online C 2011 standard:
7.21.6.2 The fscanf function
...
9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.285)
The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the
count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following
the format argument that has not already received a conversion result. If this object does not have an appropriate type, or if the result of the conversion cannot be represented in the object, the behavior is undefined.
12 The conversion specifiers and their meanings are:
...
c Matches a sequence of characters of exactly the number specified by the field width (1 if no field width is present in the directive).286)
...
s Matches a sequence of non-white-space characters.286)
...
[ Matches a nonempty sequence of characters from a set of expected characters
(the scanset).286)
...
285) fscanf pushes back at most one input character onto the input stream. Therefore, some sequences that are acceptable to strtod, strtol, etc., are unacceptable to fscanf.
286) No special provisions are made for multibyte characters in the matching rules used by the c, s, and [
conversion specifiers — the extent of the input field is determined on a byte-by-byte basis. The
resulting field is nevertheless a sequence of multibyte characters that begins in the initial shift state.
%s matches a sequence of non-whitespace characters. Here's a basic algorithm describing how it works (not taking into account end of file or other exceptional conditions):
c <- next character from input stream
while c is whitespace
c <- next character from input stream
while c is not whitespace
append c to target buffer
c <- next character from input stream
push c back onto input stream
append 0 terminator to target buffer
The first whitespace character after the non-whitespace characters (if any) is pushed back onto the input stream for the next input operation to read.
By contrast, the algorithm for the %c specifier is dead simple (unless you're using a field width greater than 1, which I've never done and won't get into here):
c <- next character from input stream
write c to target
The algorithm for the %[ conversion specifier is a little different:
c <- next character from input stream
while c is in the list of characters in the scan set
append c to target buffer
c <- next character from input stream
append 0 to target buffer
push c back onto input stream
So, it's a mistake to describe any conversion specifier as "retaining" trailing whitespace (which would imply that the trailing whitespace is saved to the target buffer); that's not the case. Trailing whitespace is left in the input stream for the next input operation to read.
%s consumes everything until a whitespace character and discards leading whitespace characters not trailing ones. The [ conversion specifier in the second scanf does not skip leading whitespace characters and therefore, fails to scan because of the newline character(which is a whitespace character) left over by the first scanf.
To fix the issue, either use
int c;
while((c=getchar())!='\n' && c!=EOF);
After the first scanf to clear the stdin or add a space before the format specifier(%[) in the second scanf.
Your excerpt from the standard omits important context. The preceding text specifies that skipping whitespace is the first step in processing a conversion specifier for a type other than c, [, or n.
The next step, other than for an n specifier, is to read an input item, which is defined as "the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence" (quoted from C99, but equivalent applies to C2011).
An s item "[m]atches a sequence of non-white-space characters", so with the input you specify, the first scanf() reads everything up to, but not including, the newline.
The standard explicitly specifies
Trailing white space (including new-line characters) is left unread unless matched by a directive.
so the newline definitely remains unscanned at this point.
The format given to the next scanf() starts with a %[ conversion specifier, which, as you already observed, does not cause whitespace (leading or otherwise) to be skipped, though it can include whitespace in the item that is scanned. Since the next character available from the input is a newline, however, and the given scan set for your %[ does not include that character, zero characters are scanned for that item. Going back to the standard (C99, again):
If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
There are easier ways to read free-form input line by line, but you can do it with scanf() if you must. For example:
char buff[10 + 1] = {0};
printf("Input: ");
/*
* Ignore leading whitespace and scan a string of up to 10 non-whitespace
* characters. Zero-length inputs will produce a matching failure, leaving
* the buffer unchanged (and initialized to an empty string). End of
* input will produce an input error, which is ignored.
*/
scanf("%10s", buff);
/* Scan and ignore anything else up to a newline. There will
* be an (ignorable) matching failure if the next available character is a
* newline. Any input error generated by this call is also ignored.
*/
scanf("%*[^\n]");
/*
* Consume the next character, if any. If there is one, it will be a
* newline. An input error will occur if we're already at the end of stdin;
* a careful program would test for that (by comparing the return value to
* EOF) but this one doesn't.
*/
scanf("%*c");
printf("Input: ");
/* scan the second string; again, we're ignoring matching and input errors */
char buff_2[5 + 1] = {0};
scanf("%5[abcde]", buff_2);
If you're exclusively using scanf() for such a job then it is essential to read each line in three steps, as shown, because each one can produce a matching failure that would prevent any attempt to match subsequent items.
Note, too, how maximum field widths are matched to buffer sizes in that example, which your original code did not do correctly.

Resources