Logical inconsistency with [ ] conversion specifier in scanf() in C

Please have a look at this code snippet:
char line1[10], line2[10];
int rtn;
rtn = scanf("%9[a]%9[^\n]", line1, line2);
printf("line1 = %s|\nline2 = %s|\n", line1, line2);
printf("rtn = %d\n", rtn);
Output:
$ gcc line.c -o line
$ ./line
abook
line1 = a|
line2 = book|
rtn = 2
$ ./line
book
line1 = |
line2 = �Js�|
rtn = 0
$
For input abook, %9[a] fails at b from the book and stores previously parsed a+\0 at line1.
Then %9[^\n] parses the remaining line and stores just now parsed book+\0 at line2.
Please note 2 points here:
At the time of storing the parsed input, \0 is appended at the end of it since %[] is a conversion specifier for a string.
When %9[a] failed at b, scanf didn't exit. It simply went on scanning further input.
Now for input book, %9[a] should fail at b from the book and should store just \0 at line1 since here nothing was parsed.
Then %9[^\n] should parse the remaining line and should store just now parsed book+\0 at line2.
Now, let's see what exactly happened:
Here the return value is 0, which means scanf didn't assign a value to any variable; it simply exited without assigning anything. Hence the garbage data in line2. In the case of line1, the garbage data happened to be a null character.
But this is quite strange! Isn't it?
I mean scanf exits if %[...] fails at the very first character of input. (Even if additional conversion specifier is there in scanf statement.)
But if the same %[...] fails at any other character other than first one then scanf simply continues scanning the further input. (If additional conversion specifier is there of course.) It doesn't exit.
So why this inconsistency?
Why not let scanf continue scanning the input (if there is an additional conversion specifier, of course) even if %[...] fails at the very first character of input, exactly as it does in the other case?
Is there any special reason behind this inconsistency?
$ gcc --version
gcc (Ubuntu 4.4.3-4ubuntu5.1) 4.4.3

2) When %9[a] failed at b, scanf didn't exit. It simply went on scanning further input.
Yes, the %9[a] directive means "store up to 9 'a's, but at least one"(1), so the conversion %9[a] did not fail, it succeeded. It found fewer 'a's than it could have consumed, but that's not a failure. The input matching failed at the 'b', but the conversion succeeded.
(1) Specified in 7.21.6.2 (12) where the conversions are described:
[ Matches a nonempty sequence of characters from a set of expected characters (the scanset).
Now for input book, %9[a] should fail at b from the book and should store just '\0' at line1 since here nothing was parsed. Then %9[^\n] should parse the remaining line and should store just now parsed book+\0 at line2.
No. It is supposed to exit when a conversion fails. The first conversion %9[a] failed, so scanf is supposed to stop and return 0, since no conversion succeeded.
Always check the return value of scanf.
That is specified (for fscanf, but scanf is equivalent to fscanf with stdin as input stream) in 7.21.6.2 (16):
The fscanf function returns the value of the macro EOF if an input failure occurs
before the first conversion (if any) has completed. Otherwise, the function returns the
number of input items assigned, which can be fewer than provided for, or even zero, in
the event of an early matching failure.
Here output for line1 is nothing which is exactly what we expected. An empty string!
You can't expect anything. The arrays line1 and line2 aren't initialised, so when the conversion fails, their contents are still indeterminate. In this case, line1 happened to contain no printable character before the first 0 byte.
But for line2 it's garbage chars! We didn't expect this. So how did this happen ?
That's what happened to be the contents of line2. There were never any values assigned to the elements, so they are whatever they happened to be before the call to scanf.

Transferred from comments to the question since the response to the reply question requires more space than the comments allow.
This comment refers to an earlier version of the code:
Since you didn't check the return value from scanf(), you've no idea whether it said "I failed" or not. You can't blame it when you ignore its error returns; in the second example, it will have said '0 items scanned successfully', which means that none of the variables were set to anything useful at all. You must always check the return value from scanf() so you know whether it did what you expected.
The reply question is:
I updated the code and output to show the return value of scanf. And yes for case 2 the return value is 0. But this doesn't answer the question. Clearly scanf exited in case 2. But for case 1, return value is 2 which means scanf successfully assigned values to both the variables. So why this inconsistency?
I don't see any inconsistency. The fscanf() specification (copied from ISO/IEC 9899:2011, but the URL links to POSIX rather than the C standard) says:
¶3 [...] Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
— An optional assignment-suppressing character *.
— An optional decimal integer greater than zero that specifies the maximum field width
(in characters).
— An optional length modifier that specifies the size of the receiving object.
— A conversion specifier character that specifies the type of conversion to be applied.
Later, it says:
¶8 [...] Input white-space characters (as specified by the isspace function) are skipped, unless
the specification includes a [, c, or n specifier.284)
¶9 An input item is read from the stream, unless the specification includes an n specifier. An
input item is defined as the longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a matching input sequence.285)
The first character, if any, after the input item remains unread. If the length of the input
item is zero, the execution of the directive fails; this condition is a matching failure unless
end-of-file, an encoding error, or a read error prevented input from the stream, in which
case it is an input failure.
¶12 [...]
[ Matches a nonempty sequence of characters from a set of expected characters
(the scanset).286)
[Bold italic emphasis added. I've left the footnote references in place, but the contents of the footnotes are not material to the discussion so I've omitted them.]
So, the behaviour you are seeing is exactly what the standard demands. When %9[a] is applied to the string abook, there is a sequence of one a which matches the %9[a] conversion specification, so the directive is successful, and the scan continues with book. When %9[a] is applied to the string book, there are zero characters matching the item, so the execution of the directive fails. This is a matching failure, and since it occurs in the first conversion specification, the return value of 0 is correct.
Note that the length specifies a maximum field width, so the 9 in %9[a] means 1-9 letters a.

Related

Why there should not be type specifier like s or c after [0-9A-Z^%]?

For example consider the following code -
fscanf(fp,"%d:%d:%[^:]:%[^\n]\n",&pow->no,&pow->seen,pow->word,pow->means);
printf("\ntthis is what i read--\n%d:%d:%s:%s:\n",pow->no,pow->seen,pow->word,pow->means);
here pow is pointer to an object declared before,
when I put s as in fscanf(fp,"%d:%d:%[^:]s:%[^\n]\n" the 3rd one is read but not the last one
output is --
4:0:Abridge::
but when i do fscanf(fp,"%d:%d:%[^:]:%[^\n]s\n" all are read
output is --
4:0:Abridge:To condense:
AND without s anywhere fscanf(fp,"%d:%d:%[^:]:%[^\n]\n" all are read
output is --
4:0:Abridge:To condense:
WHY??
To answer your question about the meaning of %[^\n]s: it consists of two directives. One is the conversion specification %[^\n] and the other is the ordinary character s.
The first one scans anything other than \n; when it reaches a \n it leaves that character unread in the stream and moves on. But it doesn't stop there: scanf then tries to match the literal letter s. If it doesn't find it, the directive fails. (The explanation for %[^:]s is the same.)
Now decide if this is what you really want. %[^\n] on its own is the right choice: it scans until \n is found (and, unlike %s, it does not skip leading whitespace). The scanset already covers every character except \n, including the letter s, so with no field width the next unread character is always \n or end of input and the s can never match. %[^\n]s is therefore self-contradictory and of no use.
%d:%d:%[^:]s:%[^\n]
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%d - Matches an optionally signed decimal integer. (Ignore whitespace)
: - Then looks for ':'
%[^:] - No white space ignored - everything is taken into input except `:`
':' is unread.
s - Tries to match 's'. No white space ignored.
%[^\n] - Everything except '\n' inputted. `\n` left unread.
The specifier IS "%[]", you don't need the "s" there.
Read the manual page for scanf()
Your format string doesn't match the input because the "s" is not part of the specifier and it's not present in the input where the format is expecting it.
By reading the documentation in the link above you will find out, if you don't already know, that you should also check the return value of scanf() before calling printf(). Otherwise your code will invoke undefined behavior, because some of the objects the passed pointers refer to may never be assigned.

Comma between the two integers during the input

What exactly happens if I do the following
scanf("%d,%d", &i, &j);
and provide an input which causes the matching failure? Will it store garbage into j?
The input has to exactly match the supplied format for scanf() to succeed.
Quoting C11, chapter §7.21.6.2, fscanf(), (emphasis mine)
Except in the case of a % specifier, the input item (or, in the case of a %n directive, the
count of input characters) is converted to a type appropriate to the conversion specifier. If
the input item is not a matching sequence, the execution of the directive fails: this
condition is a matching failure. Unless assignment suppression was indicated by a *, the
result of the conversion is placed in the object pointed to by the first argument following
the format argument that has not already received a conversion result. If this object
does not have an appropriate type, or if the result of the conversion cannot be represented
in the object, the behavior is undefined.
and,
When all directives
have been executed, or if a directive fails (as detailed below), the function returns.
So, consolidating the above cases,
For an input like 100, 200, the scanning will succeed. Both i and j will hold the given values, 100 and 200, respectively.
For an input like 100 - 200, the scanning will fail (matching failure) and the content of j will remain unchanged, i.e., j is not assigned any value by scanf() operation.
Word of advice: always check the return value of scanf() function family to ensure the success of the function call.

Why is this program not printing the input I provided? (C)

Code I have:
int main(){
char readChars[3];
puts("Enter the value of the card please:");
scanf(readChars);
printf(readChars);
printf("done");
}
All I see is:
"done"
after I enter some value to terminal and pressing Enter, why?
Edit:
Isn't the prototype for scanf:
int scanf(const char *format, ...);
So I should be able to use it with just one argument?
The actual problem is that you are passing an uninitialized array as the format to scanf().
You are also invoking scanf() the wrong way; try this:
if (scanf("%2s", readChars) == 1)
printf("%s\n", readChars);
scanf() as well as printf() use a format string and that's actually the cause for the f in their name.
And yes, you are able to use it with just one argument. scanf() scans input according to the format string, and the format string uses conversion specifications that are matched against the input. If you don't specify at least one, scanf() will only be useful for input validation.
The following was extracted from C11 draft
7.21.6.2 The fscanf function
The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: one or more white-space characters, an ordinary multibyte character (neither % nor a white-space character), or a conversion specification. Each conversion specification is introduced by the character %. After the %, the following appear in sequence:
An optional assignment-suppressing character *.
An optional decimal integer greater than zero that specifies the maximum field width
(in characters).
An optional length modifier that specifies the size of the receiving object.
A conversion specifier character that specifies the type of conversion to be applied.
As you can read above, to store any input you need at least one conversion specification in the format, together with a corresponding argument to receive the converted value. If you pass a conversion specification but no argument for it, the behavior is undefined.
Yes, it is possible to call scanf with just one parameter, and it may even be useful on occasion. But it wouldn't do what you apparently thought it would. (It would just expect the characters in the argument in the input stream and skip them.) You didn't notice because you failed to do due diligence as a programmer. I'll list what you should do:
RTFM. scanf's first parameter is a format string. Plain characters which are not part of conversion sequences and are not whitespace are expected literally in the input. They are read and discarded. If they do not appear, conversion stops there, and the position in the input stream where the unexpected character occurred is the start of subsequent reads. In your case probably no character was ever successfully read from the input, but you don't know for sure, because you didn't initialize the format string (see below).
Another interesting detail is scanf's return value which indicates the number items successfully read. I'll discuss that below together with the importance to check return values.
Initialize locals. C doesn't automatically initialize local data for performance reasons (in today's light one would probably enforce user initialization like other languages do, or make auto initialization a default with an opt-out possibility for the few inner loops where it would hurt). Because you didn't initialize readChars, you don't know what's in it, so you don't know what scanf expected in the input stream. On top of that, it is probably nominally undefined behaviour. (But on your PC it shouldn't do anything unexpected.)
Check return values. scanf probably returned 0 in your example. The manual states that scanf returns the number of items successfully read, here 0, i.e. no input conversion took place. This type of undetected failure can be fatal in long sequences of read operations because the following scanfs may read in one-off indexes from a sequence of tokens, or may stall as well (and not update their pointees at all), etc.
Please bear with me -- I do not always read the manual, check return values or (by error) initialize variables for little test programs. But if it doesn't work, it's part of my investigation. And before I ask anybody, let alone the world, I make damn sure that I have done my best to find out what I did wrong, beforehand.
You're not using scanf correctly:
scanf(formatstring, address_of_destination,...)
is the right way to do it.
EDIT:
Isn't the prototype for scanf:
int scanf(const char *format, ...);
So I should be able to use it with just one argument?
No, you should not. Please read documentation on scanf; format is a string specifying what scanf should read, and the ... are the things that scanf should read into.
The first argument to scanf is the format string. What you need is:
scanf("%2s", readChars);
You should provide a format specifier in the scanf call:
char readChars[3];
puts("Enter the value of the card please:");
/* %2s leaves room for the terminating '\0' in the 3-byte array */
if (scanf("%2s", readChars) == 1)
    printf("%s", readChars);
printf("done");
http://www.cplusplus.com/reference/cstdio/scanf/ more info...

Is scanf guaranteed to not change the value on failure?

If a scanf family function fails to match the current specifier, is it permitted to write to the storage where it would have stored the value on success?
On my system the following outputs 213 twice but is that guaranteed?
The language in the standard (C99 or C11) does not seem to clearly specify that the original value should remain unchanged (whether it was indeterminate or not).
#include <stdio.h>
int main()
{
int d = 213;
// matching failure
sscanf("foo", "%d", &d);
printf("%d\n", d);
// input failure
sscanf("", "%d", &d);
printf("%d\n", d);
}
The relevant part of the C11 standard is (7.21.6.2, for fscanf):
7 A directive that is a conversion specification defines a set of matching input sequences, as described below for each specifier. A conversion specification is executed in the following steps:
8 […]
9 An input item is read from the stream, unless the specification includes an n specifier. An input item is defined as the longest sequence of input characters which does not exceed any specified field width and which is, or is a prefix of, a matching input sequence.285) The first character, if any, after the input item remains unread. If the length of the input item is zero, the execution of the directive fails; this condition is a matching failure unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the count of input characters) is converted to a type appropriate to the conversion specifier. If the input item is not a matching sequence, the execution of the directive fails: this condition is a matching failure. Unless assignment suppression was indicated by a *, the result of the conversion is placed in the object pointed to by the first argument following the format argument that has not already received a conversion result. […]
To me, the words “step” and “If the length of the input item is zero, the execution of the directive fails” indicate that if the input does not match a specifier in the format, interpretation stops before any assignment for that specifier has occurred.
On the other hand, subclause 4, above the ones quoted, makes it clear that specifiers up to the failing one are assigned, again using language appropriate for ordered sequences of events:
4 The fscanf function executes each directive of the format in turn. When all directives have been executed, or if a directive fails (as detailed below), the function returns.
Judging from ISO/IEC 9899:2011 §7.21.6.2 The fscanf function:
¶10 Except in the case of a % specifier, the input item (or, in the case of a %n directive, the
count of input characters) is converted to a type appropriate to the conversion specifier. If
the input item is not a matching sequence, the execution of the directive fails: this
condition is a matching failure. Unless assignment suppression was indicated by a *, the
result of the conversion is placed in the object pointed to by the first argument following
the format argument that has not already received a conversion result. If this object
does not have an appropriate type, or if the result of the conversion cannot be represented
in the object, the behavior is undefined.
In the larger context, this seems to mean that the assignment to the target variable only occurs after the conversion is successful. For numeric types, that makes sense and is readily achievable. For string types, it is not so clear cut, but it should work the same way (the text quoted does state that the assignment only occurs if there is no matching failure or input failure). However, if there is an encoding error part way through a string (%s or %30c or %[a-z]), it would not be surprising to find that the first part of the string is changed even though the conversion as a whole failed. This could probably be regarded as a bug. Stimulating the bug accurately might be hard; for example, it might require UTF-8 input and an invalid byte such as 0xC0 or 0xF5 in the input stream.

Characters written so far in snprintf

Lately, I noticed a strange case I would like to verify:
By SUS, for %n in a format string, the respective int will be set to the-amount-of-bytes-written-to-the-output.
Additionally, for snprintf(dest, 3, "abcd"), dest will point to "ab\0". Why? Because no more than n (n = 3) bytes are to be written to the output (the dest buffer).
I deduced that for the code:
int written;
char dest[3];
snprintf(dest, 3, "abcde%n", &written);
written will be set to 2 (null termination excluded from count).
But from a test I made using GCC 4.8.1, written was set to 5.
Was I misinterpreting the standard? Is it a bug? Is it undefined behaviour?
Edit:
@wildplasser said:
... the behavior of %n in the format string could be undefined or implementation defined ...
And
... the implementation has to simulate processing the complete format string (including the %n) ...
@par said:
written is 5 because that's how many characters would be written at the point the %n is encountered. This is correct behavior. snprintf only copies up to size characters minus the trailing null ...
And:
Another way to look at this is that the %n wouldn't have even been encountered if it only processed up to 2 characters, so it's conceivable to expect written to have an invalid value...
And:
... the whole string is processed via printf() rules, then the max-length is applied ...
Can it be verified by the standard, a standard draft or some official source?
written is 5 because that's how many characters would be written at the point the %n is encountered. This is correct behavior. snprintf only copies up to size characters minus the trailing null (so in your case 3-1 == 2). You have to separate the string formatting behavior from the only-write-so-many-characters behavior.
Another way to look at this is that the %n wouldn't have even been encountered if it only processed up to 2 characters, so it's conceivable to expect written to have an invalid value. That's where there would be a bug, if you were expecting something valid in written at the point %n was encountered (and there wasn't).
So remember, the whole string is processed via printf() rules, then the max-length is applied.
It is not a bug. ISO C99 says:
The snprintf function is equivalent to fprintf
[...]
output characters beyond the n-1st are discarded rather than being written to the array
[...]
So it just discards trailing output but otherwise behaves the same.

Resources