How to use a scanf width specifier of 0? - c

How to use a scanf width specifier of 0?
1) unrestricted width (as seen with Cygwin gcc version 4.5.3)
2) UB
3) something else?
My application (not shown) dynamically forms the width specifier as part of a larger format string for scanf(). Rarely it would create a "%0s" in the middle of the format string. In this context, the destination string for that %0s has just 1 byte of room for scanf() to store a \0 which with behavior #1 above causes problems.
Note: The following test cases use constant formats.
#include <string.h>
#include <stdio.h>
void scanf_test(const char *Src, const char *Format) {
char Dest[10];
int NumFields;
memset(Dest, '\0', sizeof(Dest));
NumFields = sscanf(Src, Format, Dest);
printf("scanf:%d Src:'%s' Format:'%s' Dest:'%s'\n", NumFields, Src, Format, Dest);
}
int main(int argc, char *argv[]) {
scanf_test("1234" , "%s");
scanf_test("1234" , "%2s");
scanf_test("1234" , "%1s");
scanf_test("1234" , "%0s");
return 0;
}
Output:
scanf:1 Src:'1234' Format:'%s' Dest:'1234'
scanf:1 Src:'1234' Format:'%2s' Dest:'12'
scanf:1 Src:'1234' Format:'%1s' Dest:'1'
scanf:1 Src:'1234' Format:'%0s' Dest:'1234'
My question is about the last line. It seems that a width of 0 results in no width limit rather than a width of 0. Whether this is correct behavior or UB, I'll have to handle the zero-width case another way. Or are there other scanf() formats to consider?

The maximum field width specifier must be non-zero. C99, 7.19.6.2:
The format shall be a multibyte character sequence, beginning and ending in its initial
shift state. The format is composed of zero or more directives: one or more white-space
characters, an ordinary multibyte character (neither % nor a white-space character), or a
conversion specification. Each conversion specification is introduced by the character %.
After the %, the following appear in sequence:
— An optional assignment-suppressing character *.
— An optional nonzero decimal integer that specifies the maximum field width (in
characters).
— An optional length modifier that specifies the size of the receiving object.
— A conversion specifier character that specifies the type of conversion to be applied.
So, if you use 0, the behavior is undefined.

This came from 7.21.6.2 of n1570.pdf (C11 standard draft):
After the %, the following appear in sequence:
— An optional assignment-suppressing character *.
— An optional decimal integer greater than zero that specifies the
maximum field width (in characters).
...
It's undefined behaviour, because the C standard states that your maximum field width must be greater than zero.
An input item is defined as the longest sequence of input characters
which does not exceed any specified field width and ...
What do you wish to achieve by reading a field of width 0 and assigning it as a string (an empty string) into Dest? Which actual problem are you trying to solve? It seems clearer to just assign, like *Dest = '\0';.

Related

towlower() returns "?" instead of the lowercase character

This is a simplified version, just for the question. I have this program:
wchar_t c;
wprintf(L"input\n");
wscanf(L"%d", &c);
wprintf(L"output\n");
wprintf(L"%lc", towlower(c));
and this input/output:
If I input "W", the output is "?"; I get the same result with other characters.
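In the line wscanf(L"%d", &c); you passed %d as the format specifier, so wscanf() looks for a decimal integer, but you are typing a character. The conversion fails and nothing is stored in c. Because c is a wchar_t, changing the specifier to %lc will solve it (with wscanf(), plain %c without the l modifier expects a char array instead).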
See the specification in the wscanf(3) manpage:
d Matches an optionally signed decimal integer, whose format is the same
as expected for the subject sequence of wcstol() with the value 10 for
the base argument. In the absence of a size modifier, the application
shall ensure that the corresponding argument is a pointer to int.
...
c Matches a sequence of wide characters of exactly the number specified
by the field width (1 if no field width is present in the conversion
specification).

Why does the second scanf also change the variable read by the first scanf?

I want to input an integer and a character with the scanf function, but it doesn't work as I expect.
The code is as follows.
#include <stdio.h>
int main()
{
int a;
char c;
scanf("%d",&a);
scanf("%2c",&c);
printf("%d%c",a,c);
return 0;
}
I tried to input 12a (with a space after the a) from the terminal, but the output is not "12a" but "32a".
I also stepped through the code and found that after the first scanf the value of a is 12, but after the second scanf the value of a has become 32.
I want to figure out why the second scanf changes the value of a, which is not passed to it.
The problem is that the compiler has placed variable a just after variable c in memory. The second scanf() reads two characters into a variable that has room for only one. This is a buffer overflow: it writes past the end of c, where a happens to be. The space was written into a, which is why you get 32 in the output (32 is the ASCII code for a space).
This is undefined behaviour, and it is common with this kind of mistake. You can solve it by defining an array of at least two chars to receive the two characters, and then using something like:
#include <stdio.h>
int main()
{
int a;
char c[2];
scanf("%d", &a);
scanf("%2c", c); /* now c is a char array so don't use & */
printf("%d%.2s", a, c); /* use %.2s format instead */
return 0;
}
Note:
The %.2s format specifier is needed because c is an array of two chars that has been filled completely, leaving no room for a terminating \0. Printing it with plain %s would be undefined behaviour; the precision makes printf() stop after the second character (or earlier, if a \0 is found in the first or second position).
Quoting C11, chapter 7.21.6.2, The fscanf function (emphasis mine)
c
[...] If no l length modifier is present, the corresponding argument shall be a pointer to the initial element of a character array large enough to accept the sequence. No null character is added. [...]
and you're supplying a pointer to a single char, which is not large enough to accept the two characters requested by %2c. Writing past the end of that object is undefined behavior.
Therefore the outcome cannot be justified.
To hold an input like "a ", you'll need a (long enough) char array; a single char variable is not sufficient.

The scanf 'maximum field width' includes whitespace?

Suppose we have
int n;
sscanf(" 42", "%2d", &n);
Should n be 4 (the whitespace accounted for by the "%2d") or 42 (whitespace ignored, making scanf read 3 characters)?
ideone implementation reads 3 characters
The POSIX specification for sscanf() is fairly clear about the processing:
The format is a character string, … composed of zero or more directives. Each directive is composed of one of the following: one or more white-space characters (<space>, <tab>, <newline>, <vertical-tab>, or <form-feed>); an ordinary character (neither '%' nor a white-space character); or a conversion specification. Each conversion specification is introduced by the character '%' [CX] ⌦ or the character sequence "%n$", ⌫ after which the following appear in sequence:
…
A directive that is a conversion specification defines a set of matching input sequences, as described below for each conversion character. A conversion specification shall be executed in the following steps.
Input white-space characters (as specified by isspace) shall be skipped, unless the conversion specification includes a [, c, C, or n conversion specifier.
An item shall be read from the input, unless the conversion specification includes an n conversion specifier. An input item shall be defined as the longest sequence of input bytes (up to any specified maximum field width, which may be measured in characters or bytes dependent on the conversion specifier) which is an initial subsequence of a matching sequence. The first byte, if any, after the input item shall remain unread. If the length of the input item is 0, the execution of the conversion specification shall fail; this condition is a matching failure, unless end-of-file, an encoding error, or a read error prevented input from the stream, in which case it is an input failure.
If white space is skipped by a conversion specification (%…), it is not counted as part of the field width; the skipping occurs before any counting does.
The equivalent specification in C11 §7.21.6.2 The fscanf function is very similar (but it doesn't include the 'C extension' markup, of course).
The scanf 'maximum field width' includes whitespace?
Yes for [ and c.
No for other specifiers.
"%n" does not apply.
The fscanf() (C11dr §7.21.6.2 7-9)
7 ... A conversion specification is executed in the following steps:
8 Input white-space characters (as specified by the isspace function) are skipped, unless the specification includes a [, c, or n specifier.
9 An input item is read from the stream, ... An input item is defined as the longest sequence of input characters which does not exceed any specified field width and ....
The width applies after leading input white-space character consumption.
Further, as I read the spec, if the conversion fails, the skipped input white-space characters remain consumed.
From the BSD manual page:
In addition to these flags, there may be an optional maximum field width,
expressed as a decimal integer, between the % and the conversion. If no width is
given, a default of ``infinity'' is used (with one exception, below); otherwise
at most this many bytes are scanned in processing the conversion. In the case of
the lc, ls and l[ conversions, the field width specifies the maximum number of
multibyte characters that will be scanned. Before conversion begins, most conversions skip white space; this white space is not counted against the field
width.
The Linux man page has
An optional decimal integer which specifies the maximum field width. Reading
of characters stops either when this maximum is reached or when a nonmatching
character is found, whichever happens first. Most conversions discard initial
white space characters (the exceptions are noted below), and these discarded
characters don't count toward the maximum field width. String input conversions store a terminating null byte ('\0') to mark the end of the input; the
maximum field width does not include this terminator.
Both specify that the whitespace does not count against the field width.

Why does pushing space bar not put a value in array?

#include <stdio.h>
int main(){
printf("Enter 10 numbers: ");
int a[10], i = 0;
for (i = 0; i < 10; i++)
{
scanf("%d", &a[i]);
}
}
When I enter the values, why does pressing the space bar store a value in the array?
For example, when I type 1space2space3space, the values are stored in successive elements (a[0], a[1], a[2]).
Why is this happening?
From the C Standard (7.21.6.2 The fscanf function)
12 The conversion specifiers and their meanings are:
d Matches an optionally signed decimal integer, whose format is the same as
expected for the subject sequence of the strtol function with the value 10
for the base argument. The corresponding argument shall be a pointer to
signed integer.
And (7.22.1.4 The strtol, strtoll, strtoul, and strtoull functions)
...First, they decompose the input string into three parts: an initial, possibly empty, sequence of white-space characters (as
specified by the isspace function), a subject sequence resembling an
integer represented in some radix determined by the value of base, and
a final string of one or more unrecognized characters, including the
terminating null character of the input string. Then, they attempt to
convert the subject sequence to an integer, and return the result.
For such an input
1space2space3space
the first subject sequence is 1, the second subject sequence (after skipping white-space characters) is 2, and the third subject sequence is 3. They are used to store integers correspondingly in a[0], a[1], and a[2] because each subject sequence represents a valid integer.
Note also that implementations typically use line buffering for interactive text streams, so scanf() receives nothing until you press Enter.
From the C Standard (7.21.3 Files)
... When a stream is line buffered, characters are intended to be
transmitted to or from the host environment as a block when a
new-line character is encountered.
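Because you are using scanf() with %d, leading white space (spaces, tabs, newlines) is skipped before each integer, so it acts as a separator. Any other non-numeric character is not skipped: it stops the conversion with a matching failure.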

Program doesn't print character when I enter a string

So I have this code:
#include <stdio.h>
int main() {
char B,y[2];
scanf("%c",&B);
scanf("%s",y);
printf("%c\n",B);
}
When I enter a character for B, like S, and then a character for y, like a, it works fine.
It prints out
a
S
However, when I enter 2 characters for y like ab, it prints the two characters but doesn't print out S.
It prints:
ab
Am I doing something wrong?
First of all, a char array defined as y[2] can hold only one character of a string, because the other element is needed for the terminating null. In other words, the longest string it can hold has length 1.
That said, you should change
scanf("%s",y);
to
scanf("%1s",y);
to limit the input length. Otherwise, you'll experience buffer overflow which invokes undefined behavior.
To elaborate on adding that literal 1 in the format string, that 1 denotes the maximum field width.
Quoting C11, chapter §7.21.6.2, fscanf(), (emphasis mine)
An input item is read from the stream, unless the specification includes an n specifier. An
input item is defined as the longest sequence of input characters which does not exceed
any specified field width and which is, or is a prefix of, a matching input sequence. [....]
