How to use "zd" specifier with `printf()`?

Looking for clarification on using "zd" with printf().
Certainly the following is correct with C99 and later.
#include <stddef.h>
#include <stdio.h>

void print_size(size_t sz) {
    printf("%zu\n", sz);
}
The C spec seems to allow printf("%zd\n", sz) depending on how it is read:
7.21.6.1 The fprintf function
z Specifies that a following d, i, o, u, x, or X conversion specifier applies to a size_t or the corresponding signed integer type argument; or that a following n conversion specifier applies to a pointer to a signed integer type corresponding to size_t argument. C11dr §7.21.6.1 7
Should this be read as
1. "z Specifies that a following d ... conversion specifier applies to a size_t or the corresponding signed integer type argument ..." (both types) and "z Specifies that a following u ... conversion specifier applies to a size_t or the corresponding signed integer type argument ..." (both types)
OR
2. "z Specifies that a following d ... conversion specifier applies to a corresponding signed integer type argument ..." (signed type only) and "z Specifies that a following u ... conversion specifier applies to a size_t" (unsigned type only).
I've been using the #2 definition, but now not so sure.
Which is correct, 1, 2, or something else?
If #2 is correct, what is an example of a type that can use "%zd"?

printf with a "%zd" format expects an argument of the signed type that corresponds to the unsigned type size_t.
Standard C doesn't provide a name for this type or a good way to determine what it is. If size_t is a typedef for unsigned long, for example, then "%zd" expects an argument of type long, but that's not a portable assumption.
The standard requires that corresponding signed and unsigned types use the same representation for the non-negative values that are representable in both types. A footnote says that this is meant to imply that they're interchangeable as function arguments. So this:
size_t s = 42;
printf("s = %zd\n", s);
should work, and should print "42". It will interpret the value 42, of the unsigned type size_t, as if it were of the corresponding signed type. But there's really no good reason to do that, since "%zu" is also correct and well defined, without resorting to additional language rules. And "%zu" works for all values of type size_t, including those outside the range of the corresponding signed type.
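A minimal sketch of the point above, assuming a C99 or later library. The helper names format_zu and format_zd are hypothetical, and snprintf is used only so the output can be inspected. Both print "42" for a small value, but only "%zu" stays correct across the full range of size_t.

```c
#include <stddef.h>
#include <stdio.h>

/* "%zu": well defined for every size_t value. */
int format_zu(char *buf, size_t len, size_t s)
{
    return snprintf(buf, len, "%zu", s);
}

/* "%zd" with a size_t argument: leans on corresponding signed/unsigned types
   being interchangeable as arguments when the value fits in both, so it is
   only reasonable for small values such as 42. */
int format_zd(char *buf, size_t len, size_t s)
{
    return snprintf(buf, len, "%zd", s);
}
```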
Finally, POSIX defines a type ssize_t in the headers <unistd.h> and <sys/types.h>. Though POSIX doesn't explicitly say so, it's likely that ssize_t will be the signed type corresponding to size_t.
So if you're writing POSIX-specific code, "%zd" is (probably) the correct format for printing values of type ssize_t.
UPDATE: POSIX explicitly says that ssize_t isn't necessarily the signed version of size_t, so it's unwise to write code that assumes that it is:
ssize_t
This is intended to be a signed analog of size_t. The wording is such
that an implementation may either choose to use a longer type or
simply to use the signed version of the type that underlies size_t.
All functions that return ssize_t (read() and write()) describe as
"implementation-defined" the result of an input exceeding {SSIZE_MAX}.
It is recognized that some implementations might have ints that are
smaller than size_t. A conforming application would be constrained not
to perform I/O in pieces larger than {SSIZE_MAX}, but a conforming
application using extensions would be able to use the full range if
the implementation provided an extended range, while still having a
single type-compatible interface. The symbols size_t and ssize_t are
also required in <unistd.h> to minimize the changes needed for calls
to read() and write(). Implementors are reminded that it must be
possible to include both <sys/types.h> and <unistd.h> in the same
program (in either order) without error.
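If you need to print a ssize_t without assuming it is the signed counterpart of size_t, one conservative option (assuming a POSIX system that provides <sys/types.h>) is to convert it to intmax_t and use "%jd". Every signed integer type converts to intmax_t without loss, so this holds whichever type ssize_t turns out to be. The helper name format_ssize is hypothetical.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h> /* POSIX: ssize_t */

/* Print a ssize_t without relying on which signed type it really is:
   intmax_t can represent every value of any signed integer type. */
int format_ssize(char *buf, size_t len, ssize_t n)
{
    return snprintf(buf, len, "%jd", (intmax_t)n);
}
```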

According to the little test I did, "%zd" always gave the expected output, but "%zu" doesn't work for negative numbers.
Test Code:
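#include <stdio.h>
#include <sys/types.h> /* ssize_t is POSIX, not standard C */

int main(void)
{
    size_t uzs = 1;
    ssize_t zs = -1;
    /* Caution: left-shifting a negative value (zs <<= 16) is undefined
       behavior in standard C; this test relies on typical platform behavior. */
    for (int i = 0; i < 5; i++, uzs <<= 16, zs <<= 16)
    {
        printf("%zu\t", uzs); /* true */
        printf("%zd\t", uzs); /* true */
        printf("%zu\t", zs);  /* false */
        printf("%zd\n", zs);  /* true */
    }
    return 0;
}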
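To see why the "%zu" column looks wrong for zs, it helps to make the conversion explicit. The sketch below (hypothetical helper name, POSIX <sys/types.h> assumed for ssize_t) converts the ssize_t to size_t before printing, which is well defined: a negative value wraps modulo SIZE_MAX + 1, so -1 prints as SIZE_MAX. Passing zs to "%zu" unconverted, as in the test, typically prints the same digits on two's-complement systems, but only the explicit conversion is guaranteed.

```c
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h> /* POSIX: ssize_t */

/* Convert first, then print: (size_t)-1 == SIZE_MAX by the rules of
   unsigned conversion, so "%zu" shows a huge number, not "-1". */
int format_negative_as_size_t(char *buf, size_t len, ssize_t n)
{
    return snprintf(buf, len, "%zu", (size_t)n);
}
```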

Related

How do I construct a "signed size_t" for use with scanf("%zn")?

I tried to obtain the number of characters read as a size_t, using this program:¹
#include <stdio.h>
int main(void)
{
size_t i;
sscanf("abc", "%*s%zn", &i);
printf("%zu", i);
}
GCC 12 warns about this:
scanf.c: In function ‘main’:
scanf.c:7:25: warning: format ‘%zn’ expects argument of type ‘signed size_t *’, but argument 3 has type ‘size_t *’ {aka ‘long unsigned int *’} [-Wformat=]
7 | sscanf("abc", "%*s%zn", &i);
| ~~^ ~~
| | |
| | size_t * {aka long unsigned int *}
| long int *
| %ln
And it's correct² to do so; in the C17 draft standard, page 234, we read (my emphasis)
No input is consumed. The corresponding argument shall be a pointer to signed integer into which is to be written the number of characters read from the input stream so far by this call to the fscanf function.
Earlier standards contain similar wording.
So how do I (portably) create a signed equivalent of size_t for this conversion?
In C++, I could use std::make_signed_t<std::size_t>, but that's obviously not an option for C code. Without that, it seems that %zn conversion is unusable in C.
¹ The real-world case from which this arises came from reviewing Simple photomosaic generator, where we wanted a more general form of strto𝑥(), and so needed %n to determine the end of conversion. I know we can use plain int for all 𝑥 here, but wanted to check the expected behaviour before reporting as a bug.
² Other than calling the required type signed size_t *, which is obviously not a valid C type name.
How do I construct a "signed size_t" for use with scanf("%zn")?
You are out of luck. As of C17
z Specifies that a following d, i, o, u, x, X, or n conversion specifier applies to an argument with type pointer to size_t or the corresponding signed integer type. C17dr § 7.21.6.2 11
n No input is consumed. The corresponding argument shall be a pointer to signed integer .... C17dr § 7.21.6.2 12
And size_t is an unsigned type.
Yet C never details how to make the corresponding signed integer type.
There is no specified signed type corresponding to size_t in standard C.
Alternatives:
Live with "%n", &int_object and then size_t i = (unsigned) int_object;. (Cast important)
Use "%jn", &intmax_t_object and then size_t i = (size_t) intmax_t_object;.
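A sketch of both alternatives, using the input from the question. The helper names chars_read_int and chars_read_intmax are hypothetical. Each initializes its counter to 0 so the result is defined even if the %n directive is never reached (for example, on empty input).

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Alternative 1: %n writes an int; convert through unsigned, then to size_t
   (the answer stresses that the cast matters). */
size_t chars_read_int(const char *s)
{
    int n = 0; /* stays 0 if %n is never reached */
    sscanf(s, "%*s%n", &n);
    return (size_t)(unsigned)n;
}

/* Alternative 2: %jn writes an intmax_t, the widest signed type. */
size_t chars_read_intmax(const char *s)
{
    intmax_t n = 0; /* stays 0 if %jn is never reached */
    sscanf(s, "%*s%jn", &n);
    return (size_t)n;
}
```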
If pressed to typedef a signed_size_t, the following should portably work, but you are on your own.
#include <assert.h>
#include <inttypes.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#if SIZE_MAX == UINT_MAX
typedef int signed_size_t;
#elif SIZE_MAX == ULONG_MAX
typedef long signed_size_t;
#elif SIZE_MAX == ULLONG_MAX
typedef long long signed_size_t;
#elif SIZE_MAX == UINTMAX_MAX
typedef intmax_t signed_size_t;
#elif SIZE_MAX == USHRT_MAX
typedef short signed_size_t;
#else
#error Strange `size_t`.
#endif
_Static_assert(sizeof(size_t) == sizeof(signed_size_t), "Strange size_t");
About non-standard ssize_t
ssize_t is not a certain match for the corresponding signed integer type.
Even The Open Group Base Specifications Issue 7, 2018 edition has:
ssize_t
This is intended to be a signed analog of size_t. The wording is such that an implementation may either choose to use a longer type or simply to use the signed version of the type that underlies size_t. ...
Similar question: How to use "zd" specifier with printf()?
"signed size_t" doesn't really exist as a standard-defined type, in either POSIX or any version of the C standard.
POSIX ssize_t is specified as "The type ssize_t shall be capable of storing values at least in the range [-1, {SSIZE_MAX}].", so even if ssize_t is available, it's not strictly suitable for use as a "signed size_t" value.
While there's no type that's guaranteed to be large enough to hold a "signed size_t" type of value in strictly-conforming code, on all platforms that I'm aware of, the standard type long long int seems to me to be the most "future proof". (The odds of size_t being larger than unsigned long long int are IMO likely somewhere between zero and infinitesimal, but I don't think that's precluded by any version of the C standard.)
As I write this, no platform that I'm aware of uses a size_t value larger than 64 bits, and long long int is guaranteed to be at least 64 bits, so it will be large enough.
int64_t seems like it would be a good choice, but should a system ever have a size_t larger than 64 bits, int64_t would fail. I suspect that any system implementation with size_t larger than 64 bits would likely expand long long int to match.
You could always add something like this to your code to protect yourself:
#include <limits.h> /* ULLONG_MAX */
#include <stdint.h> /* SIZE_MAX */
#if SIZE_MAX > ULLONG_MAX
#error size_t larger than unsigned long long int
#endif
Never mind that it's easier to use the " %lld ..." format specifier to scan a long long int than it is to use the " " SCNd64 " ..." macro to scan an int64_t...
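A sketch of the long long route, with the size guard repeated so the block stands alone. The helper name chars_read_ll is hypothetical: %lln stores the count into a long long, which is then converted to size_t.

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#if SIZE_MAX > ULLONG_MAX
#error size_t larger than unsigned long long int
#endif

/* long long is at least 64 bits; the guard above rejects platforms where
   size_t is wider than unsigned long long. */
size_t chars_read_ll(const char *s)
{
    long long n = 0; /* stays 0 if %lln is never reached */
    sscanf(s, "%*s%lln", &n);
    return (size_t)n;
}
```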
If you really want to create a "matching signed size_t" type, you could expand on using SIZE_MAX in a #if ladder, comparing it to various U*MAX values to find the "most correct" matching signed integer type to use in a typedef. But then you'd have to create your own scanf() (and maybe printf()) format macros.
OR...
Or you could just pedantically play with fire and use ssize_t on POSIX systems, and create your own ssize_t on non-POSIX systems. IME there are no ssize_t implementations that don't de facto act as a "signed size_t" value.

Is it UB to give a char argument to printf where printf expects a int?

Do I understand the standard correctly that this program causes UB:
#include <stdio.h>
int main(void)
{
char a = 'A';
printf("%c\n", a);
return 0;
}
When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?
Because if a is unsigned and has the same size (1) as an int, it will be promoted to an unsigned int [1] (2), and not to an int, since an int cannot represent all values of a char. The format specifier "%c" expects an int [2] and using the wrong signedness in printf() causes UB [3].
Relevant quotes from ISO/IEC 9899 for C99
[1] Promotion to int according to C99 6.3.1.1:2:
If an int can represent all values of the original type, the value is
converted to an int; otherwise, it is converted to an unsigned int.
These are called the integer promotions. All other types are
unchanged by the integer promotions.
[2] The format specifier "%c" expects an int argument, C99 7.19.6.1:8 c:
If no l length modifier is present, the int argument is converted to
an unsigned char, and the resulting character is written.
[3] Using the wrong type in fprintf() (3), including wrong signedness, causes UB according to C99 7.19.6.1:9:
... If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined.
The exception for same type with different signedness is given for the va_arg macro but not for printf() and there is no requirement that printf() uses va_arg (4).
Footnotes (marked with (n) in the text):
(1) This implies INT_MAX==SCHAR_MAX, because char has no padding.
(2) See also this question: Is unsigned char always promoted to int?
(3) The same rules are applied to printf(), see C99 7.19.6.3:2
(4) See also this question: Does printf("%x",1) invoke undefined behavior?
A program can have undefined behavior or not depending on the characteristics of the implementation.
For example, a program that executes
int x = 32767;
x++;
(and is otherwise well defined) has well defined behavior on an implementation with INT_MAX > 32767, and undefined behavior otherwise.
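A sketch of how code can stay well defined across implementations in cases like this: compare against INT_MAX before incrementing, so the same source is correct whether INT_MAX is 32767 or larger. The function checked_increment is a hypothetical helper.

```c
#include <limits.h>
#include <stdbool.h>

/* Increment only when it cannot overflow. Whether x++ on 32767 is defined
   depends on the implementation's INT_MAX, so test against it first. */
bool checked_increment(int *x)
{
    if (*x == INT_MAX)
        return false; /* ++*x here would be undefined behavior */
    ++*x;
    return true;
}
```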
Your program:
#include <stdio.h>
int main(void)
{
char a='A';
printf("%c\n",a);
return 0;
}
has well defined behavior for any hosted implementation with INT_MAX >= CHAR_MAX. On any such implementation, the value of 'A' is promoted to int, which is what %c expects.
If INT_MAX < CHAR_MAX (which implies that plain char is unsigned and that CHAR_BIT >= 16), the value of a is promoted to unsigned int. N1570 7.21.6.1p9:
If any argument is not the correct type for the corresponding
conversion specification, the behavior is undefined.
implies that this has undefined behavior.
In practice, (a) such implementations are rare, likely nonexistent (the only existing C implementations I've heard of with CHAR_BIT > 8 are for DSPs and such implementations are likely to be freestanding), and (b) any such implementation would probably be designed to handle such cases gracefully.
TL;DR there is no UB (in my interpretation at any rate).
6.2.5 types
6. For each of the signed integer types, there is a corresponding (but different) unsigned integer type (designated with the keyword unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
9. The range of nonnegative values of a signed integer type is a subrange of the corresponding unsigned integer type, and the representation of the same value in each type is the same 41)
41) The same representation and alignment requirements are meant to imply interchangeability as arguments to functions, return values from functions, and members of unions.
Furthermore
7.16.1.1 The va_arg macro
2 The va_arg macro expands to an expression that has the specified type and the value of the next argument in the call. [...] If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:
one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
7.21.6.8 The vfprintf function
288) [...] functions vfprintf, vfscanf, vprintf, vscanf, vsnprintf, vsprintf, and vsscanf invoke the va_arg macro [...]
Thus, it stands to reason that an unsigned type is not "an incorrect type for the corresponding (signed) conversion specification", as long as the value is within the range.
This is corroborated by the fact that major compilers do not warn about signed/unsigned format specification mismatch, even though they do warn about other mismatches, even when the corresponding types have the same representation on a given platform (e.g. long and long long).
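The va_arg exception quoted above can be illustrated with a small variadic function of your own (sum_ints is a hypothetical example, not a library function). It reads every argument with va_arg(ap, int), and passing 40u is covered by the exception because 40 is representable in both int and unsigned int. Whether printf itself must honor the same exception is exactly what this answer infers from footnote 288.

```c
#include <stdarg.h>

/* Sum `count` trailing arguments, each read as int. An unsigned int argument
   whose value also fits in int is allowed by C11 7.16.1.1p2. */
int sum_ints(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    int total = 0;
    for (int i = 0; i < count; i++)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```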
Do I understand the standard correctly that this program causes UB:
#include <stdio.h>
int main(void)
{
char a='A';
printf("%c\n",a);
return 0;
}
When it is executed on a system where sizeof(int)==1 && CHAR_MIN==0?
That would be a plausible interpretation of the standard. However, in the event that an implementation with such a combination of type characteristics were produced for genuine use, I have full confidence that it would provide appropriate support for the %c directive -- as an extension, if one wants to interpret it that way. The example program would then have well-defined behavior with respect to that implementation, whether or not the C standard is interpreted to define that behavior, too. I suppose I account that quality-of-implementation issue as being rolled up in "for genuine use".

Is it illegal to use the h or hh length modifiers when the corresponding argument to printf was not a short / char?

The printf family of functions provides a series of length modifiers, two of them being hh (denoting a signed char or unsigned char argument promoted to int) and h (denoting a signed short or unsigned short argument promoted to int). Historically, these length modifiers have only been introduced to create symmetry with the length modifiers of scanf and are rarely used for printf.
Here is an excerpt of ISO 9899:2011 §7.21.6.1 “The fprintf function” ¶7:
7 The length modifiers and their meanings are:
hh Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); or that a following n conversion specifier applies to a pointer to a signed char argument.
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
...
Ignoring the case of the n conversion specifier, what do these almost identical paragraphs say about the behaviour of h and hh?
In this answer, it is claimed that passing an argument that is outside the range of a signed char, signed short, unsigned char, or unsigned short resp. for a conversion specification with an h or hh length modifier resp. is undefined behaviour, as the argument wasn't converted from type char, short, etc. resp. before.
I claim that the function operates in a well-defined manner for every value of type int and that printf behaves as if the parameter was converted to char, short, etc. resp. before conversion.
One could also claim that invoking the function with an argument that was not of the corresponding type before default argument promotion is undefined behaviour, but this seems abstruse.
Which of these three interpretations of §7.21.6.1¶7 (if at all) is correct?
The standard specifies:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
[C2011 7.21.6.1/9]
What is meant by "the correct type", is conceivably open to interpretation, but the most plausible interpretation to me is the type that the conversion specification "applies to" as specified earlier in the same section, and as quoted, in part, in the question. I take the parenthetical comments about argument promotion to be acknowledging the ordinary argument-passing rules, and avoiding any implication of these functions being special cases. I do not take the parenthetic comments as relevant to determining the "correct type" of the argument.
What actually happens if you pass an argument of wider type than is correct for the conversion specification is a different question. I am inclined to believe that the C system is unlikely to be implemented by anybody such that it makes a difference whether a printf() argument is actually a char, or whether it is an int whose value is in the range of char. I assert, however, that it is valid behavior for the compiler to check argument type correspondence with the format, and to reject the program if there is a mismatch (because the required behavior in such a case is explicitly undefined).
On the other hand, I could certainly imagine printf() implementations that actually misbehave (print garbage, corrupt memory, eat your lunch) if the value of an argument is outside the range implied by the corresponding conversion specifier. This also is permissible on account of the behavior being undefined.
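If you want to stay clear of the disputed cases entirely, one conservative pattern is to perform the narrowing conversion yourself before the call, so the argument genuinely is a promoted unsigned char. This is a sketch: format_low_byte is a hypothetical helper, and the example value 300 becoming 44 assumes an 8-bit char.

```c
#include <stdio.h>

/* Narrow explicitly so the argument to "%hhu" really is an unsigned char
   (promoted to int by the call), never an out-of-range int. */
int format_low_byte(char *buf, size_t len, int value)
{
    return snprintf(buf, len, "%hhu", (unsigned char)value);
}
```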

Format specifier for unsigned char

Say I want to print unsigned char:
unsigned char x = 12;
Which is correct, this:
printf("%d",x);
or this:
printf("%u",x);
?
The thing is elsewhere on SO I encountered such discussion:
-Even with ch changed to unsigned char, the behavior of the code is not defined by the C standard. This is because the unsigned char is promoted to an int (in normal C implementations), so an int is passed to printf for the specifier %u. However, %u expects an unsigned int, so the types do not match, and the C standard does not define the behavior
-Your comment is incorrect. The C11 standard states that the conversion specifier must be of the same type as the function argument itself, not the promoted type. This point is also specifically addressed in the description of the hh length modifier: "the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing"
So which is correct? Is there a reliable source on this matter? (By that logic, should we also print unsigned short int with %d, because it can be promoted to int?)
The correct one is*:
printf("%d",x);
This is because of the default argument promotions, as printf() is a variadic function. This means that an unsigned char value is always promoted to int.
From N1570 (C11 draft) 6.5.2.2/6 Function calls (emphasis mine going forward):
If the expression that denotes the called function has a type that
does not include a prototype, the integer promotions are performed on
each argument, and arguments that have type float are promoted to
double. These are called the default argument promotions.
and 6.5.2.2/7 subclause tells:
The ellipsis notation in a function prototype declarator causes
argument type conversion to stop after the last declared parameter.
The default argument promotions are performed on trailing arguments.
These integer promotions are defined in 6.3.1.1/2 Boolean, characters, and integers:
If an int can represent all values of the original type (as restricted
by the width, for a bit-field), the value is converted to an int;
otherwise, it is converted to an unsigned int. These are called the
integer promotions.58) All other types are unchanged by the integer
promotions.
This quote answers your second question of unsigned short (see comment below).
* Except when unsigned char is wider than 8 bits (e.g. 16 bits), in which case it may be promoted to unsigned int instead; see chux's answer.
Correct format specifier for unsigned char x = 12 depends on a number of things:
If INT_MAX >= UCHAR_MAX, which is often the case, use "%d". In this case an unsigned char is promoted to int.
printf("%d",x);
Otherwise use "%u" (or "%x", "%o"). In this case an unsigned char is promoted to unsigned.
printf("%u",x);
Up-to-date compilers support the "hh" length modifier, which compensates for this ambiguity. Should x get promoted to int or unsigned due to the standard promotions of variadic parameters, printf() converts it to unsigned char before printing.
printf("%hhu",x);
If dealing with an old compiler without "hh" or seeking highly portable code, use explicit casting
printf("%u", (unsigned) x);
The same issue and answer apply to unsigned short, except that the test is INT_MAX >= USHRT_MAX and the length modifier is "h" instead of "hh".
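The compile-time choice described above can be written with the preprocessor. This is a sketch: the macro UCHAR_PROMOTED_FMT and the format_uchar* helpers are hypothetical names, and the last two functions are the "hh" and cast approaches from the same answer.

```c
#include <limits.h>
#include <stdio.h>

/* Pick the specifier that matches what unsigned char promotes to. */
#if INT_MAX >= UCHAR_MAX
#define UCHAR_PROMOTED_FMT "%d" /* unsigned char promotes to int */
#else
#define UCHAR_PROMOTED_FMT "%u" /* unsigned char promotes to unsigned int */
#endif

int format_uchar(char *buf, size_t len, unsigned char x)
{
    return snprintf(buf, len, UCHAR_PROMOTED_FMT, x);
}

/* "hh": printf converts the promoted value back to unsigned char. */
int format_uchar_hh(char *buf, size_t len, unsigned char x)
{
    return snprintf(buf, len, "%hhu", x);
}

/* Explicit cast: the argument is unsigned int regardless of promotion rules. */
int format_uchar_cast(char *buf, size_t len, unsigned char x)
{
    return snprintf(buf, len, "%u", (unsigned)x);
}
```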
For cross platform development, I typically bypass the promoting issue by using inttypes.h
http://pubs.opengroup.org/onlinepubs/009695399/basedefs/inttypes.h.html
This header (which is in the C99 standard) defines printf (and scanf) format macros for the fixed-width integer types from <stdint.h>. So if you want to print a uint8_t (a type I highly suggest using instead of unsigned char), I would use
#include <inttypes.h> /* PRIu8 */
#include <stdint.h>   /* uint8_t */
#include <stdio.h>
int main(void) {
    uint8_t x = 12;
    printf("%" PRIu8 "\n", x);
    return 0;
}
Both, unsigned char and unsigned short, can always safely be printed with %u. Default argument promotions convert them either to int or to unsigned int. If they are promoted to the latter, everything is fine (the format specifier and the type passed match), otherwise C11 (n1570) 6.5.2.2 p6, first bullet, applies:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
The standard is quite clear that default argument promotions apply to the variadic arguments of printf, e.g. it's mentioned again for the (mostly useless) h and hh length modifiers (ibid. 7.21.6.1 p7, emph. mine):
hh -- Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing); [...]

Is the %c fprintf specifier required to take an int argument

In section 7.19.6.1 paragraph 8 of the C99 standard:
c If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
In section 7.19.6.1 paragraph 9 of the C99 standard:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Does the fprintf function require an int argument?
For example, would passing an unsigned int result in undefined behavior:
unsigned int foo = 42;
fprintf(fp, "%c\n", foo); /* undefined behavior? */
This worries me since an implementation could have defined char as having the same behavior as unsigned char (section 6.2.5 paragraph 15).
For these cases, integer promotion may dictate that the char be promoted to unsigned int on some implementations. That would leave the following code at risk of undefined behavior on those implementations:
char bar = 'B';
fprintf(fp, "%c\n", bar); /* possible undefined behavior? */
Are int variables and literal int constants the only safe way to pass a value to fprintf with the %c specifier?
%c conversion specification for fprintf requires an int argument. The value has to be of type int after the default argument promotions.
unsigned int foo = 42;
fprintf(fp, "%c\n", foo);
undefined behavior: foo has to be an int.
char bar = 'B';
fprintf(fp, "%c\n", bar);
not undefined behavior: bar is promoted (by the default argument promotions) to int, as fprintf is a variadic function.
EDIT: to be fair, there are still some very rare implementations where it can be undefined behavior. For example, if char is an unsigned type with not all char values representable in an int (like in this implementation), the default argument promotion is done to unsigned int.
Yes, printf with "%c" requires an int argument -- more or less.
If the argument is of a type narrower than int, then it will be promoted. In most cases, the promotion is to int, with well defined behavior. In the very rare case that plain char is unsigned and sizeof (int) == 1 (which implies CHAR_BIT >= 16), a char argument is promoted to unsigned int, which can cause undefined behavior.
A character constant is already of type int, so printf("%c", 'x') is well defined even on exotic systems. (Off-topic: In C++, character constants are of type char.)
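A short sketch of the distinction (helper names are hypothetical): a character constant needs no conversion, while a char variable can be converted to int explicitly so it matches "%c" even on the exotic implementations described above, provided its value fits in int.

```c
#include <stdio.h>

/* 'x' already has type int in C, so it matches %c on every implementation. */
int format_char_constant(char *buf, size_t len)
{
    return snprintf(buf, len, "%c", 'x');
}

/* An explicit (int) conversion sidesteps the case where plain char is
   unsigned and promotes to unsigned int; assumes c's value fits in int. */
int format_char_var(char *buf, size_t len, char c)
{
    return snprintf(buf, len, "%c", (int)c);
}
```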
This:
unsigned int foo = 42;
fprintf(fp, "%c\n", foo);
strictly speaking has undefined behavior. N1570 7.1.4p1 says:
If an argument to a function has ... a type (after promotion) not
expected by a function with variable number of arguments, the behavior
is undefined.
and the fprintf call clearly runs afoul of that. (Thanks to ouah for pointing that out.)
On the other hand, 6.2.5p6 says:
For each of the signed integer types, there is a corresponding (but
different) unsigned integer type (designated with the keyword
unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
and 6.2.5p9 says:
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same.
with a footnote:
The same representation and alignment requirements are meant to imply
interchangeability as arguments to functions, return values from
functions, and members of unions.
The footnote says that function arguments of types int and unsigned int are interchangeable, as long as the value is within the representable range of both types. (For a typical 32-bit system, that means the value has to be in the range 0 to 2^31-1; int values from -2^31 to -1, and unsigned int values from 2^31 to 2^32-1, are outside the range of the other type, and are not interchangeable.)
But footnotes in the C standard are non-normative. They are generally intended to clarify requirements stated in the normative text, not to impose new requirements. But the normative text here merely states that corresponding signed and unsigned types have the same representation, which doesn't necessarily imply that they're passed the same way as function arguments. In principle, a compiler could ignore that footnote and, for example, pass int and unsigned int arguments in different registers, making fprintf(fp, "%c\n", foo); undefined.
But in practice, there's no reason for an implementation to play that kind of game, and you can rely on fprintf(fp, "%c\n", foo); to work as expected. I've never seen or heard of an implementation where it wouldn't work.
Personally, I prefer not to rely on that. If I were writing that code, I'd add an explicit conversion, via a cast, just so these questions don't arise in the first place:
unsigned int foo = 42;
fprintf(fp, "%c\n", (int)foo);
Or I'd make foo an int in the first place.
