Effect of default argument promotions on wchar_t - c

I am a bit confused about how default argument promotions affect wchar_t.
I understand that char is promoted to int, and therefore I have to supply int as the second parameter of va_arg, otherwise I may (GCC) or may not (MSVC) get an error, as demonstrated by the "%c" example below.
So, I thought that - analogically - I should take into account some similar promotion in case of wchar_t, and read the definitions of the relevant types in the C99 standard:
7.17 wchar_t ... is an integer type whose range of values can represent distinct codes for all members of the largest extended
character set specified among the supported locales; the null
character shall have the code value zero. Each member of the basic
character set shall have a code value equal to its value when used as
the lone character in an integer character constant if an
implementation does not define __STDC_MB_MIGHT_NEQ_WC__.
7.24.1 wint_t ... is an integer type unchanged by default argument promotions that can hold any value corresponding to members of the
extended character set, as well as at least one value that does not
correspond to any member of the extended character set (see WEOF
below).
It is clear to me that wint_t is not promoted to anything, and I suspect but do not know for sure that wchar_t is not promoted either.
I tried fetching arguments with va_arg as wchar_t, wint_t and int, and all of these worked, however, this may have happened because of luck:
#include <stdarg.h>
#include <stdio.h>
#include <string.h>
#include <wchar.h>
void print( char const* format, ... );
int main()
{
printf( "char == %zu, int == %zu, wchar_t == %zu, wint_t == %zu.\n",
sizeof( char ), sizeof( int ), sizeof( wchar_t ), sizeof( wint_t ) );
// MSVC x86: char == 1, int == 4, wchar_t == 2, wint_t == 2.
// MSVC x64: char == 1, int == 4, wchar_t == 2, wint_t == 2.
// GCC x64: char == 1, int == 4, wchar_t == 4, wint_t == 4.
char charA = 'A';
print( "%c", charA );
wchar_t wchar_tA = L'A';
print( "%lc", wchar_tA );
printf( "\n" );
}
void print( char const* format, ... )
{
va_list arguments;
va_start( arguments, format );
if( strcmp( format, "%c" ) == 0 ) {
// char c = va_arg( arguments, char ); // -> Bad (1)
char c = va_arg( arguments, int ); // -> Good
putchar( ( int ) c );
} else if( strcmp( format, "%lc" ) == 0 ) {
wchar_t w = va_arg( arguments, wchar_t ); // -> Good
// wint_t w = va_arg( arguments, wint_t ); // -> Good
// int w = va_arg( arguments, int ); // -> Good
putwchar( ( wchar_t ) w );
}
va_end( arguments );
}
// (1) GCC prints:
// warning: 'char' is promoted to 'int' when passed through '...'
// note: (so you should pass 'int' not 'char' to 'va_arg')
// Running the program prints:
// Illegal instruction
The question: Which of the three lines containing va_arg in the else if block is the correct, standard-compliant one?

I thought that - analogically - I should take into account some similar promotion in case of wchar_t,
Yes, the type you specify to the va_arg() macro must be compatible with the type of the corresponding actual argument, as promoted according to the default argument promotions, except that you can interchange signed and unsigned versions of the same type as long as both can represent the actual value, and you can interchange void * and char *.
It is clear to me that wint_t is not promoted to anything,
Yes, inasmuch as I take you to mean by the default argument promotions, the specifications say that explicitly.
and I suspect but do not know for sure that wchar_t is not promoted either.
It is not safe to assume that wchar_t is unchanged by the default argument promotions. If it is neither int nor unsigned int but its integer conversion rank is less than or equal to that of int, then it is affected. Otherwise not. C does not specify which case applies, and that may vary from implementation to implementation.
Note also that although integer conversion rank is related to the size of a representation of the type (narrow is, generally, lower), it is a distinct concept, and in principle, you cannot reliably judge based on size.
Which of the three lines containing va_arg in the else if block is the correct, standard-compliant one?
It depends on your implementation. There is no available alternative that is certain to be correct for every conforming implementation, because the spec does not constrain wchar_t sufficiently for that. But of the three, this one is your best bet:
// int w = va_arg( arguments, int ); // -> Good
That covers you in all remotely likely variations of wchar_t being promoted to a different type via the default argument promotions. It is definitely correct when wchar_t is int. And it's fine if wchar_t is unsigned int, as long as the actual value of the argument does not exceed INT_MAX.
It would be incorrect for an implementation in which wchar_t had greater integer conversion rank than int, but I do not know any implementation with that characteristic, and I don't expect ever to see one.
In contrast, this one is unsafe:
wchar_t w = va_arg( arguments, wchar_t ); // -> Good
It is ok if wchar_t is int or unsigned int, but the previous is also good in that case (except if wchar_t is unsigned int and the argument value exceeds INT_MAX). This would be the only correct alternative for an implementation where wchar_t had greater integer conversion rank than int and was not the same as wint_t, but again, I don't expect ever to see such an implementation. But otherwise, it is wrong when wchar_t has integer conversion rank less than or equal to that of int, and that is a characteristic of some real-world implementations.
And this one is less safe than the int alternative:
// wint_t w = va_arg( arguments, wint_t ); // -> Good
If wint_t happens to be int then it is equivalent to the int variation, of course. But if it happens to be neither int nor unsigned int nor wchar_t, then it is definitely wrong, whether wchar_t is affected by the integer promotions or not.
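If C11 is available, one way to avoid guessing is to let the compiler pick the va_arg type from the promoted type of wchar_t. The following is only a sketch under that assumption: fetch_wchar is a hypothetical name, and it relies on unary + applying the integer promotions to its operand, so the type of +(wchar_t)0 is the type a wchar_t argument has after the default argument promotions.

```c
#include <stdarg.h>
#include <stddef.h>

/* Hypothetical helper (not from the question): returns the first variadic
   argument, which the caller must pass as a wchar_t. Requires C11 _Generic. */
wchar_t fetch_wchar(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    /* Only the association matching the promoted type is evaluated. */
    wchar_t w = _Generic(+(wchar_t)0,
        int:          (wchar_t)va_arg(ap, int),
        unsigned int: (wchar_t)va_arg(ap, unsigned int),
        default:      va_arg(ap, wchar_t)); /* rank above int: unchanged */
    va_end(ap);
    return w;
}
```

A call such as fetch_wchar(1, L'A') should then return L'A' whichever case applies on the implementation at hand.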

Related

Why type casting not required?

The following code is written in C using CodeBlocks:
#include <stdio.h>
void main() {
char c;
int i = 65;
c = i;
printf("%d\n", sizeof(c));
printf("%d\n", sizeof(i));
printf("%c", c);
}
Why is no cast needed when printing variable c after it was assigned an int value (c = i)?
A cast is a way to explicitly force a conversion. You only need casts when no implicit conversions take place, or when you wish the result to have another type than what implicit conversion would yield.
In this case, the C standard requires an implicit conversion through the rule for the assignment operator (C11 6.5.16.1/2):
In simple assignment (=), the value of the right operand is converted to the type of the
assignment expression and replaces the value stored in the object designated by the left
operand.
char and int are both integer types. Which in turn means that in this case, the rules for converting integers are implicitly invoked:
6.3.1.3 Signed and unsigned integers
When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.
Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
In your case, the new type char can be either signed or unsigned depending on compiler. The value 65 can be represented by char regardless of signedness, so the first paragraph above applies and the value remains unchanged. For other larger values, you might have ended up with the "value cannot be represented" case instead.
This is a valid conversion between integer types, so no cast is necessary.
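To make the "value can be represented" condition concrete, here is a minimal sketch of a range check one could run before such an assignment. The name fits_in_char is hypothetical; CHAR_MIN and CHAR_MAX come from <limits.h> and already account for whether plain char is signed or unsigned.

```c
#include <limits.h>

/* Hypothetical helper: nonzero when converting value to char leaves it
   unchanged, i.e. the first paragraph of 6.3.1.3 applies. */
int fits_in_char(int value)
{
    return value >= CHAR_MIN && value <= CHAR_MAX;
}
```

With 8-bit char, fits_in_char(65) holds while fits_in_char(1000) does not, so only the latter assignment would fall into one of the other two paragraphs.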
Please note that strictly speaking, the result of sizeof(c) etc is type size_t and to print that one correctly with printf, you must use the %zu specifier.
This assignment is performed on compatible types, because char is little more than a single-byte integer, whereas int is usually a 4-byte integer type (machine dependent). This implicit conversion does not require a cast, but you may lose some information in it (the higher bytes would get truncated).
Let's examine your program:
char c; int i = 65; c = i; There is no need for a cast in this assignment because the integer type int of variable i is implicitly converted to the integer type of the destination. The value 65 can be represented by type char, so this assignment is fully defined.
printf("%d\n", sizeof(c)); the conversion specifier %d expects an int value. sizeof(c) has value 1 by definition, but with type size_t, which may be larger than int. You must use a cast here: printf("%d\n", (int)sizeof(c)); or possibly use a C99 specific conversion specifier: printf("%zu\n", sizeof(c));
printf("%d\n", sizeof(i)); Same remark as above. The output will be implementation defined but most current systems have 32-bit int and 8-bit bytes, so a size of 4.
printf("%c", c); Passing a char as a variable argument to printf first causes the char value to be promoted to int (or unsigned int) and passed as such. The promotion to unsigned int happens on extremely rare platforms where char is unsigned and has the same size as int. On other platforms, the cast is not needed as %c expects an int argument.
Note also that main must be defined with a return type int.
Here is a modified version:
#include <stdio.h>
#include <limits.h>
int main() {
char c;
int i = 65;
c = i;
printf("%d\n", (int)sizeof(c));
printf("%d\n", (int)sizeof(i));
#if CHAR_MAX == UINT_MAX
/* cast only needed on pathological platforms */
printf("%c\n", (int)c);
#else
printf("%c\n", c);
#endif
return 0;
}

What is a fully promoted type?

I came across this in va_copy(3):
/* need a cast here since va_arg only
* takes fully promoted types */
c = (char) va_arg(ap, int);
What is a fully promoted type?
This is referring to the rules of integer promotion. Any time an integer value with a type smaller than int (e.g. char, short) is used in a context where an int can be used, the value is promoted to an int.
In the case of a variadic function, the type of the arguments to a function are not known at compile time, so this promotion applies.
For example, suppose you had the following functions:
void f1(char c);
void f2(int count, ...);
They are called like this:
char x = 1;
f1(x); // x is passed as char
f2(1, x); // x is passed as int
This behavior is documented in section 6.3.1.1p2 of the C standard:
The following may be used in an expression wherever an int or unsigned int may be used:
An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
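As a small illustration of why va_arg is given the promoted type, here is a sketch of a variadic function that receives char arguments. The name last_char is made up for this example; the point is only that each char arrives as int and is converted back with a cast, as in the va_copy(3) snippet.

```c
#include <stdarg.h>

/* Hypothetical helper: returns the last of `count` char arguments,
   or '\0' when count is 0. */
char last_char(int count, ...)
{
    va_list ap;
    char c = '\0';
    va_start(ap, count);
    for (int i = 0; i < count; i++) {
        /* The caller's char was promoted to int, so fetch it as int. */
        c = (char)va_arg(ap, int);
    }
    va_end(ap);
    return c;
}
```

This assumes the usual case where int can represent every char value, so the promotion is to int rather than unsigned int.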

How to use "zd" specifier with `printf()`?

Looking for clarification on using "zd" with printf().
Certainly the following is correct with C99 and later.
void print_size(size_t sz) {
printf("%zu\n", sz);
}
The C spec seems to allow printf("%zd\n", sz) depending on how it is read:
7.21.6.1 The fprintf function
z Specifies that a following d, i, o, u, x, or X conversion specifier applies to a size_t or the corresponding signed integer type argument; or that a following n conversion specifier applies to a pointer to a signed integer type corresponding to size_t argument. C11dr §7.21.6.1 7
Should this be read as
"z Specifies that a following d ... conversion specifier applies to a size_t or the corresponding signed integer type argument ... "(both types) and "z Specifies that a following u ... conversion specifier applies to a size_t or the corresponding signed integer type argument ..." (both types)
OR
"z Specifies that a following d ... conversion specifier applies to a corresponding signed integer type argument ..." (signed type only) and "z Specifies that a following u ... conversion specifier applies to a size_t" (unsigned type only).
I've been using the #2 definition, but now not so sure.
Which is correct, 1, 2, or something else?
If #2 is correct, what is an example of a type that can use "%zd"?
printf with a "%zd" format expects an argument of the signed type that corresponds to the unsigned type size_t.
Standard C doesn't provide a name for this type or a good way to determine what it is. If size_t is a typedef for unsigned long, for example, then "%zd" expects an argument of type long, but that's not a portable assumption.
The standard requires that corresponding signed and unsigned types use the same representation for the non-negative values that are representable in both types. A footnote says that this is meant to imply that they're interchangeable as function arguments. So this:
size_t s = 42;
printf("s = %zd\n", s);
should work, and should print "42". It will interpret the value 42, of the unsigned type size_t, as if it were of the corresponding signed type. But there's really no good reason to do that, since "%zu" is also correct and well defined, without resorting to additional language rules. And "%zu" works for all values of type size_t, including those outside the range of the corresponding signed type.
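For comparison, a minimal sketch of the "%zu" route, which needs no interchangeability argument. print_size_to is a hypothetical variant of the question's print_size that takes the stream as a parameter and passes back the return value of fprintf.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes sz and a newline to out. Returns what
   fprintf returns: the number of characters written, or a negative
   value on an output error. */
int print_size_to(FILE *out, size_t sz)
{
    return fprintf(out, "%zu\n", sz);
}
```

Calling print_size_to(stdout, (size_t)-1) prints the largest size_t value, which lies outside the range of the corresponding signed type and so is exactly the case where "%zd" would not be appropriate.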
Finally, POSIX defines a type ssize_t in the headers <unistd.h> and <sys/types.h>. Though POSIX doesn't explicitly say so, it's likely that ssize_t will be the signed type corresponding to size_t.
So if you're writing POSIX-specific code, "%zd" is (probably) the correct format for printing values of type ssize_t.
UPDATE: POSIX explicitly says that ssize_t isn't necessarily the signed version of size_t, so it's unwise to write code that assumes that it is:
ssize_t
This is intended to be a signed analog of size_t. The wording is such
that an implementation may either choose to use a longer type or
simply to use the signed version of the type that underlies size_t.
All functions that return ssize_t (read() and write()) describe as
"implementation-defined" the result of an input exceeding {SSIZE_MAX}.
It is recognized that some implementations might have ints that are
smaller than size_t. A conforming application would be constrained not
to perform I/O in pieces larger than {SSIZE_MAX}, but a conforming
application using extensions would be able to use the full range if
the implementation provided an extended range, while still having a
single type-compatible interface. The symbols size_t and ssize_t are
also required in <unistd.h> to minimize the changes needed for calls
to read() and write(). Implementors are reminded that it must be
possible to include both <sys/types.h> and <unistd.h> in the same
program (in either order) without error.
According to a little test I have done, "%zd" always gives the right output, but "%zu" doesn't work for negative numbers.
Test Code:
#include <stdio.h>
#include <sys/types.h> /* ssize_t (POSIX) */
int main (void)
{ int i;
size_t uzs = 1;
ssize_t zs = -1;
for ( i= 0; i<5 ;i++, uzs <<= 16,zs <<= 16 )
{
printf ("%zu\t", uzs); /*true*/
printf ("%zd\t", uzs); /*true*/
printf ("%zu\t", zs); /* false*/
printf ("%zd\n", zs); /*true*/
}
return 0;
}

Using 'char' variables in bit operations

I use XLookupString, which maps a key event to an ASCII string, keysym, and ComposeStatus.
int XLookupString(event_structure, buffer_return, bytes_buffer, keysym_return, status_in_out)
XKeyEvent *event_structure;
char *buffer_return; /* Returns the resulting string (not NULL-terminated). Returned value of the function is the length of the string. */
int bytes_buffer;
KeySym *keysym_return;
XComposeStatus *status_in_out;
Here is my code:
char mykey_string;
int arg = 0;
------------------------------------------------------------
case KeyPress:
XLookupString( &event.xkey, &mykey_string, 1, 0, 0 );
arg |= mykey_string;
But when 'char' variables are used in bit operations, sign extension can produce unexpected results.
Is it possible to prevent this?
Thanks
char can be either signed or unsigned, so if you need unsigned char you should specify it explicitly. That makes your intention clear to those reading your code, as opposed to relying on compiler settings.
The relevant portion of the C99 draft standard is from 6.2.5 Types paragraph 15:
The three types char, signed char, and unsigned char are collectively called
the character types. The implementation shall define char to have the same range,
representation, and behavior as either signed char or unsigned char
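Applied to the arg |= mykey_string line, the idea can be sketched as follows. or_byte is a hypothetical helper, not part of Xlib; converting to unsigned char before the OR means the value is promoted to int without sign extension.

```c
/* Hypothetical helper: ORs the byte held in ch into acc.
   The conversion to unsigned char yields a value in 0..UCHAR_MAX,
   so no high bits are set by sign extension. */
int or_byte(int acc, char ch)
{
    return acc | (unsigned char)ch;
}
```

In the question's code this would correspond to arg |= (unsigned char)mykey_string; or to declaring a separate unsigned char variable for the bit operation.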

Is the %c fprintf specifier required to take an int argument

In section 7.19.6.1 paragraph 8 of the C99 standard:
c If no l length modifier is present, the int argument is converted to an
unsigned char, and the resulting character is written.
In section 7.19.6.1 paragraph 9 of the C99 standard:
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
Does the fprintf function require an int argument?
For example, would passing an unsigned int result in undefined behavior:
unsigned int foo = 42;
fprintf(fp, "%c\n", foo); /* undefined behavior? */
This worries me since an implementation could have defined char as having the same behavior as unsigned char (section 6.2.5 paragraph 15).
For these cases, integer promotion may dictate that the char be promoted to unsigned int on some implementations. That would leave the following code at risk of undefined behavior on those implementations:
char bar = 'B';
fprintf(fp, "%c\n", bar); /* possible undefined behavior? */
Are int variables and literal int constants the only safe way to pass a value to fprintf with the %c specifier?
%c conversion specification for fprintf requires an int argument. The value has to be of type int after the default argument promotions.
unsigned int foo = 42;
fprintf(fp, "%c\n", foo);
undefined behavior: foo has to be an int.
char bar = 'B';
fprintf(fp, "%c\n", bar);
not undefined behavior: bar is promoted (default argument promotions) to int as fprintf is a variadic function.
EDIT: to be fair, there are still some very rare implementations where it can be undefined behavior. For example, if char is an unsigned type with not all char values representable in an int (like in this implementation), the default argument promotion is done to unsigned int.
Yes, printf with "%c" requires an int argument -- more or less.
If the argument is of a type narrower than int, then it will be promoted. In most cases, the promotion is to int, with well defined behavior. In the very rare case that plain char is unsigned and sizeof (int) == 1 (which implies CHAR_BIT >= 16), a char argument is promoted to unsigned int, which can cause undefined behavior.
A character constant is already of type int, so printf("%c", 'x') is well defined even on exotic systems. (Off-topic: In C++, character constants are of type char.)
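On a C11 compiler these two claims can be checked directly. The sketch below is only an illustration; PROMOTES_TO_INT and char_promotes_to_int are hypothetical names, and the macro relies on unary + applying the integer promotions to its operand.

```c
/* Hypothetical macro: 1 if x has type int after the integer promotions,
   0 otherwise. Requires C11 _Generic. */
#define PROMOTES_TO_INT(x) _Generic(+(x), int: 1, default: 0)

/* Reports whether a plain char argument would reach a variadic function
   as int on this implementation. */
int char_promotes_to_int(void)
{
    char c = 'B';
    return PROMOTES_TO_INT(c);
}
```

On mainstream implementations char_promotes_to_int() and PROMOTES_TO_INT('x') are both 1, while PROMOTES_TO_INT(42u) is 0; only on the exotic systems described above would the char case differ.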
This:
unsigned int foo = 42;
fprintf(fp, "%c\n", foo);
strictly speaking has undefined behavior. N1570 7.1.4p1 says:
If an argument to a function has ... a type (after promotion) not
expected by a function with variable number of arguments, the behavior
is undefined.
and the fprintf call clearly runs afoul of that. (Thanks to ouah for pointing that out.)
On the other hand, 6.2.5p6 says:
For each of the signed integer types, there is a corresponding (but
different) unsigned integer type (designated with the keyword
unsigned) that uses the same amount of storage (including sign information) and has the same alignment requirements.
and 6.2.5p9 says:
The range of nonnegative values of a signed integer type is a subrange
of the corresponding unsigned integer type, and the representation of
the same value in each type is the same.
with a footnote:
The same representation and alignment requirements are meant to imply
interchangeability as arguments to functions, return values from
functions, and members of unions.
The footnote says that function arguments of types int and unsigned int are interchangeable, as long as the value is within the representable range of both types. (For a typical 32-bit system, that means the value has to be in the range 0 to 2^31-1; int values from -2^31 to -1, and unsigned int values from 2^31 to 2^32-1, are outside the range of the other type, and are not interchangeable.)
But footnotes in the C standard are non-normative. They are generally intended to clarify requirements stated in the normative text, not to impose new requirements. But the normative text here merely states that corresponding signed and unsigned types have the same representation, which doesn't necessarily imply that they're passed the same way as function arguments. In principle, a compiler could ignore that footnote and, for example, pass int and unsigned int arguments in different registers, making fprintf(fp, "%c\n", foo); undefined.
But in practice, there's no reason for an implementation to play that kind of game, and you can rely on fprintf(fp, "%c\n", foo); to work as expected. I've never seen or heard of an implementation where it wouldn't work.
Personally, I prefer not to rely on that. If I were writing that code, I'd add an explicit conversion, via a cast, just so these questions don't arise in the first place:
unsigned int foo = 42;
fprintf(fp, "%c\n", (int)foo);
Or I'd make foo an int in the first place.
