In variadic functions, default argument promotions occur.
6.5.2.2.6 If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. [...]
6.5.2.2.7 [...] The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
Therefore,
#include <stdio.h>

int main(void)
{
    signed char c = 123;
    int i = 123;
    float f = 123;
    double d = 123;
    printf("%d\n", i); // ok
    printf("%d\n", c); // ok: c is promoted to int, which %d expects
    printf("%f\n", d); // ok
    printf("%f\n", f); // ok: f is promoted to double, which %f expects
}
So why is there a printf length modifier for char (hh) and short (h)?
Section numbers refer to N2176.
Consider this example:
#include <stdio.h>
int main(void)
{
    unsigned short x = 32770;
    printf("%d\n", x); // (1)
    printf("%u\n", x); // (2)
}
On a typical 16-bit implementation, the default argument promotions take unsigned short to unsigned int, whereas on a typical 32-bit implementation, unsigned short becomes int.
So on the 16-bit system (1) is UB and (2) is correct, while on the 32-bit system (1) is correct and it is debatable whether (2) is correct or UB.
Using %hu for printing x works on all systems and you don't have to think about these issues.
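For illustration, a minimal sketch of the portable version of the example above:

#include <stdio.h>

int main(void)
{
    unsigned short x = 32770;
    printf("%hu\n", x); /* well-defined whether unsigned short
                           promotes to int or to unsigned int */
    return 0;
}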
A similar example could be constructed for char on systems with sizeof(int) == 1.
It's for backwards compatibility.
In a draft version of the C89 standard, printing a signed int, short or char with the %x format specifier is not undefined behavior:
d, i, o, u, x, X The int argument is converted to signed decimal ( d or i ), unsigned octal ( o ), unsigned decimal ( u ), or unsigned hexadecimal notation ( x or X ); the letters abcdef are used for x conversion and the letters ABCDEF for X conversion. The precision specifies the minimum number of digits to appear; if the value being converted can be represented in fewer digits, it will be expanded with leading zeros. The default precision is 1. The result of converting a zero value with an explicit precision of zero is no characters.
This seems to document that in pre-standardized C, using format specifiers such as %x for signed values was existing practice, so a preexisting code base using the h and hh length modifiers likely existed.
Without the h and hh length modifiers, a signed char value with bit pattern 0xFF would come out as 0xFFFFFFFF on a system with 32-bit int when printed with a plain %X format specifier.
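A short demonstration of this, assuming two's complement and 32-bit int (the plain %X line is exactly the questionable mismatch discussed elsewhere in this thread):

#include <stdio.h>

int main(void)
{
    signed char sc = -1;  /* bit pattern 0xFF */
    printf("%X\n", sc);   /* typically FFFFFFFF: sc is sign-extended to int */
    printf("%hhX\n", sc); /* FF: the value is converted back to unsigned char */
    return 0;
}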
Regarding the hh modifier specifically, it was explicitly added in C99 in order to support printing all of the fixed-size types from stdint.h/inttypes.h. C99 made the int_leastN_t types from N = 8 to 64 mandatory, so there was a need for corresponding format specifiers.
From the C99 rationale 5.10, §7.19.6.1 (fprintf):
The %hh and %ll length modifiers were added in C99 (see §7.19.6.2).
§7.19.6.2 (fscanf):
A new feature of C99: The hh and ll length modifiers were added in C99. ll supports the new long long int type. hh adds the ability to treat character types the same as all other integer types; this can be useful in implementing macros such as SCNd8 in <inttypes.h> (see 7.18).
Before C99, there were just d, h, and l for printing integer types. In C99, a conventional implementation could for example define the inttypes.h macros as string literals that paste into format strings:
#define SCNi8  "hhi"  /* scan an int8_t  */
#define SCNi16 "hi"   /* scan an int16_t */
#define SCNi32 "i"    /* scan an int32_t */
#define SCNi64 "lli"  /* scan an int64_t, assuming 64-bit long long */
And now the default argument promotions become the headache of the printf/scanf implementation, rather than of the inttypes.h implementation.
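For example (a hypothetical usage sketch, assuming the implementation provides the exact-width types), the macros then paste into ordinary format strings:

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    int8_t small;
    int64_t big;
    /* "%" SCNi8 pastes to "%hhi", "%" SCNi64 to "%lli" (or "%li"),
       so scanf can store through int8_t* and int64_t* directly. */
    if (scanf("%" SCNi8 " %" SCNi64, &small, &big) == 2)
        printf("%" PRIi8 " %" PRIi64 "\n", small, big);
    return 0;
}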
They are not there for printf() usage, but for scanf(), which needs to store through pointers to short and char integers. For uniformity and completeness they are accepted by the printf() functions too, but there they are indistinguishable, as the variadic arguments of printf() are promoted to int for all parameters of short and char integer types. So they are equivalent in printf() but not in scanf() and friends.
Related
I tried to play with data types in C. My first problem was that printf() showed a negative value for an unsigned int. I fixed this with %u instead of %i.
But unsigned char still works with %i; how is that possible?
#include <stdio.h>
int main(void) {
    unsigned int a;
    unsigned char b;
    a = -7;
    b = -1;
    printf("a=%u\nb=%i\n", a, b);
    return 0;
}
If you look at e.g. this printf (and family) reference, you will see that the "%i" format
converts a signed integer into decimal representation [-]dddd.
[Emphasis not mine]
Since you pass an unsigned int you're having mismatched format specifier and value, which leads to undefined behavior.
Furthermore, for variable-argument functions (like printf), arguments smaller than int (such as char, signed or unsigned) are promoted to int. Since b's promoted value is a (signed) int, the "%i" format actually matches it after promotion, which is why printing b appears to work; using the "%u" format for it would again mismatch the format specifier and argument type.
As it has been stated before, using 'i' as the format specifier is not correct for a variable of type unsigned char.
Whenever you are unsure of what the correct format would be for any particular (integer) type, you can just take a look at inttypes.h, which contains a bunch of macros meant to be used for portable format strings. Depending on the platform you're developing for, the correct format specifiers might differ (uint16_t could be 'u' or 'hu', int32_t could be 'd' or 'ld' for instance).
You could either use this header as a "cheat sheet", or actually write your format strings like this:
printf("a=%"PRIu32"\nb=%"PRIu8"\n", a, b);
Note that for the code to actually be portable, you would of course also need to use uint8_t instead of unsigned char, and uint32_t instead of int.
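Putting it together, a sketch of the question's program made fully portable (using the fixed-width types as just noted):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
    uint32_t a = -7; /* converts to 4294967289 by modulo arithmetic */
    uint8_t b = -1;  /* converts to 255 */
    printf("a=%" PRIu32 "\nb=%" PRIu8 "\n", a, b);
    return 0;
}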
How exactly do variadic functions treat numeric constants? e.g. consider the following code:
myfunc(5, 0, 1, 2, 3, 4);
The function looks like this:
void myfunc(int count, ...)
{
}
Now, in order to iterate over the single arguments with va_arg, I need to know their sizes, e.g. int, short, char, float, etc. But what size should I assume for numeric constants like I use in the code above?
Tests have shown that just assuming int for them seems to work fine so the compiler seems to push them as int even though these constants could also be represented in a single char or short each.
Nevertheless, I'm looking for an explanation for the behaviour I see. What is the standard type in C for passing numeric constants to variadic functions? Is this clearly defined or is it compiler-dependent? Is there a difference between 32-bit and 64-bit architecture?
Thanks!
I like Jonathan Leffler's answer, but I thought I'd pipe up with some technical details, for those who intend to write a portable library or something providing an API with variadic functions, and thus need to delve in to the details.
Variadic parameters are subject to default argument promotions (C11 draft N1570 as PDF; section 6.5.2.2 Function calls, paragraph 6):
.. the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
[If] .. the types of the arguments after promotion are not compatible with those of the parameters after promotion, the behavior is undefined, except for the following cases:
one promoted type is a signed integer type, the other promoted type is the corresponding unsigned integer type, and the value is representable in both types;
both types are pointers to qualified or unqualified versions of a character type or void
Floating-point constants are of type double, unless they are suffixed with f or F (as in 1.0f), in which case they are of type float.
In C99 and C11, integer constants are of type int if they fit in one; long (AKA long int) if they fit in one but not in int; and long long (AKA long long int) otherwise. Since many compilers assume an integer constant without a size suffix is a human error or typo, it is good practice to always include the suffix if the integer constant is not of type int.
Integer constants can also have a letter suffix to denote their type:
u or U for unsigned int
l or L for long int
lu or ul or LU or UL or lU or Lu or uL or Ul for unsigned long int
ll or LL or Ll or lL for long long int
llu or LLU (or ULL or any of their uppercase or lowercase variants) for unsigned long long int
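As a quick way to check these rules yourself, here is a C11 sketch using _Generic (the TYPE_NAME macro is a hypothetical helper, not anything standard) that reports the type a constant is given:

#include <stdio.h>

#define TYPE_NAME(x) _Generic((x),                \
    int: "int",                                   \
    unsigned int: "unsigned int",                 \
    long: "long",                                 \
    unsigned long: "unsigned long",               \
    long long: "long long",                       \
    unsigned long long: "unsigned long long",     \
    float: "float",                               \
    double: "double",                             \
    default: "something else")

int main(void)
{
    printf("1          is %s\n", TYPE_NAME(1));
    printf("1U         is %s\n", TYPE_NAME(1U));
    printf("2147483648 is %s\n", TYPE_NAME(2147483648)); /* long, or long long
                                                            where long is 32-bit */
    printf("1.0f       is %s\n", TYPE_NAME(1.0f));
    printf("1.0        is %s\n", TYPE_NAME(1.0));
    return 0;
}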
The integer promotion rules are in section 6.3.1.1.
To summarize the default argument promotion rules for C11 (there are some additions compared to C89 and C99, but no significant changes):
float are promoted to double
All integer types whose values can be represented by an int are promoted to int. (This includes both signed and unsigned char and short, and bit-fields of type _Bool, int, or unsigned int whose values fit in an int.)
All integer types whose values can be represented by an unsigned int (but not by an int) are promoted to unsigned int. (This includes unsigned int bit-fields that cannot be represented by an int, full-width bit-fields of CHAR_BIT * sizeof (unsigned int) bits in other words, and typedef'd aliases of unsigned int, but that's it, I think.)
Integer types at least as large as int are unchanged. This includes types long/long int, long long/long long int, and size_t, for example.
There is one 'gotcha' in the rules that I'd like to point out: "signed to unsigned is okay, unsigned to signed is iffy":
If the argument is promoted to a signed integer type, but the function obtains the value using the corresponding unsigned integer type, the function obtains the correct value using modulo arithmetic.
That is, negative values will be as if they were incremented by (1 + maximum representable value in the unsigned integer type), making them positive.
If the argument is promoted to an unsigned integer type, but the function obtains the value using the corresponding signed integer type, and the value is representable in both, the function obtains the correct value. If the value is not representable in both, the behaviour is implementation-defined.
In practice, almost all architectures do the opposite of the above as well, i.e. the signed integer value obtained matches the unsigned value minus (1 + the largest representable value of the unsigned integer type). I've heard that some strange ones may signal integer overflow or something similarly weird, but I have never gotten my mitts on such machines.
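To make the "gotcha" concrete, here is a minimal sketch (the function name is hypothetical) that passes a negative int and reads it back as the corresponding unsigned type; on common two's complement implementations it prints UINT_MAX:

#include <stdarg.h>
#include <stdio.h>

static unsigned int first_as_unsigned(int count, ...)
{
    va_list ap;
    va_start(ap, count);
    /* The caller passed an int; reading it as the corresponding
       unsigned type applies modulo arithmetic to negative values. */
    unsigned int u = va_arg(ap, unsigned int);
    va_end(ap);
    return u;
}

int main(void)
{
    printf("%u\n", first_as_unsigned(1, -1)); /* 4294967295 with 32-bit int */
    return 0;
}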
The man 3 printf man page (courtesy of the Linux man pages project) is quite informative, if you compare the above rules to printf specifiers. The make_message() example function at the end (C99, C11, or POSIX required for vsnprintf()) should also be interesting.
When you write 1, that is an int constant. There is no other type that the compiler is allowed to use. If there is a non-variadic prototype for the function that demands a different type, the compiler will convert the integer 1 to the appropriate type, but on its own, 1 is an int constant. So, in your example, all 6 arguments are int.
You have to know the types of the arguments somehow before the called variadic function processes them. With the printf() family of functions, the format string tells it what to expect; similarly with the scanf() family of functions.
Note that the default conversions apply to the arguments corresponding to the ellipsis of a variadic function. For example, given:
char c = '\007';
short s = 0xB0BD;
float f = 3.1415927f;
a call to:
int variadic_function(const char *, ...);
using:
int rc = variadic_function("c s f", c, s, f);
actually converts both c and s to int, and f to double.
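On the receiving side, a hedged sketch of what variadic_function's body might look like: va_arg must name the promoted types (int, int, double), never char, short, or float.

#include <stdarg.h>
#include <stdio.h>

int variadic_function(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    int    c = va_arg(ap, int);    /* the promoted char  */
    int    s = va_arg(ap, int);    /* the promoted short */
    double f = va_arg(ap, double); /* the promoted float */
    va_end(ap);
    return printf("%s: %d %d %f\n", fmt, c, s, f);
}

int main(void)
{
    char c = '\007';
    short s = 0x0B0D; /* arbitrary value that fits in a short */
    float f = 3.1415927f;
    return variadic_function("c s f", c, s, f) < 0;
}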
I am learning C from the book "C Primer Plus" by Stephen Prata. In chapter 4, the author states that in printf(), %o and %x, denote unsigned octal integers and unsigned hexadecimal integers respectively, but in scanf(), %o and %x, interpret signed octal integers and signed hexadecimal integers respectively. Why is it so?
I wrote the following program in VS 2015 to check the author's statement:
#include <stdio.h>

#pragma warning(disable : 4996) /* MSVC-specific: allow scanf instead of scanf_s */

int main(void)
{
    int a, b, c;
    printf("Enter number: ");
    scanf("%x %x", &a, &b);
    c = a + b;
    printf("Answer = %x\n", c);
    while (getchar() != EOF)
        getchar();
    return 0;
}
The code proved the author's claim.
If the input was a pair of integers where the absolute value of the positive integer was bigger than the absolute value of the negative integer, then everything worked fine.
But if the input was a pair of integers where the absolute value of the positive integer was smaller than the absolute value of the negative integer, then the output was what you would expect from unsigned two's complement arithmetic.
For example:
Enter number: -5 6
Answer = 1
and
Enter number: -6 5
Answer = ffffffff
The C standard says that for printf-like functions (7.21.6.1 fprintf):
o,u,x,X
The unsigned int argument is converted to unsigned octal (o), unsigned decimal (u), or unsigned hexadecimal notation (x or X)
While for scanf-like functions it says (7.21.6.2 fscanf):
x
Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
So as an extra feature, you can write a negative hex number and scanf will convert it to the corresponding unsigned number in the system's format (two's complement).
For example
unsigned int x;
scanf("%x", &x); // enter -1
printf("%x", x); // will print ffffffff
Why they felt like scanf needed this mildly useful feature, I have no idea. Perhaps it is there for consistency with other conversion specifiers.
However, the book seems to be using the function incorrectly, since the standard explicitly states that you must pass a pointer to unsigned int. If you pass a pointer to a signed int, you are formally invoking undefined behavior.
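A sketch of the book's example adjusted to meet that requirement, with unsigned objects behind the %x conversions:

#include <stdio.h>

int main(void)
{
    unsigned int a, b; /* %x requires pointers to unsigned */
    printf("Enter numbers: ");
    if (scanf("%x %x", &a, &b) == 2)
        printf("Answer = %x\n", a + b);
    return 0;
}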
Reading the C11 specification, section 7.21.6.2/12, it says for the o format:
Matches an optionally signed octal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 8 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
With corresponding text for the hexadecimal x format.
So on one hand the specification says the input can be signed, but it also says the format is the same as for the strtoul function which reads unsigned integers, and the result is stored in an unsigned integer.
Indeed the author is wrong, as @Joachim Pileborg pointed out.
This is what the standard says about it
7.21.6.2 The fscanf function¹
12 The conversion specifiers and their meanings are:
o Matches an optionally signed octal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 8 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
x Matches an optionally signed hexadecimal integer, whose format is the same as expected for the subject sequence of the strtoul function with the value 16 for the base argument. The corresponding argument shall be a pointer to unsigned integer.
as you can read above, it is optionally signed, but it certainly expects a pointer to an unsigned integer
¹ Of course I have omitted a lot; in fact, fscanf() is one of the largest sections in the standard document.
Assuming the following:
sizeof(char) = 1
sizeof(short) = 2
sizeof(int) = 4
sizeof(long) = 8
The printf format for a 2 byte signed number is %hd, for a 4 byte signed number is %d, for an 8 byte signed number is %ld, but what is the correct format for a 1 byte signed number?
what is the correct format for a 1 byte signed number?
%hh and the integer conversion specifier of your choice (for example, %02hhX). See the C11 standard, §7.21.6.1p5:
hh
Specifies that a following d, i, o, u, x, or X conversion specifier applies to a signed char or unsigned char argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to signed char or unsigned char before printing);…
The parenthesized comment is important. Because of integer promotions on the arguments to variadic functions (such as printf), the function never sees a char argument. Many programmers think that that means that it is unnecessary to use h and hh qualifiers. Certainly, you are not creating undefined behaviour by leaving them out, and most of the time it will work.
However, char may well be signed, and the integer promotion will preserve its value, which will make it into a signed integer. Printing the signed integer out with an unsigned format (such as %02X) will present you with the sign-extended Fs. So if you want to display signed char using an unsigned format, you need to tell printf what the original unpromoted width of the integer type was, using hh.
In case that wasn't clear, a simple (but controversial) example:
#include <stdio.h>

int main(void) {
    char* s = "\u00d1"; /* Ñ */
    for (char* p = s; *p; ++p)
        printf("%02X (%02hhX)\n", *p, *p);
    return 0;
}
Output:
$ ./a.out
FFFFFFC3 (C3)
FFFFFF91 (91)
In the comment thread, there is (or possibly was) considerable discussion about whether the above snippet is undefined behaviour because the X format specification requires an unsigned argument, whereas the char argument is (at least on the implementation which produced the presented output) signed. I think this argument relies on §7.21.6.1/p9: "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined."
However, in the case of char (and short) integer types, the expression in the argument list is promoted to int or unsigned int before the function is called. (It's worth noting that on most architectures, all three character types will be promoted to a signed int; promotion of an unsigned char to an unsigned int will only happen on an implementation where sizeof(int) == 1.)
So on most architectures, the argument to an %hx or an %hhx format conversion will be signed, and that cannot be undefined behaviour without rendering the use of these format codes meaningless.
Furthermore, the standard does not say that fprintf (and friends) will somehow recover the original expression. What it says is that the value "shall be converted to signed char or unsigned char before printing" (§7.21.6.1/p5, quoted above, emphasis added).
Converting a signed value to an unsigned value is not undefined. It is not even unspecified or implementation-dependent. It simply consists of (conceptually) "repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type." (§6.3.1.3/p2)
So there is a well-defined procedure to convert the argument expression to a (possibly signed) int argument, and a well-defined procedure for converting that value to an unsigned char. I therefore argue that a program such as the one presented above is entirely well-defined.
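Tracing the first output line above through that rule (a minimal sketch, assuming 32-bit two's complement int):

#include <stdio.h>

int main(void)
{
    int promoted = -61;                /* the promoted *p; bit pattern 0xFFFFFFC3 */
    unsigned char narrowed = promoted; /* -61 + 256 = 195, i.e. 0xC3 */
    printf("%d -> %02hhX\n", promoted, narrowed);
    return 0;
}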
For corroboration, the behaviour of fprintf given a format specifier %c is defined as follows (§7.21.6.1/p8), emphasis added:
the int argument is converted to an unsigned char, and the resulting character is written.
If one were to apply the proposed restrictive interpretation which renders the above program undefined, then I believe that one would be forced to also argue that:
void f(char c) {
    printf("This is a '%c'.\n", c);
}
was also UB. Yet, I think almost every C programmer has written something similar to that without thinking twice about it.
The key part of the question is what is meant by "argument" in §7.21.6.1/p9 (and other parts of §7.21.6.1). The C++ standard is slightly more precise; it specifies that if an argument is subject to the default argument promotions, "the value of the argument is converted to the promoted type before the call", which I interpret to mean that when considering the call (for example, the call of fprintf), the arguments are now the promoted values.
I don't think C is actually different, at least in intent. It uses wording like "the arguments … are promoted", and in at least one place "the argument after promotion". Furthermore, in the description of variadic functions (the va_arg macro, §7.16.1.1), the constraint on the argument type is annotated parenthetically "the type of the actual next argument (as promoted according to the default argument promotions)".
I'll freely agree that all of this is (a) subtle reading of insufficiently precise language, and (b) counting dancing angels. But I don't see any value in declaring that standard usages like the use of %c with char arguments are "technically" UB; that denatures the concept of UB and it is hard to believe that such a prohibition would be intentional, which leads me to believe that the interpretation was not intended. (And, perhaps, should be corrected editorially.)
unsigned short int i = 0;
printf("%u\n",~i);
Why does this code return a 32 bit number in the console? It should be 16 bit, since short is 2 bytes.
The output is 4,294,967,295 and it should be 65,535.
%u expects an unsigned int; if you want to print an unsigned short int, use %hu.
EDIT
Lundin is correct that ~i will be converted to type int before being passed to printf. i is also converted to int by virtue of being passed to a variadic function. However, printf will convert the argument back to unsigned short before printing if the %hu conversion specifier is used:
7.21.6.1 The fprintf function
...
3 The format shall be a multibyte character sequence, beginning and ending in its initial shift state. The format is composed of zero or more directives: ordinary multibyte characters (not %), which are copied unchanged to the output stream; and conversion specifications, each of which results in fetching zero or more subsequent arguments, converting them, if applicable, according to the corresponding conversion specifier, and then writing the result to the output stream.
...
7 The length modifiers and their meanings are:
...
h Specifies that a following d, i, o, u, x, or X conversion specifier applies to a short int or unsigned short int argument (the argument will have been promoted according to the integer promotions, but its value shall be converted to short int or unsigned short int before printing); or that a following n conversion specifier applies to a pointer to a short int argument.
Emphasis mine.
So, the behavior is not undefined; it would only be undefined if either i or ~i were not integral types.
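To see it end to end, a minimal sketch of the question's snippet with the fix applied:

#include <stdio.h>

int main(void)
{
    unsigned short i = 0;
    /* ~i is the int value -1; %hu converts it back to unsigned short,
       printing 65535 on implementations with 16-bit unsigned short. */
    printf("%hu\n", ~i);
    return 0;
}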
When you pass an argument to printf and that argument is of integer type shorter than int, it is implicitly promoted to int as per K&R argument promotion rules. Thus your printf-call actually behaves like:
printf("%u\n", (int)~i);
Notice that this is undefined behavior, since you told printf that the argument has an unsigned type whereas int is actually a signed type. Convert ~i to unsigned short and then to unsigned to resolve the undefined behavior and your problem:
printf("%u\n", (unsigned)(unsigned short)~i);
N1570 6.5.3.3 Unary arithmetic operators p4:
The result of the ~ operator is the bitwise complement of its (promoted) operand (that is, each bit in the result is set if and only if the corresponding bit in the converted operand is not set). The integer promotions are performed on the operand, and the result has the promoted type. ...
Integer types smaller than int are promoted to int. If sizeof(unsigned short) == 2 and sizeof(int) == 4, then the resulting type is int.
And what's more, printf conversion specifier %u expects unsigned int, so representation of int is interpreted as unsigned int. You are basically lying to compiler, and this is undefined behaviour.
It's because the arguments to printf() are pushed onto the stack in word-sized units, as there is no way inside printf to know that the argument was a short. Also, by using the %u format you are merely stating that you are passing an unsigned number.