Undefined behavior of "%*d" in the C programming language - c

case 1:
printf("%*d", 10, 5)
output:
_________5
(I am using _ to denote blank spaces)
case 2:
printf("%*d", 10.4, 5)
expected output:
______0005
But instead it goes into an infinite loop.
Why does %*d behave this way for a decimal "field width precision prefix"?

You told printf, via the * in the format, to expect the field width as an int in the argument list. You gave it a floating-point argument, 10.4, instead; passing a double where an int is expected is undefined behavior. You likely intended this:
printf("%*.*d", 10, 4, 5);
with the width and precision each passed as its own separate int argument.
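For instance, a minimal complete program using the corrected format (my own sketch, not from the question) would be:

#include <stdio.h>

int main(void)
{
    /* Width 10, precision 4: 5 is zero-padded to "0005",
       then right-aligned in a field of 10 characters. */
    printf("%*.*d\n", 10, 4, 5);  /* prints "      0005" */
    return 0;
}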

The second call, printf("%*d", 10.4, 5), leads to undefined behavior: printf expects an int but is given the double 10.4 instead.
The handling of printf's variable number of parameters, of the format string, and of the corresponding types is complex and goes deep into compiler construction. What exactly happens is difficult to trace and can vary from compiler to compiler.
See here for the exact specification of the printf format specifiers.
Addition:
Let's see if we can trace what exactly happens in GNU's implementation of the standard library. First, look at the source for printf. It uses varargs and a function called __vfprintf_internal. The latter is defined here (line 1318). On line 1363, a sanity check is performed using a macro, but it only checks that the format string pointer is not NULL. Line 1443:
int prec = -1; /* Precision of output; -1 means none specified. */
Line 1582, reading the argument as an int when the * modifier was used:
prec = va_arg (ap, int);
From here on, the precision is processed as an int. Let's look at the implementation of va_arg to see what happens when it is given a double. It is part of GCC's stdarg header, see here, line 49:
#define va_arg(v,l) __builtin_va_arg(v,l)
And now it gets complex. __builtin_va_arg isn't explicitly defined anywhere; it is part of GCC's internal representation of the C language. See the parser file. That means we cannot read concrete types in the source files anymore.
We can obtain some more information on the processing of varargs in the builtins.c file. From here on I can only guess at what happens. The processing appears to start in expand_builtin_va_start, which takes a tree parameter and returns an rtx (RTL expression) object. This object is a constant and probably has the double type mode. I assume the compiler processes the double expression until it knows which (machine-specific) bit values it has to write into the executable. Since, evidently, the floating-point number is not truncated to an int, I wouldn't be surprised if the value read back corresponded to, and were later interpreted as, a more or less random value (e.g. 77975616). It is also conceivable that the program's memory accesses become misaligned when, e.g., the type of a double (usually 8 bytes) is larger than an int (usually 4 bytes). More on the implementation of varargs here.
Whatever sort-of random value the integer ends up with is then processed in process_arg(fspec), back in vfprintf-internal.c.
Additional curiosity:
If printf is given a float as the width argument, even one produced by an explicit cast, GCC will still warn that the value is a double:
warning: field width specifier ‘*’ expects argument of type ‘int’, but argument 2 has type ‘double’ [-Wformat=]
10 | printf("%*d\n", (float) 12.3F, 5);
| ~^~ ~~~~~~~~~~~~~
| | |
| int double
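If the width really starts life as a floating-point value, the safe route is an explicit conversion to int; a small sketch (my example, not GCC's):

#include <stdio.h>

int main(void)
{
    /* A (float) cast alone would be promoted straight back to double
       by the default argument promotions, so convert to int instead. */
    printf("%*d\n", (int)12.3F, 5);  /* width 12: prints 5 right-aligned */
    return 0;
}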

When you pass 10.4 to printf, it's treated as a double. However, when printf sees the * character, it tries to read the argument holding the field width as an int. Even though you and I can intuitively see how to treat 10.4 as an integer by rounding down to 10, that's not how C sees things. This results in what's called undefined behavior. On some platforms, printf might treat the bytes of the double 10.4 as an integer, producing an absolutely colossal value rather than the expected 10. On other platforms, it might read from the place where it expects an int argument, which instead holds some other, unrelated value. In either case, the result is unlikely to be the nice "interpret the value as 10" that you expect.
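As a rough illustration of the first scenario, the following sketch (assuming a 4-byte int, little-endian byte order, and IEEE 754 doubles) reads the low bytes of the double the way a confused printf might:

#include <stdio.h>
#include <string.h>

int main(void)
{
    double d = 10.4;
    int low;
    /* Copy the first sizeof(int) bytes of the double's representation;
       this mimics printf reading an int where a double was passed. */
    memcpy(&low, &d, sizeof low);
    printf("%d\n", low);  /* a large, meaningless number */
    return 0;
}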

Use -Wall -Wextra to see more warnings. You will discover:
<source>:30:12: warning: field width specifier '*' expects argument of type 'int', but argument 2 has type 'double' [-Wformat=]
30 | printf("%*d", 10.4, 5);
| ~^~ ~~~~
| | |
| int double
It is undefined behaviour.
Now an example of what happens inside the printf function:
#include <stdio.h>
#include <stdarg.h>

int foo(const char *fmt, ...)
{
    va_list va;
    int retval = 0;
    va_start(va, fmt);
    retval = va_arg(va, int);       /* read the first variadic argument as an int */
    va_end(va);
    return retval;
}

unsigned bar(const char *fmt, ...)
{
    va_list va;
    unsigned retval = 0;
    va_start(va, fmt);
    retval = va_arg(va, unsigned);  /* read it as an unsigned int instead */
    va_end(va);
    return retval;
}

int main(void)
{
    printf("as int %d\n", foo("", 10.4));   /* undefined: a double was passed */
    printf("as uint %u\n", bar("", 10.4));  /* undefined: a double was passed */
}
And let's execute it:
https://godbolt.org/z/EsMaM8EPs

Related

Do format specifiers perform implicit type conversion?

#include <stdio.h>
#include <stdint.h>

int main(void) {
    int x = 5;
    int y = (int)(intptr_t)&x;  /* narrow the address to int; implementation-defined */
    printf("Signed Value of Y: %d \n", y);
    printf("Unsigned Value of Y: %u", y);
    return 0;
}
Since y is of type int, using %d gives a possibly-signed output, whereas %u gives an unsigned output. But y is of type int, so why does %u give unsigned output? Is there an implicit type conversion?
"Re: But y is of int type, so why does %u give unsigned output?"
From C11:
If a conversion specification is invalid, the behavior is undefined. If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined.
where,
undefined — The behavior for something incorrect, on which the standard does not impose any requirements. Anything is allowed to happen, from nothing, to a warning message, to program termination, to CPU meltdown, to launching nuclear missiles (assuming you have the correct hardware option installed).
— Expert C Programming.
Effectively, a printf call is two separate things:
All the arguments are prepared to send to the function.
The function interprets the format string and its other arguments.
In any function call, the arguments are prepared according to rules involving the argument types and the function declaration. They do not depend on the values of the arguments, including the contents of any string passed as an argument, and this is true of printf too.
In a function call, the rules are largely (omitting some details):
If the argument corresponds to a declared parameter type, it is converted to that type.
Otherwise (if the argument corresponds to the ... part of a function declaration or the called function is declared without specifying parameter types), some default promotions are applied. For integers, these are the integer promotions, which largely (omitting some details) convert types narrower than int to int. For floating-point, float is promoted to double.
printf is declared as int printf(const char * restrict format, ...);, so all its arguments other than the format string correspond to ....
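As a concrete illustration of the default promotions, here is a minimal sketch (my own example, not from the answer):

#include <stdio.h>

int main(void)
{
    char c = 'A';    /* promoted to int when passed through ... */
    float f = 1.5f;  /* promoted to double when passed through ... */
    printf("%d %f\n", c, f);  /* %d reads an int, %f reads a double */
    return 0;
}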
Inside printf, the function examines its format string and attempts to perform the directives given in the format string. For example, if a directive is %g, printf expects a double argument and takes bits from the place it expects a double argument to be passed. Then it interprets those bits as a double, constructs a string according to the directive, and writes the string to standard output.
For a %d or %u directive, printf expects an int or an unsigned int argument, respectively. In either case, it takes bits from the place it expects an int or an unsigned int argument to be passed. In all C implementations I am aware of, an int and an unsigned int argument are passed in the same place. So, if you pass an int argument but use %u, printf will get the bits of an int but will treat them as if they were the bits of an unsigned int. No actual conversion has been performed; printf is merely interpreting the bits differently.
The C standard does not define the behavior when you do this, and a C implementation would be conforming to the standard if it crashed when you did this or if it processed the bits differently. You should avoid it.
Is there an implicit type conversion?
Sort of. A function such as printf that accepts a variable number of arguments does not automatically know the number of variable arguments it actually receives on any call, or their types. Conversion specifications such as %d and %u collectively tell printf() how many variable arguments to expect, and individually they tell printf what type to expect for each argument. printf() will try to interpret the argument list according to these specifications.
The C language specification explicitly declines to say what happens when the types of printf arguments do not correspond properly to the conversion specifications in the accompanying format string. In practice, however, some pairs of data types have representations similar enough to each other that printf()'s attempt to interpret data of one type as if it were the other type is likely (but not guaranteed) to give the appearance of an implicit conversion from one type to the other. Corresponding signed and unsigned integer types are typically such pairs.
You should not rely on such apparent conversions actually happening. Instead, properly match argument types with conversion specifications. Correct mismatches by choosing a different conversion specification or performing an appropriate explicit type conversion (a typecast) on the argument.
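For example, a corrected version of the program (using a negative stand-in value to make the difference visible) might read:

#include <stdio.h>

int main(void) {
    int y = -5;  /* stand-in value with the sign bit set */
    printf("Signed Value of Y: %d\n", y);
    /* Convert explicitly rather than letting %u reinterpret the bits;
       the conversion is well-defined (it wraps modulo 2^N). */
    printf("Unsigned Value of Y: %u\n", (unsigned int)y);
    return 0;
}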

I get previous float value when I am printing new value

I am getting the output 0.23 from the second printf, but a typecast gives the required output. If I do not use a typecast, the previous value is printed.
Compiler version is GCC 6.3
#include <stdio.h>

int main() {
    printf("%f ", 0.23);
    printf("%f", 0);   /* mismatch: %f expects a double, but 0 is an int */
    return 0;
}
In
> printf("%f", 0);
you ask to print a double but give an int; these are contradictory.
The generated code does not build a double from the int, because printf is declared not as int printf(const char *, double); but as int printf(const char *format, ...);, and the compiler does not look at the format string to perform the necessary conversions (although in many cases it will warn you).
When printf accesses the second argument, it reads 64 bits expecting a double, while your int probably occupies only 32 bits; the behavior is undefined.
> I get previous float value when I am printing new value
In your case printf may retrieve the double value from a floating-point register (e.g. XMM on x86-64) while the int value was passed on the stack or in a general-purpose register, which may explain why the same value gets printed twice. But of course, as always with undefined behavior, anything else could happen at any time.
The problem is a combination of two factors:
The first is that for variadic functions like printf, the compiler does not convert the arguments to match the format string; only the default argument promotions are applied. So the 0 in the argument list stays an integer constant (of type int).
The second factor is the mismatched format specifier. The printf function doesn't know anything about its arguments except what the format string says, and mismatching format and argument type leads to undefined behavior. Since the "%f" specifier makes printf expect a value of type double, and you have given it an int value, you have such a mismatch.
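A corrected version of the program, for comparison; either form below removes the mismatch:

#include <stdio.h>

int main() {
    printf("%f ", 0.23);
    printf("%f\n", 0.0);        /* pass a double constant, or ... */
    printf("%f\n", (double)0);  /* ... convert the int explicitly  */
    return 0;
}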

How printf() function knows the type of its arguments

Consider the following program,
#include <stdio.h>

int main()
{
    char a = 130;
    unsigned char b = 130;
    printf("a = %d\nb = %d\n", a, b);
    return 0;
}
This program will show the following output.
a = -126
b = 130
My question is: how does the printf() function come to know that the type of a is signed and the type of b is unsigned, so as to show results like the above?
printf() doesn't know the types, that's why you have to give a correct format string. The prototype for printf() looks like this:
int printf(const char * restrict format, ...);
So, the only argument with a known type is the first one, the format string.
This also means that any argument passed after that is subject to default argument promotion -- strongly simplified, read it as any integer will be converted to at least int -- or ask google about the term to learn each and every detail ;)
In your example, you have implementation-defined behavior:
char a = 130;
If your char could represent 130, that's what you would see in the output of printf(); promoting the value to int doesn't change it. You're getting a negative number instead, which means 130 overflowed your char. The result of converting an out-of-range value to a signed integer type in C is implementation-defined. The value you're getting probably means that on your machine char has 8 bits (so the maximum value is 127) and the conversion wrapped around into the negative range. You can't rely on that behavior!
In short, the negative number is created in this line -- 130 is of type int, assigning it to char converts it and this conversion overflows.
Once your char has the value -126, passing it to printf() just converts it to int, not changing the value.
The additional arguments to printf() are formatted according to the type specifier. See here for a list of C format specifiers.
https://fr.cppreference.com/w/c/io/fprintf
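A small sketch of the conversions described above (assuming the typical signed 8-bit char):

#include <stdio.h>

int main(void)
{
    char a = 130;  /* implementation-defined: typically wraps to 130 - 256 = -126 */
    printf("%d\n", a);                 /* promoted to int: likely prints -126 */
    printf("%d\n", (unsigned char)a);  /* converts back to 130, then promotes to int */
    return 0;
}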
It's true that one would not expect b to be printed as 130 in your example, since you used the %d specifier and not %u. This surprising behavior seems to be explained here:
Format specifier for unsigned char
I hope I have understood your question correctly.
Edit: I cannot comment on Felix Palmen's answer on account of my low reputation. Default argument promotion indeed seems to be the key here, but to me the real question, besides the overflow of a, is why b is still printed as 130 despite the use of the signed specifier. That, too, can be explained by default argument promotion, but it should be made more precise.
You need to have a look at the declaration of printf in stdio.h. You already got the answer in a comment: printf just writes the string pointed to by format to stdout.
It's a variadic function, and it uses varargs to walk the variable-length argument list.
This is from glibc, the GNU version:
int __printf (const char *format, ...)
{
    va_list arg;
    int done;

    va_start (arg, format);
    done = vfprintf (stdout, format, arg);
    va_end (arg);

    return done;
}
What does vfprintf do?
It just writes the string pointed to by format to the stream, replacing any format specifiers in the same way as printf does, but using the elements of the variable argument list identified by arg instead of additional function arguments.
More information about vfprintf.
printf() does not know the data types of its arguments; it works from the format specifiers you pass. The data types you are using are char (with range -128 to +127) and unsigned char (with range 0 to 255). Your value for a overflows past 127, so the output comes out as -126.

How is conversion of float/double to int handled in printf?

Consider this program
#include <stdio.h>

int main()
{
    float f = 11.22;
    double d = 44.55;
    int i, j;

    i = f; // converts float to int
    j = d; // converts double to int

    printf("i = %d, j = %d, f = %d, d = %d", i, j, f, d);
    // This prints the following:
    // i = 11, j = 44, f = -536870912, d = 1076261027
    return 0;
}
Can someone explain why the casting from double/float to int works correctly in the first case, and does not work when done in printf?
This program was compiled on gcc-4.1.2 on 32-bit linux machine.
EDIT:
Zach's answer seems logical, i.e., printf uses the format specifiers to figure out what to pop off the stack. However, then consider this follow-up question:

#include <stdio.h>

int main()
{
    char c = 'd'; // sizeof c is 1, however sizeof the character
                  // literal 'd' is equal to sizeof(int) in ANSI C
    printf("lit = %c, lit = %d , c = %c, c = %d", 'd', 'd', c, c);
    // this prints: lit = d, lit = 100 , c = d, c = 100
    // How does printf pop off the right number of bytes here, even though
    // the sizes implied by the format specifiers don't match the sizes of
    // the passed arguments (char: 1 byte, character literal: 4 bytes)?
    return 0;
}
How does this work?
The printf function uses the format specifiers to figure out what to pop off the stack. So when it sees %d, it pops off 4 bytes and interprets them as an int, which is wrong here (the binary representation of (float)3.0 is not the same as that of (int)3).
You'll need to either use the %f format specifier or cast the arguments to int. If you're using a new enough version of gcc, turning on stronger warnings catches this sort of error:
$ gcc -Wall -Werror test.c
cc1: warnings being treated as errors
test.c: In function ‘main’:
test.c:10: error: implicit declaration of function ‘printf’
test.c:10: error: incompatible implicit declaration of built-in function ‘printf’
test.c:10: error: format ‘%d’ expects type ‘int’, but argument 4 has type ‘double’
test.c:10: error: format ‘%d’ expects type ‘int’, but argument 5 has type ‘double’
Response to the edited part of the question:
C's integer promotion rules say that all types smaller than int get promoted to int when passed as a vararg. So in your case, 'd' is promoted to an int, and then printf pops off an int and converts it to a char. The best reference I could find for this behavior was this blog entry.
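You can verify the type of a character constant directly; on common platforms this sketch prints 4 1:

#include <stdio.h>

int main(void)
{
    /* In C (unlike C++), a character constant such as 'd' has type int. */
    printf("%zu %zu\n", sizeof 'd', sizeof(char));
    return 0;
}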
There's no such thing as "casting to int in printf". printf does not do, and cannot do, any casting. An inconsistent format specifier leads to undefined behavior.
In practice, printf simply receives the raw data and reinterprets it as the type implied by the format specifier. If you pass it a double value but specify an int format specifier (like %d), printf will take that double value and blindly reinterpret it as an int. The results are completely unpredictable (which is why doing this formally causes undefined behavior in C).
Jack's answer explains how to fix your problem. I'm going to explain why you're getting your unexpected results. Your code is roughly equivalent to:

float f = 11.22;
double d = 44.55;
int i, j, k, l;

i = (int) f;
j = (int) d;
k = *(int *) &f;  // reinterpret the bits of f as an int
l = *(int *) &d;  // reinterpret the first bytes of d as an int

printf("i = %d, j = %d, f = %d, d = %d", i, j, k, l);

The reason is that f and d are passed to printf as values, and those values are then interpreted as ints. This doesn't change the bits, so the number displayed comes from the binary representation of a float or a double (strictly, f is first promoted to double, as the last answer explains). The actual conversion from float to int is much more involved in the generated assembly.
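Note that *(int *)&f formally breaks the strict-aliasing rule; it is shown above purely as a model. If you actually want to inspect the bits, type punning through a union (assuming int and float are the same size here) expresses the same reinterpretation:

#include <stdio.h>

int main(void)
{
    /* C permits reading a union member other than the one last written;
       the value is the reinterpreted bit pattern. */
    union { float f; int k; } u = { .f = 11.22f };
    printf("k = %d\n", u.k);  /* the bits of 11.22f, read as an int */
    return 0;
}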
Because you are not using the floating-point format specifier, try:
printf("i = %d, j = %d, f = %f, d = %f", i, j, f, d);
Otherwise, if you want four ints, you have to convert the values before passing them to printf:
printf("i = %d, j = %d, f = %d, d = %d", i, j, (int)f, (int)d);
The reason your follow-up code works is that the character constant is promoted to an int before it is pushed onto the stack. So printf pops off 4 bytes for %c just as it does for %d. In fact, character constants have type int, not type char. C is strange that way.
printf uses variable-length argument lists, which means you need to provide the type information yourself via the format string. You're providing the wrong information, so it gets confused. Jack provides the practical solution.
It's worth noting that printf, being a function with a variable-length argument list, never receives a float: float arguments are promoted to double by the "old school" default promotions.
A recent standard draft introduces these default promotions first (N1570, 6.5.2.2/6):
If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions.
Then it discusses variable argument lists (6.5.2.2/7):
The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments.
The consequence for printf is that it is impossible to "print" a genuine float. A float expression is always promoted to double, which is an 8 byte value for IEEE 754 implementations. This promotion occurs on the calling side; printf will already have an 8 byte argument on the stack when its execution starts.
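A quick sketch that makes the promotion visible (the sizes shown are typical, not guaranteed):

#include <stdio.h>

int main(void)
{
    float f = 11.22f;
    /* sizeof reports the object's own size, yet as a variadic
       argument f is passed as a double. */
    printf("%zu %zu\n", sizeof f, sizeof(double));  /* typically "4 8" */
    printf("%f\n", f);  /* fine: %f expects a double, and gets one */
    return 0;
}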
If we assign 11.22 to a double and inspect its memory, with my x86_64-pc-cygwin gcc I see the byte sequence 000000e0a3702640.
That explains the int value printed by printf: an int on this target still has 4 bytes, so only the first four bytes, 000000e0, are read, again in little-endian order, i.e. as 0xe0000000. That is -536870912 in decimal.
If we reverse all 8 bytes, because the Intel processor stores doubles in little-endian order too, we get 402670a3e0000000. We can check the value this byte sequence represents in IEEE format on this web site; it's close to 1.122E1, i.e. 11.22, the expected result.

C : Printing big numbers

Take the following:
#include <stdio.h>
int main(void) {
    unsigned long long verybig = 285212672;
    printf("Without variable : %llu\n", 285212672);
    printf("With variable : %llu", verybig);
}
This is the output of the above program :
Without variable : 18035667472744448
With variable : 285212672
As you can see from the above, when printf is passed the number as a constant, it prints some huge incorrect number, but when the value is first stored in a variable, printf prints the correct number.
What is the reasoning behind this?
Try 285212672ULL; if you write it without a suffix, you'll find the compiler treats it as a plain int. The reason it works with the variable is that the int is converted up to unsigned long long in the assignment, so the value passed to printf() has the right type.
And before you ask, no, the compiler probably isn't smart enough to figure it out from the "%llu" in the printf() format string. That's a different level of abstraction. The compiler is responsible for the language syntax, printf() semantics are not part of the syntax, it's a runtime library function (no different really from your own functions except that it's included in the standard).
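In other words, the fix on the calling side is just the suffix; a minimal sketch:

#include <stdio.h>

int main(void)
{
    /* The ULL suffix gives the constant the type unsigned long long,
       matching what %llu expects. */
    printf("Without variable : %llu\n", 285212672ULL);
    return 0;
}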
Consider the following code for a 32-bit int and 64-bit unsigned long long system:
#include <stdio.h>

int main (void) {
    printf ("%llu\n", 1, 2);
    printf ("%llu\n", 1ULL, 2);
    return 0;
}
which outputs:
8589934593
1
In the first case, the two 32-bit integers 1 and 2 are pushed on the stack, and printf() interprets them as a single 64-bit ULL value, 2 × 2^32 + 1. The 2 argument is inadvertently swallowed into the ULL value.
In the second, you actually push the 64-bit value 1 and a superfluous 32-bit integer 2, which is ignored.
Note that this "getting out of step" between your format string and your actual arguments is a bad idea. Something like:
printf ("%llu %s %d\n", 0, "hello", 0);
is likely to crash because the 32-bit "hello" pointer will be consumed by the %llu, and %s will then try to dereference the final 0 argument. The following "picture" illustrates this (let's assume that cells are 32 bits and that the "hello" string is stored at 0xbf000000):
What you pass    Stack frames      What printf() uses
                +------------+
      0         |      0     |  \
                +------------+   > 64-bit value for %llu.
   "hello"      | 0xbf000000 |  /
                +------------+
      0         |      0     |  value for %s (likely core dump here).
                +------------+
                |      ?     |  value for %d (could be anything).
                +------------+
It's worth pointing out that some compilers give a useful warning for this case - for example, this is what GCC says about your code:
x.c: In function ‘main’:
x.c:6: warning: format ‘%llu’ expects type ‘long long unsigned int’, but argument 2 has type ‘int’
285212672 is an int value. printf expects an unsigned long long, and you're passing it an int. Consequently, it takes more bytes off the stack than your real value occupies, and prints garbage. When you store the number in an unsigned long long variable before the call, it is converted to unsigned long long in the assignment, and you pass that value to printf, which then works correctly.
A data type is simply a way of interpreting the contents of a memory location.
In the first case, the constant is passed as an int, but printf is instructed that the value is a long long, so it interprets 8 bytes' worth of argument data where only 4 meaningful bytes were passed, and prints a garbage value.
In the second case, printf interprets 8 bytes that genuinely hold a long long, and it prints what is expected.
