A strange phenomenon using casting in C

the following code:
int main() {
    int small_num = 0x12345678;
    int largest_num = 0xFFFFFFFF;
    printf("small: without casting to short: 0x%.8x, with casting to short: 0x%.8x\n", small_num>>16, (short)(small_num>>16));
    printf("large: without casting to short: 0x%.8x, with casting to short: 0x%.8x\n", largest_num>>16, (short)(largest_num>>16));
    return 0;
}
gives me the output (using codepad):
small: without casting to short: 0x00001234, with casting to short: 0x00001234
large: without casting to short: 0xffffffff, with casting to short: 0xffffffff
That seems extremely strange. Does anyone have an idea why it happens this way?

When you cast to (short) inside the printf call, the compiler converts the short right back to int, because int is the type actually passed to printf. Therefore 0x1234 maps to 0x00001234, while 0xffff (which is exactly -1 as a short) maps to 0xffffffff. Note that negative integers are sign-extended when widened from short to int: the new high-order bits are filled with ones.
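A minimal sketch of that sign extension, written so that %x receives the unsigned int it expects (assuming a 32-bit int):
#include <stdio.h>

int main(void) {
    short positive = 0x1234; /* sign bit clear */
    short negative = -1;     /* all 16 bits set on a two's complement machine */

    /* Convert explicitly to unsigned int so %x gets the type it expects. */
    printf("0x%.8x\n", (unsigned int)positive); /* prints 0x00001234: padded with zeros */
    printf("0x%.8x\n", (unsigned int)negative); /* prints 0xffffffff: -1 converts to UINT_MAX */
    return 0;
}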

Short answer
The hexadecimal constant has type unsigned int. When converted to signed int the value becomes -1. Right-shifting a negative value usually leaves the sign-bit unchanged, so -1 >> 16 is still -1. A short int passed to a variadic function gets promoted to signed int which, when interpreted as an unsigned int by the %x conversion specifier, prints out 0xffffffff.
Long answer
However, your code is broken for a number of reasons.
Integer conversion
int largest_num = 0xFFFFFFFF;
The type of the hexadecimal constant is the first of the following types in which its value can be represented: int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.
If int has more than 32 bits, you're fine.
If int has 32 bits or fewer, the result is implementation-defined (or an implementation-defined signal is raised).
Usually, largest_num will have all bits set and have the value -1.
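If the goal is simply a value with all bits set, a sketch of the usual way to sidestep that implementation-defined conversion (assuming a 32-bit unsigned int):
#include <stdio.h>

int main(void) {
    /* 0xFFFFFFFFu fits in a 32-bit unsigned int, so no signed conversion occurs. */
    unsigned int largest_num = 0xFFFFFFFFu;
    printf("0x%.8x\n", largest_num); /* prints 0xffffffff */
    return 0;
}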
Shifting a negative value
largest_num>>16
If the value of largest_num is negative, the resulting value is implementation-defined. Usually, the sign bit is left unchanged so that -1 right-shifted is still -1.
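If a fully specified shift is needed, shift an unsigned value instead; a sketch, again assuming a 32-bit unsigned int:
#include <stdio.h>

int main(void) {
    unsigned int u = 0xFFFFFFFFu;
    printf("0x%.8x\n", u >> 16); /* logical shift: prints 0x0000ffff, guaranteed */
    return 0;
}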
Integer promotion
printf ("0x%.8x\n", (short)(largest_num>>16));
When you pass a short int to a variadic function, the value will be promoted to int. A negative value will be preserved when converted to the new type.
However, the "%x" conversion specifier expects an unsigned int argument. Because unsigned int and signed int are not compatible types, the behaviour of the code is undefined. Usually, the bits of the signed int is re-interpreted as an unsigned int, which results in the original value of the hexadecimal constant.
Calling a variadic function
printf(...);
printf() is a variadic function. Variadic functions (typically) use different calling conventions than ordinary functions. Your code invokes undefined behaviour if you don't have a valid declaration of printf() in scope.
The usual way to provide a declaration for printf() is to #include <stdio.h>.
Source: n1570 (the last public draft of the current C standard).
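Putting those fixes together, one possible corrected version of the original program (a sketch, assuming a 32-bit int):
#include <stdio.h> /* declares printf, so the variadic call is well-defined */

int main(void) {
    unsigned int small_num = 0x12345678u;
    unsigned int largest_num = 0xFFFFFFFFu; /* unsigned: no implementation-defined conversion */

    /* Right-shifting an unsigned value is fully specified, and the
       arguments now match their conversion specifiers. */
    printf("small: 0x%.8x, low 16 bits: 0x%.4hx\n",
           small_num >> 16, (unsigned short)(small_num >> 16));
    printf("large: 0x%.8x, low 16 bits: 0x%.4hx\n",
           largest_num >> 16, (unsigned short)(largest_num >> 16));
    return 0;
}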

Related

Type conversion of objects involving unsigned type in C language

I have been reading K&R lately and have come across this statement:
"Type conversions of Expressions involving unsigned type values are
complicated"
So to understand this I have written a very small program:
#include<stdio.h>
int main()
{
signed char x = -1;
printf("Signed : %d\n",x);
printf("Unsigned : %d\n",(unsigned)x);
}
The bit form of signed x will be: 1000 0001 = -1.
When converted to unsigned, its value should be 1000 0001 = 129.
But even after the type conversion it still prints -1.
Note: I'm using gcc compiler.
C11 6.3.1.3, paragraph 2:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Since −1 cannot be represented as an unsigned int value, it is converted to UINT_MAX. Thus -1 becomes a very large positive integer.
Also, use %u for unsigned int; otherwise the mismatched conversion specifier invokes undefined behaviour.
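A short sketch of that rule in action, using only what the standard guarantees:
#include <stdio.h>
#include <limits.h>

int main(void) {
    signed char x = -1;
    /* -1 + (UINT_MAX + 1) == UINT_MAX, so the conversion is exact and portable. */
    printf("UINT_MAX    : %u\n", UINT_MAX);
    printf("(unsigned)x : %u\n", (unsigned)x); /* prints the same value */
    return 0;
}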
C semantics regarding conversion of integers is defined in terms of the mathematical values, not in terms of the bits. It is good to understand how values are represented as bits, but, when analyzing expressions, you should think about values.
In this statement:
printf("Signed : %d\n",x);
the signed char x is automatically promoted to int. Since x is −1, the new int also has the value −1, and printf prints “-1”.
In this statement:
printf("Unsigned : %d\n",(unsigned)x);
the signed char x is automatically promoted to int. The value is still −1. Then the cast converts this to unsigned. The rule for conversion to unsigned is that UINT_MAX+1 is added to or subtracted from the value as needed to bring it into the range of unsigned. In this case, adding UINT_MAX+1 to −1 once brings the value to UINT_MAX, which is within range. So the result of the conversion is UINT_MAX.
However, this unsigned value is then passed to printf to be printed with the %d conversion. That violates C 2018 7.21.6.1 9, which says the behavior is undefined if the types do not match.
This means a C implementation is allowed to do anything. In your case, it seems what happened is:
The unsigned value UINT_MAX is represented with all one bits.
The printf interpreted the all-one-bits value as an int.
Your implementation uses two’s complement for int types.
In two’s complement, an object with all one bits represents −1.
So printf printed “-1”.
If you had used this correct code instead:
printf("Unsigned : %u\n",(unsigned)x);
then printf would print the value of UINT_MAX, which is likely 4,294,967,295, so printf would print “4294967295”.

C typecasting from a signed char to int type

In the below snippet, shouldn't the output be 1? Why am I getting output as -1 and 4294967295?
What I understand is, the variable, c, here is of signed type, so shouldn't its value be 1?
char c=0xff;
printf("%d %u",c,c);
c is of signed type here. A char is 8 bits, so you have an 8-bit signed quantity with all bits set to 1. On a two's complement machine, that evaluates to -1.
Some compilers will warn you when you do that sort of thing. If you're using gcc/clang, switch on all the warnings.
Pedant note: On some machines it could have the value 255, should the compiler treat 'char' as unsigned.
You're getting the correct answer.
The %u format specifier indicates that the value will be an unsigned int. The compiler automatically promotes your 8-bit char to a 32-bit int. However, you have to remember that char is a signed type here, so a value of 0xff is in fact -1.
When the conversion from char to int occurs, the value is still -1, but it's now the 32-bit representation, which in binary is 11111111 11111111 11111111 11111111, or 0xffffffff in hex.
When that is interpreted as an unsigned integer, all of the bits are obviously preserved because the length is the same, but now it's handled as an unsigned quantity.
0xffffffff = 4294967295 (unsigned)
0xffffffff = -1 (signed)
There are three character types in C, char, signed char, and unsigned char. Plain char has the same representation as either signed char or unsigned char; the choice is implementation-defined. It appears that plain char is signed in your implementation.
All three types have the same size, which is probably 8 bits (CHAR_BIT, defined in <limits.h>, specifies the number of bits in a byte). I'll assume 8 bits.
char c=0xff;
Assuming plain char is signed, the value 0xff (255) is outside the range of type char. Since you can't store the value 255 in a char object, the value is implicitly converted. The result of this conversion is implementation-defined, but is very likely to be -1.
Keep this carefully in mind: 0xff is simply another way to write 255, and 0xff and -1 are two distinct values. You cannot store the value 255 in a char object; its value is -1. Integer constants, whether they're decimal, hexadecimal, or octal, specify values, not representations.
If you really want a one-byte object with the value 0xff, define it as an unsigned char, not as a char.
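For example (a minimal sketch):
#include <stdio.h>

int main(void) {
    unsigned char c = 0xff;            /* 255 is representable: no conversion surprise */
    printf("%d %u\n", c, (unsigned)c); /* prints "255 255" */
    return 0;
}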
printf("%d %u",c,c);
When a value of an integer type narrower than int is passed to printf (or to any variadic function), it's promoted to int if int can hold the original type's entire range of values, or to unsigned int if it can't. For type char, it's almost certainly promoted to int. So this call is equivalent to:
printf("%d %u", -1, -1);
The output for the "%d" format is obvious. The output for "%u" is less obvious. "%u" tells printf that the corresponding argument is of type unsigned int, but you've passed it a value of type int. What probably happens is that the representation of the int value is treated as if it were of type unsigned int, most likely yielding UINT_MAX, which happens to be 4294967295 on your system. If you really want to do that, you should convert the value to type unsigned int. This:
printf("%d %u", -1, (unsigned int)-1);
is well defined.
Your two lines of code are playing a lot of games with various types, treating values of one type as if they were of another type, and doing implicit conversions that might yield results that are implementation-defined and/or depend on the choices your compiler happens to make.
Whatever you're trying to do, there's undoubtedly a cleaner way to do it (unless you're just trying to see what your implementation does with this particular code).
Let us start with OP's assumption that "c, here is of signed type".
char c=0xff; // Implementation defined behavior.
0xff is a hexadecimal constant with the value of 255 and type of int.
... the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. §6.3.1.3 3
So right off, the value of c is implementation defined (ID). Let us assume the common ID behavior of 8-bit wrap-around, so c --> -1.
A signed char is promoted to int when passed as a variadic argument, so printf("%d %u",c,c); is the same as printf("%d %u",-1,-1);. Printing the -1 with "%d" is not an issue, and "-1" is printed.
Printing an int -1 with "%u" is undefined behavior (UB): it is a mismatched specifier/type pair and does not fall under the exception of being representable in both types. The common UB outcome is to print the value as if it had been converted to unsigned before being passed. When UINT_MAX == 4294967295 (4 bytes), that prints the value -1 + (UINT_MAX + 1), or "4294967295".
So with ID and UB, you get a result, but robust code would be re-written to depend on neither.
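A robust rewrite along those lines might look like this sketch:
#include <stdio.h>

int main(void) {
    unsigned char c = 0xff; /* value 255 fits: no implementation-defined conversion */

    /* Explicit conversions give each conversion specifier exactly
       the type it expects, so nothing here is UB or ID. */
    printf("%d %u\n", (int)c, (unsigned int)c); /* prints "255 255" */
    return 0;
}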

Let int8_t num = -1. Why does printf("%u", num) overflow to T_MAX32 instead of T_MAX8?

int8_t is an 8-bit signed integer. Therefore, its value is anywhere in the range [-128...127].
int8_t num = -1;
printf("%u",num);
Output:
4294967295
Could someone give me a hint?
Your program's behaviour is not defined.
%u cannot be used as a format specifier for int8_t since it's a signed type and %u is for unsigned types.
Use %d instead, and rely on the C standard guaranteed automatic promotion of num to an int type.
As others have mentioned, using the incorrect format specifier for printf is undefined behavior. The behavior you experienced cannot be depended on to be consistent between different compilers or even different builds of the same compiler.
That being said, here is what most likely happened.
Any argument to printf after the first is of unspecified type, so when num is passed, printf can't do an exact type check. What ends up happening is that the value of num is promoted to type int.
From section 6.3.1.1 of the C standard:
2 The following may be used in an expression wherever an int or unsigned int may be used:
— An object or expression with an integer type (other than int or unsigned int) whose integer conversion rank is less than or equal to the rank of int and unsigned int.
— A bit-field of type _Bool, int, signed int, or unsigned int.
If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.
Because num was being used in a context where an int could be used, its value in the function call was promoted to int.
Assuming a 32-bit int and two's complement representation of negative numbers, the original binary representation 11111111 is converted to 11111111 11111111 11111111 11111111. If printed with the %u format specifier, printf assumes this representation is unsigned, so it prints 4294967295.
Had you used the %d format specifier, which expects a signed value, it would have printed -1.
To reiterate however, what you are seeing is undefined behavior. Other machines / compilers / optimization settings might yield different results.
To print an int8_t, the C standard provides the format specifier macro PRIi8 in <inttypes.h>.
printf("%" PRIi8, num);
That said, %d is fine for printing an int8_t thanks to C's default argument promotions, and you can simply use it for all signed types of rank at most int. You can find the format specifiers for the other fixed-width types in the POSIX documentation as well.
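A self-contained sketch using the fixed-width macros from <inttypes.h>:
#include <stdio.h>
#include <inttypes.h>

int main(void) {
    int8_t num = -1;
    printf("%" PRIi8 "\n", num);       /* prints -1 */
    printf("%d\n", num);               /* also fine: num is promoted to int */
    printf("%u\n", (unsigned int)num); /* well-defined: -1 converts to UINT_MAX */
    return 0;
}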

Sign extension query in case of short

Given,
unsigned short y = 0xFFFF;
When I print
printf("%x", y);
I get: 0xFFFF
But when I print
printf("%x", (signed short)y);
I get: 0xFFFFFFFF
Whole program below:
#include <stdio.h>
int main() {
unsigned short y = 0xFFFF;
unsigned short z = 0x7FFF;
printf("%x %x\n", y,z);
printf("%x %x", (signed short)y, (signed short)z);
return 0;
}
Sign extension happens when we cast a narrower type to a wider one, but here we are casting short to signed short.
In both cases sizeof((signed short)y) and sizeof((signed short)z) report 2 bytes. The short remains 2 bytes, and the sign bit is zero in the case of 0x7fff.
Any help is very much appreciated!
Output of the first printf is as expected. The second printf produces undefined behavior.
In the C language, when you pass a value smaller than int as a variadic argument, that value is always implicitly converted to type int. It is not possible to physically pass a short or char variadic argument. That implicit conversion to int is where your "sign extension" takes place.
For this reason, your printf("%x", y); is equivalent to printf("%x", (int) y);. The value that is passed to printf is 0xFFFF of type int. Technically, %x format requires an unsigned int argument, but a non-negative int value is also OK (unless I'm missing some technicality). The output is 0xFFFF.
Conversion to int happens in the second case as well. I.e. your printf("%x", (signed short) y); is equivalent to printf("%x", (int) (signed short) y);. The conversion of 0xFFFF to (signed short) is implementation-defined, because 0xFFFF is apparently out of range of signed short on your platform. But most likely it produces a negative value (-1). When converted to int it produces the same negative value of type int (again, -1 represented as 0xFFFFFFFF for a 32-bit int). The further behavior is undefined, since you are passing a negative int value for format specifier %x, which requires unsigned int argument. It is illegal to use %x with negative int values.
In other words, formally your second printf prints unpredictable garbage. But practically the above explains where that 0xFFFFFFFF came from.
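If the goal is just to print the 16-bit pattern, a well-defined alternative (a sketch) is to keep the value unsigned and let the h length modifier do the work:
#include <stdio.h>

int main(void) {
    unsigned short y = 0xFFFF;
    /* y is promoted to int with its value preserved, and %hx converts
       the argument back to unsigned short before printing. */
    printf("%hx\n", y); /* prints ffff */
    return 0;
}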
Let's break it down into smaller pieces:
Given,
unsigned short y = 0xFFFF;
Assuming a two-byte unsigned short, the maximum value is 2^16 - 1, which is indeed 0xFFFF.
When I print
printf("%x", y);
Due to default argument promotion (printf() is a variadic function) the value of y is implicitly promoted to type int. With the %x format specifier it is treated as an unsigned int. Assuming the common two's complement representation and a four-byte int type, the most significant bit is zero, so the bit patterns of the int and unsigned int values are simply the same.
But when I print
printf("%x", (signed short)y);
What you have done is cast to a signed type that cannot represent the value 0xFFFF. Such a conversion is, as the standard says, implementation-defined, so you can get any result. After the implicit conversion to int you apparently have a bit pattern of 32 ones, which is printed as 0xFFFFFFFF.

Adding a signed integer beyond 0xFFFFFFFF

#include <stdio.h>

void fun3(int a, int b, int c)
{
    printf("%d \n", a + b + c);
}

void fun2(int x, int y)
{
    fun3(0x33333333, 0x30303030, 0x31313131);
    printf("%d \n", x + y);
}

void fun1(int x)
{
    fun2(0x22222222, 0x20202020);
    printf("%d \n", x);
}

int main(void)
{
    fun1(0x1111111);
    return 0;
}
I'm going through the above program while studying stack corruption, and I am getting output with some unexpected values. All I could understand is that if the sum goes beyond 0xFFFFFFFF, then a small negative integer becomes a large value, say -1 becomes 0xFFFFFFFF. Any insights on this?
EDIT (correction): I missed the point. My answer is right for constants, but the question involves function parameters, so what happens here is overflow of signed integer objects, which, as @Cornstalks correctly pointed out in his comment, is undefined behaviour. /EDIT
In fun1() you are using printf() in a wrong way.
You wrote "%d" to accept an int, but this is not true if your number is greater that MAX_INT.
You have to check the value of MAX_INT in your system.
If you write an integer constant in hexadecimal format, standard C (ISO C99 or C11) puts the value in the first type in which the constant can fit, following this order:
int, unsigned int, long int, unsigned long int, long long int,
unsigned long long int.
Thus, if you have a constant greater than INT_MAX (the maximum value in the range of int), your constant (if positive) has type unsigned int, but the directive %d expects a signed int value. Thus, some negative number will be shown.
Worse, if your constant is greater than UINT_MAX (the maximum value in the range of unsigned int), then the type of the constant will be the first of long int, unsigned long int, long long int whose precision is strictly greater than that of unsigned int.
This implies that %d becomes a wrong directive.
If you cannot be completely sure how big your values will be, you can cast to the biggest integer type:
printf("%lld", (long long int) 0x33333333333);
The directive %lld stands for long long int.
If you are always dealing with positive values, use %llu and cast to unsigned long long int:
printf("%llu", (unsigned long long int) 0x33333333333);
In this way, you avoid any "funny" numbers, and you can show big numbers without losing any precision.
Remark: The constants INT_MAX, UINT_MAX, and the like, are in limits.h.
Important: This automatic sequence of types applies only to octal and hexadecimal constants. For decimal constants there is another rule:
int, long int, long long int.
To @Cornstalks' point: INT_MIN is 0x80000000, and (int)-1 is 0xFFFFFFFF in two's complement (on a 32-bit system, anyway).
This allows the instruction set to do things in signed arithmetic like:
1 + -2 = -1
becomes (as signed shorts, for brevity)
0x0001 + 0xFFFE = 0xFFFF
... then:
1 + -1 = 0
is represented internally with overflow as
0x0001 + 0xFFFF = 0x0000
Also to @Cornstalks' point: the internal representation (as well as overflow addition) is an implementation detail. C implementations (and instruction sets) need not represent integers in two's complement, so providing hex values for signed integer types may tie you to a subset of C implementations.
fun3 will attempt to print the value 0x94949494. This is greater than the maximum 4-byte signed integer value of 0x7FFFFFFF, so it will "overflow" and (on virtually every computer made today) produce (if I did my arithmetic correctly) the negative number -0x6B6B6B6C, which is -1802201964.
fun1 and fun2 should print the "expected" positive results.
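The wraparound in fun3 can be reproduced without relying on signed overflow by doing the sum in unsigned arithmetic; a sketch, assuming a 32-bit int:
#include <stdio.h>

int main(void) {
    /* Unsigned addition wraps modulo 2^32, which is well-defined. */
    unsigned int sum = 0x33333333u + 0x30303030u + 0x31313131u; /* 0x94949494 */

    printf("0x%x\n", sum);    /* prints 0x94949494 */
    printf("%d\n", (int)sum); /* implementation-defined; typically -1802201964 */
    return 0;
}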
