Given,
unsigned short y = 0xFFFF;
When I print
printf("%x", y);
I get: 0xFFFF
But when I print
printf("%x", (signed short)y);
I get: 0xFFFFFFFF
Whole program below:
#include <stdio.h>

int main() {
    unsigned short y = 0xFFFF;
    unsigned short z = 0x7FFF;

    printf("%x %x\n", y, z);
    printf("%x %x", (signed short)y, (signed short)z);
    return 0;
}
Sign extension happens when we cast a narrower type to a wider one, but here we are casting unsigned short to signed short, which have the same size.
In both cases, sizeof((signed short)y) and sizeof((signed short)z) report 2 bytes. The short stays 2 bytes, and the output is unaffected when the sign bit is zero, as with 0x7FFF.
Any help is very much appreciated!
Output of the first printf is as expected. The second printf produces undefined behavior.
In C, when you pass a value smaller than int as a variadic argument, that value is always implicitly converted to type int. It is not possible to physically pass a short or char variadic argument. That implicit conversion to int is where your "sign extension" takes place.
For this reason, your printf("%x", y); is equivalent to printf("%x", (int) y);. The value that is passed to printf is 0xFFFF of type int. Technically, %x format requires an unsigned int argument, but a non-negative int value is also OK (unless I'm missing some technicality). The output is 0xFFFF.
Conversion to int happens in the second case as well. I.e. your printf("%x", (signed short) y); is equivalent to printf("%x", (int) (signed short) y);. The conversion of 0xFFFF to (signed short) is implementation-defined, because 0xFFFF is apparently out of range of signed short on your platform. But most likely it produces a negative value (-1). When converted to int it produces the same negative value of type int (again, -1 represented as 0xFFFFFFFF for a 32-bit int). The further behavior is undefined, since you are passing a negative int value for format specifier %x, which requires unsigned int argument. It is illegal to use %x with negative int values.
In other words, formally your second printf prints unpredictable garbage. But practically the above explains where that 0xFFFFFFFF came from.
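To make those implicit steps visible, here's a minimal sketch (assuming a 32-bit int and two's complement, as above) that spells out the promotions and shows one well-defined way to recover the 16-bit pattern:

#include <stdio.h>

int main() {
    unsigned short y = 0xFFFF;

    /* Equivalent to printf("%x", y): y is promoted to int (value 0xFFFF). */
    printf("%x\n", (int)y);                 /* ffff */

    /* The problematic case, with the promotion spelled out: (signed short)y
       is implementation-defined (likely -1), then promoted to int, still -1,
       i.e. 0xFFFFFFFF on a 32-bit int. Formally still UB with %x. */
    printf("%x\n", (int)(signed short)y);   /* likely ffffffff */

    /* A well-defined way to print the low 16 bits: convert back to an
       unsigned type before passing the argument. */
    printf("%x\n", (unsigned int)(unsigned short)(signed short)y);  /* ffff */
    return 0;
}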
Let's break it down and into smaller pieces:
Given,
unsigned short y = 0xFFFF;
Assuming a two-byte unsigned short, the maximum value is 2^16 - 1, which is indeed 0xFFFF.
When I print
printf("%x", y);
Due to the default argument promotions (printf() is a variadic function), the value of y is implicitly promoted to type int. With the %x format specifier it's treated as an unsigned int. Assuming the common two's complement representation and a four-byte int, the most significant bit is zero, so the bit patterns of the int and the corresponding unsigned int are simply the same.
But when I print
printf("%x", (signed short)y);
What you have done is cast to a signed type that cannot represent the value 0xFFFF. Such a conversion is, as the standard says, implementation-defined, so you can get any result. After the implicit conversion to int, you apparently end up with a bit pattern of 32 ones, which is printed as 0xFFFFFFFF.
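If the goal is simply to print the 16-bit value, a hedged alternative is to convert back to unsigned short and use the h length modifier so the specifier matches the argument (a sketch, assuming 16-bit shorts):

#include <stdio.h>

int main() {
    unsigned short y = 0xFFFF;

    /* %hx converts the promoted argument back to unsigned short before
       formatting, so only the low 16 bits are printed. */
    printf("%hx\n", y);                                 /* ffff */
    printf("%hx\n", (unsigned short)(signed short)y);   /* ffff */
    return 0;
}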
The answer might be compiler-dependent, but:
What is the expected output of the lines below?
signed char a = -5;
printf("%x \n", (signed short) a);
printf("%x \n", (unsigned short) a);
Would a compiler fill the most significant bits with zeros (0) or ones (1) while casting a signed char to a larger type? How and when?
P.S. There are other issues too. I tried to run the code below on an online compiler for testing. The outputs were not as I expected, so I added the explicit casts, but it did not help. Why is the output of printf("%x \n", (signed char)b); 4 bytes long instead of 1?
#include <stdio.h>

int main()
{
    unsigned char a = (unsigned char)5;
    signed char b = (signed char)-5;
    unsigned short c;
    signed short d;

    c = (unsigned short)b;
    d = (signed short)b;
    printf("%x ||| %x ||| %x ||| %x\n", (unsigned char)a, (signed char)b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, c, d);
    printf("%d ||| %d ||| %d ||| %d\n", a, b, (signed char)c, (signed char)d);
    return 0;
}
Output:
5 ||| fffffffb ||| fffb ||| fffffffb
5 ||| -5 ||| 65531 ||| -5
5 ||| -5 ||| -5 ||| -5
In C, arguments to variadic functions (like printf) which are of lower rank than int are converted to int. (Not unsigned int unless the argument is unsigned and the same width as int).
Converting a signed short or signed char to signed int does not change the value. If you start with -5, you end up with -5.
But if you convert a negative signed value to an unsigned type (using an explicit cast, for example), the conversion is done modulo one more than the maximum value of the unsigned type. For example, the maximum value of an unsigned short is 65535 (on many implementations), so converting -5 to unsigned short results in -5 modulo 65536, which is 65531. (C's % operator does not produce mathematical modular reduction.) When that value is then implicitly converted to an int, it is still 65531, so that's what's printed with %x (fffb).
Note that it is technically incorrect to apply the format %x to a signed int. %x requires that the corresponding argument be an unsigned int. Currently, C does not guarantee what the result of interpreting a signed value as unsigned will be, but that will soon change. (It's not a conversion. At runtime, types no longer exist, and values are just bit patterns.)
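A small sketch of those two rules with matching conversion specifiers (assuming the common 8-bit char and 16-bit short):

#include <stdio.h>

int main() {
    signed char a = -5;

    /* Signed-to-signed, value representable: the value is preserved. */
    short s = (short)a;
    printf("%hd\n", s);   /* -5 */

    /* Signed-to-unsigned, value not representable: reduced modulo 65536,
       so -5 becomes 65531 (0xfffb). */
    unsigned short u = (unsigned short)a;
    printf("%hu\n", u);   /* 65531 */
    printf("%hx\n", u);   /* fffb */
    return 0;
}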
Conversions between integer types are value preserving when the value being converted is representable in the destination type. signed short can represent all values representable by signed char, so this ...
signed char a = -5;
printf("%hd\n", (signed short) a);
... would be expected to output a line containing "-5".
Your code, however, has undefined behavior. The conversion specifier %x requires the corresponding argument to have type unsigned int, whereas you are passing a signed short (converted to int according to the default argument promotions).
Provided that your implementation uses two's complement representation for signed integers (and I feel safe in asserting that it does), the representation will have sign-extended the original signed char to the width of a signed short, and then sign-extended that to the width of a (signed) int. Thus, one reasonably likely manifestation of the UB in your ...
printf("%x \n", (signed short) a);
... would be to print
fffffffb
The other case is a bit different. Integer conversions where the target type is unsigned and cannot represent the source value are well defined. The source value is converted to the destination type by reducing it modulo the number of representable values in the target type. Thus, if your unsigned short has 16 value bits then the result of converting -5 to unsigned short is -5 modulo 65536, which is 65531.
Thus,
printf("%hu\n", (unsigned short) a);
would be expected to print a line containing "65531".
Again, the %x conversion specifier does not match the type of the corresponding argument ((unsigned short) a, converted to int via the default argument promotions), so your printf has undefined behavior. However, the conversion of a 16-bit unsigned short to a 32-bit int on a two's complement system will involve zero-extending the representation of the source, so one reasonably likely manifestation of the UB in your ...
printf("%x \n", (unsigned short) a);
... would be to print
fffb
The exact rules for converting between signed and unsigned types are listed in section 6.3.1.3 of the C11 standard:
1. When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.

2. Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type.

3. Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
As for what the above means for this code:
signed char a = -5;
printf("%x \n", (signed short) a);
printf("%x \n", (unsigned short) a);
There are a few things going on here.
For the first printf, you first have a conversion from signed char to signed short. By clause 1 above, since the value -5 can be stored in both, the value is unchanged by the cast. Then, because this value is passed to a variadic function, it is then promoted to type int, and again by clause 1 the value is unchanged.
Then the resulting int value is printed with the %x format specifier, which is expecting an unsigned int. This is technically undefined behavior for a mismatched format specifier, although most implementations will allow for implicit signed / unsigned reinterpretation. So assuming two's complement representation, the representation of the int value -5 will be printed, and assuming a 32 bit int this will be fffffffb.
For the second printf, the conversion from signed char to unsigned short happens according to clause 2 above, since the value -5 can't be stored in an unsigned short. Assuming a 16-bit short, this gives you the value 65536 - 5 = 65531. And assuming two's complement representation, this is equivalent to sign-extending the representation from fb to fffb. This unsigned short value is then promoted to int when it is passed to printf, and by clause 1 the value is unchanged. Then the %x format specifier prints this as fffb.
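Here's a short sketch that walks both clauses with the values from the question (assuming 8-bit signed char and 16-bit short):

#include <stdio.h>

int main() {
    signed char a = -5;

    /* Clause 1 twice: -5 fits in signed short and in int,
       so the value survives both conversions. */
    int via_signed = (signed short)a;
    printf("%d\n", via_signed);          /* -5 */

    /* Clause 2 then clause 1: -5 does not fit in unsigned short, so one
       multiple of 65536 is added (-5 + 65536 = 65531); the promotion to
       int then preserves that value. */
    int via_unsigned = (unsigned short)a;
    printf("%d (0x%x)\n", via_unsigned, (unsigned int)via_unsigned);  /* 65531 (0xfffb) */
    return 0;
}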
Given the following code snippet:
signed char x = 150;
unsigned char y = 150;
printf("%d %d\n", x, y);
The output is:
-106 150
However, I'm using the same format specifier for variables that are represented in memory in the same way. How does printf know whether it's signed or unsigned?
Memory representation in both cases is:
10010110
signed char x = 150; incurs implementation-defined behavior, as 150 is not in the range of OP's signed char.
The 150 is an int and, not fitting in the signed char range, undergoes:
the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C17dr § 6.3.1.3 3
In this case, x took on the value of 150 - 256, i.e. -106.
Good code would not assume this result of -106 and instead would not assign to a signed char values outside its range.
Then ...
Commonly, both signed char x and unsigned char y are promoted to int before being passed as arguments to a variadic function, due to the default argument promotions (types narrower than int, whose values fit in int, are promoted to int).
Thus printf("%d %d\n", x, y); is not a problem: printf() receives 2 ints, and that matches the "%d" specifiers.
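One way to avoid the implementation-defined assignment altogether is to range-check against the limits in <limits.h> before storing (a sketch; the check and message are illustrative):

#include <limits.h>
#include <stdio.h>

int main() {
    int v = 150;

    /* 150 always fits in unsigned char (UCHAR_MAX is at least 255)... */
    unsigned char y = 150;

    /* ...but typically not in signed char (SCHAR_MAX is usually 127),
       so check instead of relying on implementation-defined wraparound. */
    if (v >= SCHAR_MIN && v <= SCHAR_MAX) {
        signed char x = (signed char)v;
        printf("stored %d\n", x);
    } else {
        printf("%d does not fit in signed char (range %d..%d)\n", v, SCHAR_MIN, SCHAR_MAX);
    }
    printf("%u\n", y);   /* 150 */
    return 0;
}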
Let's first recognize this issue:
char x = 150;
x never had the value 150 to begin with. That 150 is implicitly converted to signed char, so x immediately assumes the value -106, since 150 can't be represented in a signed 8-bit value. You might as well have said:
char x = (signed char)150; // same as -106, which is 0x96 in hex
Second, char and short values, when passed as variadic arguments, get promoted to int as part of being placed on the stack. This includes getting sign-extended.
So when you invoke printf("%d %d\n", x, y);, the compiler effectively turns it into this:
printf("%d %d\n", (int)x, (int)y);
the following gets put onto the stack:
"%d %d\n"
0xffffff96 (int)x
0x00000096 (int)y
When printf runs, it parses the format string (%d %d\n) and sees that it needs to interpret the next two items on the stack as signed integers. It reads 0xffffff96 and 0x00000096 off the stack and renders both to the console in decimal form.
How does printf knows if variable passed signed or unsigned?
The printf function doesn't "know".
You effectively tell it by using either a signed conversion specifier (d or i) or an unsigned conversion specifier (o, u, x or X).
And if you print a signed integer as unsigned or vice versa, printf just does what you told it to do.
I used the same specifier "%d", and it printed different values (the positive one and the negative one).
In your example, you are printing signed and unsigned char values.
signed char x = 150;
The value in x is -106 (8 bits signed) because 150 is greater than the largest value for signed char. (The signed char type's range is -128 to +127 with any hardware / C compiler that you are likely to encounter.)
unsigned char y = 150;
The value in y is 150 (8 bits unsigned) as expected.
At the call site, the signed char value -106 is sign-extended to a larger integer type, while the unsigned char value 150 is converted without sign extension.
By the time printf is called, the two values that have been passed to it have different representations.
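A minimal sketch of that point: the same 8-bit pattern 0x96 pushed through both conversions at the call site (assuming two's complement and 8-bit chars; the conversion to signed char is itself implementation-defined):

#include <stdio.h>

int main() {
    unsigned char bits = 0x96;   /* 10010110 */

    /* Through signed char, the value is typically -106 and gets
       sign-extended when promoted to int... */
    printf("%d\n", (signed char)bits);   /* -106 on typical platforms */

    /* ...through unsigned char it is 150 and gets zero-extended. */
    printf("%d\n", bits);                /* 150 */
    return 0;
}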
I'm trying to figure out how variables really work in C, and I find it strange how the printf function seems to know the difference between variables of the same size; I'm assuming they're both 16 bits.
#include <stdio.h>

int main(void) {
    unsigned short int positive = 0xffff;
    short int negative = 0xffff;

    printf("%d\n%d\n", positive, negative);
    return 0;
}
Output:
65535
-1
I think we have to distinguish more carefully between the type conversions of the different integer types on the one hand, and the printf format specifiers (which force printf to interpret the data a certain way) on the other.
Systematically:
printf("%hd %hd\n", positive, negative);
// gives: -1 -1
Both values are interpreted as signed short int by printf, regardless of the declaration.
printf("%hu %hu\n", positive, negative);
// gives: 65535 65535
Both values are interpreted as unsigned short int by printf, regardless of the declaration.
However,
printf("%d %d\n", positive, negative);
// gives: 65535 -1
Both values are implicitly converted to (a longer) int, while the sign is kept.
Finally,
printf("%u %u\n", positive, negative);
// gives 65535 4294967295
Again, both values are implicitly converted to int, while the sign is kept, but then the negative value is interpreted as unsigned. As we can see here, plain int is actually 32-bit (on this system).
Curiously, gcc gives me a warning for the assignment short int negative = 0xffff; only if I compile with -Wpedantic.
I have the following code:
#include <stdio.h>

int main() {
    unsigned int a = -1;
    int b = -1;

    printf("%x\n", a);
    printf("%x\n", b);
    printf("%d\n", a);
    printf("%d\n", b);
    printf("%u\n", a);
    printf("%u\n", b);
    return 0;
}
The output is:
ffffffff
ffffffff
-1
-1
4294967295
4294967295
I can see that a value is interpreted as signed or unsigned according to the format specifier passed to the printf function. In both cases, the bytes are the same (ffffffff). Then, what is the unsigned keyword for?
Assigning an int -1 to an unsigned: as -1 does not fit in the range [0...UINT_MAX], multiples of UINT_MAX+1 are added until the result is in range. Evidently UINT_MAX is pow(2,32)-1, or 4294967295, on OP's machine, so a has the value 4294967295.
unsigned int a = -1;
The "%x", "%u" specifier expects a matching unsigned. Since these do not match, "If a conversion specification is invalid, the behavior is undefined.
If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined." C11 §7.21.6.1 9. The printf specifier does not change b.
printf("%x\n", b); // UB
printf("%u\n", b); // UB
The "%d" specifier expects a matching int. Since these do not match, more UB.
printf("%d\n", a); // UB
Given undefined behavior, the conclusions are not supported.
In both cases, the bytes are the same (ffffffff).
Even with the same bit pattern, different types may have different values. ffffffff as an unsigned has the value 4294967295. As an int, depending on the signed integer encoding, it has the value -1 (two's complement), -2147483647 (sign-magnitude), or negative zero / a trap representation (ones' complement). As a float it may be a NaN.
what is the unsigned keyword for?
unsigned stores a whole number in the range [0 ... UINT_MAX]. It never has a negative value. If code needs a non-negative number, use unsigned. If code needs a counting number that may be +, - or 0, use int.
Update: to avoid a compiler warning about assigning a signed int to an unsigned, use the below. This is an unsigned 1u being negated, which is well defined as above. The effect is the same as -1, but it conveys the intent directly to the compiler.
unsigned int a = -1u;
Having unsigned in the variable declaration is more useful for the programmers themselves: don't treat the variable as negative. As you've noticed, both -1 and 4294967295 have exactly the same bit representation for a 4-byte integer. It's all about how you want to treat or view them.
The statement unsigned int a = -1; converts -1 to its two's complement bit pattern and stores that representation in a. The printf() specifiers x, d, and u show how the bit pattern stored in the variable a is rendered in different formats.
When you initialize unsigned int a to -1, you are storing the two's complement representation of -1 in the memory of a, which is 0xffffffff, or 4294967295. Hence, when you print it using the %x or %u format specifier, you get that output.
Specifying the signedness of a variable decides the minimum and maximum values that can be stored in it. With unsigned int the range is 0 to 4,294,967,295; with int the range is -2,147,483,648 to 2,147,483,647 (assuming a 32-bit int).
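Rather than hard-coding those figures, you can read the actual limits for your implementation from <limits.h> (a quick sketch; the output shown assumes a 32-bit int):

#include <limits.h>
#include <stdio.h>

int main() {
    /* INT_MIN, INT_MAX and UINT_MAX are defined by the implementation. */
    printf("int:          %d .. %d\n", INT_MIN, INT_MAX);
    printf("unsigned int: 0 .. %u\n", UINT_MAX);
    return 0;
}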
Given the following code:
#include <stdio.h>

int main() {
    int small_num = 0x12345678;
    int largest_num = 0xFFFFFFFF;

    printf("small: without casting to short: 0x%.8x, with casting to short: 0x%.8x\n", small_num>>16, (short)(small_num>>16));
    printf("large: without casting to short: 0x%.8x, with casting to short: 0x%.8x\n", largest_num>>16, (short)(largest_num>>16));
    return 0;
}
gives me the output (using codepad):
small: without casting to short: 0x00001234, with casting to short: 0x00001234
large: without casting to short: 0xffffffff, with casting to short: 0xffffffff
That seems extremely strange. Does anyone have an idea why it happens this way?
When you cast to (short) in the printf call, the compiler converts the result from short back to int, since that is what is actually passed to printf. Therefore, 0x1234 maps to 0x00001234, and 0xffff (which is exactly -1 as a short) maps to 0xffffffff. Note that negative integers are widened from short to int by filling the added high-order bits with ones (sign extension).
Short answer
The hexadecimal constant has type unsigned int. When converted to signed int the value becomes -1. Right-shifting a negative value usually leaves the sign-bit unchanged, so -1 >> 16 is still -1. A short int passed to a variadic function gets promoted to signed int which, when interpreted as an unsigned int by the %x conversion specifier, prints out 0xffffffff.
Long answer
However, your code is broken for a number of reasons.
Integer conversion
int largest_num = 0xFFFFFFFF;
The type of the hexadecimal constant is the first of the following types in which its value can be represented: int, unsigned int, long int, unsigned long int, long long int, unsigned long long int.
If int has more than 32 bits, you're fine.
If int has 32 bits or fewer, the result is implementation-defined (or an implementation-defined signal is raised).
Usually, largest_num will have all bits set and have the value -1.
Shifting a negative value
largest_num>>16
If the value of largest_num is negative, the resulting value is implementation-defined. Usually, the sign bit is left unchanged so that -1 right-shifted is still -1.
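If you need a shift with fully defined behaviour, perform it on an unsigned value instead; a sketch (assuming a 32-bit unsigned int):

#include <stdio.h>

int main() {
    unsigned int largest_num = 0xFFFFFFFFu;

    /* Right-shifting an unsigned value always fills with zero bits. */
    printf("0x%.8x\n", largest_num >> 16);   /* 0x0000ffff */
    return 0;
}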
Integer promotion
printf ("0x%.8x\n", (short)(largest_num>>16));
When you pass a short int to a variadic function, the value will be promoted to int. A negative value will be preserved when converted to the new type.
However, the "%x" conversion specifier expects an unsigned int argument. Because unsigned int and signed int are not compatible types, the behaviour of the code is undefined. Usually, the bits of the signed int is re-interpreted as an unsigned int, which results in the original value of the hexadecimal constant.
Calling a variadic function
printf(...);
printf() is a variadic function. Variadic functions (typically) use different calling conventions than ordinary functions. Your code invokes undefined behaviour if you don't have a valid declaration of printf() in scope.
The usual way to provide a declaration for printf() is to #include <stdio.h>.
Source: n1570 (the last public draft of the current C standard).
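For reference, one way to rewrite the original program without the implementation-defined and undefined parts might look like this (a sketch, assuming a 32-bit unsigned int and a 16-bit unsigned short):

#include <stdio.h>

int main() {
    unsigned int small_num = 0x12345678u;
    unsigned int largest_num = 0xFFFFFFFFu;   /* fits: the constant is unsigned */

    /* Unsigned shifts are fully defined, and converting to unsigned short
       before printing with %hx keeps specifier and argument matched. */
    printf("small: 0x%.8x, as short: 0x%.4hx\n",
           small_num >> 16, (unsigned short)(small_num >> 16));
    printf("large: 0x%.8x, as short: 0x%.4hx\n",
           largest_num >> 16, (unsigned short)(largest_num >> 16));
    return 0;
}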