I want the following idea: I have short a = -4; and I have an unsigned short b = (unsigned short)a;
When I printf("%hu", b), why doesn't it print 4? How can I convert a negative integer to a positive integer using casting?
short and unsigned short are normally 16 bit integers. So limits are -32768 to 32767 for the signed version (short) and 0 to 65535 for unsigned short. Casting from signed to unsigned just wraps values, so -4 will be casted to 65532.
This is the way casting to unsigned works in C language.
If you accept to use additions/substractions, you can do:
65536l - (unsigned short) a
The operation will use the long type (because of the l suffix) which is required to be at least a 32 bits integer type. That should successfully convert any negative short integer to its absolute value.
It sounds like you want the absolute value of the number, not a cast. In C we have the abs / labs / llabs functions for that, found in <stdlib.h>.
If you know ahead of time that the value is negative, you can also just negate it: -a.
If you were instead just looking to get the absolute value, you should have used the abs() function from <stdlib.h>.
Otherwise, when you make such a cast, you trigger the following conversion rule from C17 6.3.1.3:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
Where foot note 60) is important:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
Mathematical value meaning we shouldn't consider wrap-around etc when doing the calculation. And the signedness format (2's complement etc) doesn't matter either. Applying this rule, then:
You have mathematical value -4 and convert from signed short to unsigned short.
We add one more than the maximum value. That is -4 + USHRT_MAX + 1, where UINT_MAX is likely 2^16 = 65535.
-4 + 65535 + 1 = 65532. This is in range of the new type unsigned short.
We got in range at the first try, but "repeatedly adding or subtracting" is essentially the same as taking the value modulo (max + 1).
This conversion is well-defined and portable - you will get the same result on all systems where short is 16 bits.
Related
I was doing some experiments in a code in order to prove the theory. This is my code:
#include <stdio.h>
int main(){
unsigned int x,y,z;
if(1){
x=-5;
y=5;
z=x+y;
printf("%i",z);
}
return 0;
}
But for what I know the output should have been 10, but instead it prints 0, why this is happening? why I can assign a negative value to an unsigned int data type
From section 6.5.16.1 of the C standard:
In simple assignment (=), the value of the right operand is converted
to the type of the assignment expression and replaces the value stored
in the object designated by the left operand.
From section 6.3.1.3 Signed and unsigned integers:
... Otherwise, if the new type is unsigned, the value is converted
by repeatedly adding or subtracting one more than the maximum value
that can be represented in the new type until the value is in the
range of the new type.
So x=-5 assigns UINT_MAX + 1 - 5 to x.
Signed to unsigned conversion happens as per "mathematic modulus by UINT_MAX+1", from C17 6.3.1.3:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type
Now as it happens, on 2's complement computers this is the same as taking the binary representation of the signed number and treat it like an unsigned number. In your case -5 is represented as binary 0xFFFFFFFB, so the unsigned number ends up as 4294967291. And 4294967291 + 5 creates an unsigned wrap-around from UINT_MAX = 4294967295 to 0 (which is well-defined, unlike signed overflow).
So -5 does not just discard the sign when converted to unsigned. If that's what you want to happen, use the abs() function from stdlib.h.
It's a basic characteristic of pretty much all integral types that they have a defined range of values they can hold. But then the question is, what happens if you try to set a value outside that range? For C's unsigned types, the answer is that they operate via modular arithmetic.
On a modern machine, type unsigned int probably has a range of 0 to 4294967295. Obviously -5 does not fit into that range. So modular arithmetic says that we add or subtract some multiple of 4294967296 until we get a value that is in the range. Now, -5 + 4294967296 is 4294967291, and that is in range, so that's the value which gets stored in your variable x.
So then x + y will be 4294967296, but that's not in the range of 0 to 4294967295, either. But if we subtract 4294967296, we get 0, and that's in range, so that's our answer.
And along the way we've discovered how two's complement arithmetic works. It turns out that, if we had declared x as a signed int and set it to -5, it would have ended up containing the same bit pattern as 4294967291. And as we've seen, 4294967291 is precisely the bit pattern we want to be able to add to 5 in order to get 0 (after wrapping around, that is). So 4294967291 is a great internal value to use for -5, since you obviously want -5 + 5 to be 0.
Why i can assign a negative value to an unsigned int data type?
Assigning an out-of-range value to an unsigned type is well defined.
But first, in trying to report the value, code invokes undefined behavior (UB) using a mismatched printf() specifier with unsigned:
// printf("%i",z); // Bad
printf("%u\n",z); // Good
Try printf("%u %u %u %u\n", x, y, z, UINT_MAX); to properly see all 4 values.
x=-5; assigns an out-of-range value to an unsigned. With unsigned types, the value is converted to an in-range value by adding/subtracting the max value of the type + 1 until in range. In this case x will have the value of UINT_MAX + 1 - 5.
y=5; is OK.
x+y will then incur unsigned math overflow. The sum is also converted to an in-range value in a like-wise manner.
x+y will have the value of (UINT_MAX + 1 - 5) + 5 --> (UINT_MAX + 1 - 5) + 5 - (UINT_MAX + 1) --> 0.
What language in the standard makes this code work, printing '-1'?
unsigned int u = UINT_MAX;
signed int s = u;
printf("%d", s);
https://en.cppreference.com/w/c/language/conversion
otherwise, if the target type is signed, the behavior is implementation-defined (which may include raising a signal)
https://gcc.gnu.org/onlinedocs/gcc/Integers-implementation.html#Integers-implementation
GCC supports only two’s complement integer types, and all bit patterns are ordinary values.
The result of, or the signal raised by, converting an integer to a signed integer type when the value cannot be represented in an object of that type (C90 6.2.1.2, C99 and C11 6.3.1.3):
For conversion to a type of width N, the value is reduced modulo 2^N to be within range of the type; no signal is raised.
To me it seems like converting UINT_MAX to an int would therefore mean dividing UINT_MAX by 2^(CHAR_BIT * sizeof(int)). For the sake of argument, with 32 bit ints, 0xFFFFFFFF / 2^32 = 0xFFFFFFFF. So this doesnt really explain how the value '-1' ends up in the int.
Is there some language somewhere else that says after the modulo division we just reinterpret the bits? Or some other part of the standard that takes precedence before the parts I have referenced?
No part of the C standard guarantees that your code shall print -1 in general. As it says, the result of the conversion is implementation-defined. However, the GCC documentation does promise that if you compile with their implementation, then your code will print -1. It's nothing to do with bit patterns, just math.
The clearly intended reading of "reduced modulo 2^N" in the GCC manual is that the result should be the unique number in the range of signed int that is congruent mod 2^N to the input. This is a precise mathematical way of defining the "wrapping" behavior that you expect, which happens to coincide with what you would get by reinterpreting the bits.
Assuming 32 bits, UINT_MAX has the value 4294967295. This is congruent mod 4294967296 to -1. That is, the difference between 4294967295 and -1 is a multiple of 4294967296, namely 4294967296 itself. Moreover, this is necessarily the unique such number in [-2147483648, 2147483647]. (Any other number congruent to -1 would be at least -1 + 4294967296 = 4294967295, or at most -1 - 4294967296 = -4294967297). So -1 is the result of the conversion.
In other words, add or subtract 4294967296 repeatedly until you get a number that's in the range of signed int. There's guaranteed to be exactly one such number, and in this case it's -1.
Is there an easy way to replace an overflow at counting down with a negative value?
For example a 32 bit variable. Possible values are 0x00000000 - 0xFFFFFFFF. When I subtract 1 from the lowest possible value (0x00000000 - 1), the result is 0xFFFFFFFF. How can the operation be changed to give a result of -1?
Is there an easy way to replace an overflow at counting down with a negative value?
Use wider math is a direct approach.
"example a 32 bit variable. Possible values are 0x00000000 - 0xFFFFFFFF" implies that the variable is some unsigned type like uint32_t.
Subtracting 1 from (uint32_t)0 is (uint32_t)0xFFFFFFFF as OP reported. So instead use a wider signed math like long long (which is at least 64-bit) or int64_t
// Insure subtraction is done using `long long` math with 1LL
long long result = var_uint32_bit - 1LL;
Alternatively code could stick with a corresponding same width signed type.
// Only non-implementation defined for values 0...0x7FFFFFFF
int32_t result = (int32_t)var_uint32_bit - 1;
The (int32_t)var_uint32_bit has a limitation concerning conversion of an unsigned type to a signed integer type.
Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised. C11dr §6.3.1.3 3
So I'm trying to interpret the following output:
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("v = %d, uv = %u\n", v, uv);
Output:
v = -12345
uv = 53191
So the question is: why is this exact output generated when this program is run on a two's complement machine?
What operations lead to this result when casting the value to unsigned short?
My answer assumes 16-bit two's complement arithmetic.
To find the value of -12345, take 12345, complement it, and add 1.
12345 is 0x3039 is 0011000000111001.
Complementing means changing all the 1's to 0's and all the 0's to 1's:
1100111111000110 is 0xcfc6 is 53190.
Add one: 53191.
So internally, -12345 is represented by 0xcfc7 = 53191.
But if you interpret it as an unsigned number, it's obviously just 53191. (And when you assign a signed value to an unsigned integer of the same size, what typically ends up happening is that you assign the exact bit pattern, without converting anything. Later, however, you will typically interpret that value differently, such as when you print it with %u.)
Another, perhaps easier way to think about this is that 16-bit arithmetic "wraps around" at 216 = 65536. So you can think of 65536 as being another name for 0 (just like 0:00 and 24:00 are both names for midnight). So -12345 is 65536 - 12345 = 53191.
The conversion rules, when converting signed integer to an unsigned integer, defined by C standard requires by repeatedly adding the TYPE_MAX + 1 to the value.
From 6.3.1.3 Signed and unsigned integers:
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
If USHRT_MAX is 65535 and then adding 65535 + 1 + -12345 is 53191.
The output seen does not depend on 2's complement nor 16 or 32- bit int. The output seen is entirely defined and would be the same on a rare 1's complement or sign-magnitude machine. The result does depend on 16-bit unsigned short
-12345 is within the minimum range of a short, so no issues with that assignment. When a short is passed as a ... argument, is goes thought the usual promotion to an int with no change in value as all short are in the range of int. "%d" expects an int, so the output is "-12345"
short int v = -12345;
printf("%d\n", v); // output "-12345\n"
Assigning a negative number to a unsigned type is well defined. With a 16-bit unsigned short, the value of uv is -12345 plus the minimum multiples of USHRT_MAX+1 (65536 in this case) to a final value of 53191.
Passing an unsigned short as an ... argument, the value is converted to int or unsigned, whichever type contains the entire range of unsigned short. IAC, the values does not change. "%u" matches an unsigned. It also matches an int whose values are expressible as either an int or unsigned.
short int v = -12345;
unsigned short uv = (unsigned short) v;
printf("%u\n", v); // output "53191\n"
What operations lead to this result when casting the value to unsigned short?
The casting did not affect the final outcome. The same result would have occurred without the cast. The cast may be useful to quiet warnings.
In C++ primer it says that "if we assign an out of range value to an object of unsigned type the result is the remainder of the value modulo the number of values the target type can hold."
It gives the example:
int main(){
unsigned char i = -1;
// As per the book the value of i is 255 .
}
Can anybody please explain it to me how this works.
the result is the remainder of the value modulo the number of values the target type can hold
Start with "the number of values the target type can hold". For unsigned char, what is this? The range is from 0 to 255, inclusive, so there are a total of 256 values that can be represented (or "held").
In general, the number of values that can be represented in a particular unsigned integer representation is given by 2n, where n is the number of bits used to store that type.
An unsigned char is an 8-bit type, so 28 == 256, just as we already knew.
Now, we need to perform a modulo operation. In your case of assigning -1 to unsigned char, you would have -1 MOD 256 == 255.
In general, the formula is: x MOD 2n, where x is the value you're attempting to assign and n is the bit width of the type to which you are trying to assign.
More formally, this is laid out in the C++11 language standard (§ 3.9.1/4). It says:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.*
* This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Perhaps an easier way to think about modulo arithmetic (and the description that you'll most commonly see used) is that overflow and underflow wrap around. You started with -1, which underflowed the range of an unsigned char (which is 0–255), so it wrapped around to the maximum representable value (which is 255).
It's equivalent in C to C++, though worded differently:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented by the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The literal 1 is of type int. For this explanation, let's assume that sizeof(int) == 4 as it most probably is. So then 1 in binary would look like this:
00000000 00000000 00000000 00000001
Now let's apply the unary minus operator to get the -1. We're assuming two's complement is used as it most probably is (look up two's complement for more explanation). We get:
11111111 11111111 11111111 11111111
Note that in the above numbers the first bit is the sign bit.
As you try to assign this number to unsigned char, for which holds sizeof(unsigned char) == 1, the value would be truncated to:
11111111
Now if you convert this to decimal, you'll get 255. Here the first bit is not seen as a sign bit, as the type is unsigned.
In Stroustrup's words:
If the destination type is unsigned, the resulting value is simply as many bits from the source as will fit in the destination (high-order bits are thrown away if necessary). More precisely, the result is the least unsigned integer congruent to the source integer modulo 2 to the nth, where n is the number of bits used to represent the unsigned type.
Excerpt from C++ standard N3936:
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned
integer type: “unsigned char”, “unsigned short int”, “unsigned int”, “unsigned long int”,
and “unsigned long long int”, each of which occupies the same amount of storage and has the same
alignment requirements (3.11) as the corresponding signed integer type47; that is, each signed integer type
has the same object representation as its corresponding unsigned integer type.
I was going through the excerpt from C++ primer myself and I think that I have kind of figured out a way to mathematically figure out how those values come out(feel free to correct me if I'm wrong :) ). Taking example of the particular code below.
unsigned char c = -4489;
std::cout << +c << std::endl; // will yield 119 as its output
So how does this answer of 119 come out?
well take the 4489 and divide it by the total number of characters ie 2^8 = 256 which will give you 137 as remainder.
4489 % 256 = 137.
Now just subtract that 137 from 256.
256 - 137 = 119.
That's how we simply derive the mod value. Do try it for yourself on other values as well. Has worked perfectly accurate for me!