In C++ primer it says that "if we assign an out of range value to an object of unsigned type the result is the remainder of the value modulo the number of values the target type can hold."
It gives the example:
int main(){
unsigned char i = -1;
// As per the book the value of i is 255 .
}
Can anybody please explain it to me how this works.
the result is the remainder of the value modulo the number of values the target type can hold
Start with "the number of values the target type can hold". For unsigned char, what is this? The range is from 0 to 255, inclusive, so there are a total of 256 values that can be represented (or "held").
In general, the number of values that can be represented in a particular unsigned integer representation is given by 2n, where n is the number of bits used to store that type.
An unsigned char is an 8-bit type, so 28 == 256, just as we already knew.
Now, we need to perform a modulo operation. In your case of assigning -1 to unsigned char, you would have -1 MOD 256 == 255.
In general, the formula is: x MOD 2n, where x is the value you're attempting to assign and n is the bit width of the type to which you are trying to assign.
More formally, this is laid out in the C++11 language standard (§ 3.9.1/4). It says:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2n where n is the number of bits in the value representation of that particular size of integer.*
* This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Perhaps an easier way to think about modulo arithmetic (and the description that you'll most commonly see used) is that overflow and underflow wrap around. You started with -1, which underflowed the range of an unsigned char (which is 0–255), so it wrapped around to the maximum representable value (which is 255).
It's equivalent in C to C++, though worded differently:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented by the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The literal 1 is of type int. For this explanation, let's assume that sizeof(int) == 4 as it most probably is. So then 1 in binary would look like this:
00000000 00000000 00000000 00000001
Now let's apply the unary minus operator to get the -1. We're assuming two's complement is used as it most probably is (look up two's complement for more explanation). We get:
11111111 11111111 11111111 11111111
Note that in the above numbers the first bit is the sign bit.
As you try to assign this number to unsigned char, for which holds sizeof(unsigned char) == 1, the value would be truncated to:
11111111
Now if you convert this to decimal, you'll get 255. Here the first bit is not seen as a sign bit, as the type is unsigned.
In Stroustrup's words:
If the destination type is unsigned, the resulting value is simply as many bits from the source as will fit in the destination (high-order bits are thrown away if necessary). More precisely, the result is the least unsigned integer congruent to the source integer modulo 2 to the nth, where n is the number of bits used to represent the unsigned type.
Excerpt from C++ standard N3936:
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned
integer type: “unsigned char”, “unsigned short int”, “unsigned int”, “unsigned long int”,
and “unsigned long long int”, each of which occupies the same amount of storage and has the same
alignment requirements (3.11) as the corresponding signed integer type47; that is, each signed integer type
has the same object representation as its corresponding unsigned integer type.
I was going through the excerpt from C++ primer myself and I think that I have kind of figured out a way to mathematically figure out how those values come out(feel free to correct me if I'm wrong :) ). Taking example of the particular code below.
unsigned char c = -4489;
std::cout << +c << std::endl; // will yield 119 as its output
So how does this answer of 119 come out?
well take the 4489 and divide it by the total number of characters ie 2^8 = 256 which will give you 137 as remainder.
4489 % 256 = 137.
Now just subtract that 137 from 256.
256 - 137 = 119.
That's how we simply derive the mod value. Do try it for yourself on other values as well. Has worked perfectly accurate for me!
Related
I want the following idea: I have short a = -4; and I have an unsigned short b = (unsigned short)a;
When I printf("%hu", b), why doesn't it print 4? How can I convert a negative integer to a positive integer using casting?
short and unsigned short are normally 16 bit integers. So limits are -32768 to 32767 for the signed version (short) and 0 to 65535 for unsigned short. Casting from signed to unsigned just wraps values, so -4 will be casted to 65532.
This is the way casting to unsigned works in C language.
If you accept to use additions/substractions, you can do:
65536l - (unsigned short) a
The operation will use the long type (because of the l suffix) which is required to be at least a 32 bits integer type. That should successfully convert any negative short integer to its absolute value.
It sounds like you want the absolute value of the number, not a cast. In C we have the abs / labs / llabs functions for that, found in <stdlib.h>.
If you know ahead of time that the value is negative, you can also just negate it: -a.
If you were instead just looking to get the absolute value, you should have used the abs() function from <stdlib.h>.
Otherwise, when you make such a cast, you trigger the following conversion rule from C17 6.3.1.3:
Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type. 60)
Where foot note 60) is important:
The rules describe arithmetic on the mathematical value, not the value of a given type of expression.
Mathematical value meaning we shouldn't consider wrap-around etc when doing the calculation. And the signedness format (2's complement etc) doesn't matter either. Applying this rule, then:
You have mathematical value -4 and convert from signed short to unsigned short.
We add one more than the maximum value. That is -4 + USHRT_MAX + 1, where UINT_MAX is likely 2^16 = 65535.
-4 + 65535 + 1 = 65532. This is in range of the new type unsigned short.
We got in range at the first try, but "repeatedly adding or subtracting" is essentially the same as taking the value modulo (max + 1).
This conversion is well-defined and portable - you will get the same result on all systems where short is 16 bits.
In C11 standard
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type
other than _Bool, if the value can be represented by the new type, it
is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type. 60)
3 Otherwise, the new type is signed and the value cannot be
represented in it; either the result is implementation-defined or an
implementation-defined signal is raised.
In the first point,
I was wondering what it means by "if the value can be represented by the new type" in the first point? Two integer types might have different ranges of integers, but can have the same range of bit representations. (For example, unsigned int and int.)
Are both the bit representation and the integer value not changed before and after the conversion?
Thanks.
The C standard does not dictate one specific representation for integers. While most implementations you are likely to come across use two's complement for signed integers, that is not guaranteed. It also allows for one's complement where negating a number means inverting all bits (and not adding 1 as two's complement does), or sign-and-magnitude where negating a number means inverting only the high order bit.
As such, the language of the standard talks about what happens to the value of an integer when it is converted, not the representation.
As an example, suppose an int has range -231 to 231-1 and unsigned int has range 0 to 232-1. If you have an int with the value 45 that is converted to unsigned int, that value can be represented in both types so now you have an unsigned int with value 45.
Now suppose your int has value -1000. That value can not be represented in an unsigned int so it has to be converted as per the rule in clause 2, i.e. 232 is added to -1000 to result in 232 - 1000 == 4294966296. Now in two's complement representation, an unsigned int with value 4294966296 and an int with value -1000 happen to have the same representation which is FFFFFC18 in hex. This equivalence does not hold true for one's complement or sign-and-magnitude.
So converting a number to a different type may or may not change the representation, depending on the implementation.
I was goint through k & r. I was having problem in understanding following lines on page 197(section A6)
Integral conversions: any integer is
converted to a given unsigned type by
finding the smallest non negative
value that is congruent to that
integer,modulo one more than the
largest value that can be represented
in the unsigned type.
Can any body explain this in a bit detail.
Thanks
It means only low value bits will be count and high order bits will be discarded.
For example:
01111111 11111111 11110000 00001111
when converted to a 16 bit unsigned short will be:
11110000 00001111
This is effectively mathematically expressed in:
target_value = value % (target_type_max+1) ( % = modulus operator )
any integer is converted to a given unsigned type by finding the smallest non negative value that is congruent to that integer,modulo one more than the largest value that can be represented in the unsigned type.
Let's take this bit by bit and from backwards:
What is the largest value that can be represented in the unsigned type of width n bits?
2^(n) - 1.
What is one more than this value?
2^n.
How does the conversion take place?
unsigned_val = signed_val % 2^n
Now, the why part: The standard does not mandate what bit representation is used. Hence the jargon. In a two's complement representation -- which is by far the most commonly used -- this conversion does not change the bit pattern (unless there is a a truncation, of course).
Refer to Integral Conversions from the Standard for further details.
I understand that character variable holds from (signed)-128 to 127 and (unsigned)0 to 255
char x;
x = 128;
printf("%d\n", x);
But how does it work? Why do I get -128 for x?
printf is a variadic function, only providing an exact type for the first argument.
That means the default promotions are applied to the following arguments, so all integers of rank less than int are promoted to int or unsigned int, and all floating values of rank smaller double are promoted to double.
If your implementation has CHAR_BIT of 8, and simple char is signed and you have an obliging 2s-complement implementation, you thus get
128 (literal) to -128 (char/signed char) to -128 (int) printed as int => -128
If all the listed condition but obliging 2s complement implementation are fulfilled, you get a signal or some implementation-defined value.
Otherwise you get output of 128, because 128 fits in char / unsigned char.
Standard quote for case 2 (Thanks to Matt for unearthing the right reference):
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.60)
3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
This all has nothing to do with variadic functions, default argument promotions etc.
Assuming your system has signed chars, then x = 128; is performing an out-of-range assignment. The behaviour of this is implementation-defined ; meaning that the compiler may choose an action but it must document what it does (and therefore, do it reliably). This action is allowed to include raising a signal.
The usual behaviour that modern compilers do for out-of-range assignment is to truncate the representation of the value to fit in the destination type.
In binary representation, 128 is 000....00010000000.
Truncating this into a signed char gives the signed char of binary representation 10000000. In two's complement representation, which is used by all modern C systems for negative numbers, this is the representation of the value -128. (For historical curiousity: in one's complement this is -127, and in sign-magnitude, this is -0 which may be a trap representation and thus raise a signal).
Finally, printf accurately prints out this char's value of -128. The %d modifier works for char because of the default argument promotions and the facts that INT_MIN <= CHAR_MIN and INT_MAX >= CHAR_MAX.; this behaviour is guaranteed except on systems which have plain char as unsigned, and sizeof(int)==1 (which do exist but you'd know about it if you were on one).
Lets look at the binary representation of 128 when stored into 8 bits:
1000 0000
And now let's look at the binary representation of -128 when stored into 8 bits:
1000 0000
The standard for char with your current setup looks to be a signed char (note this isn't in the c standard, look here if you don't believe me) and thus when you're assigning the value of 128 to x you're assigning it the value 1000 0000 and thus when you compile and print it out it's printing out the signed value of that binary representation (meaning -128).
It turns out my environment is the same in assuming char is actually signed char. As expected if I cast x to be an unsigned char then I get the expected output of 128:
#include <stdio.h>
#include <stdlib.h>
int main() {
char x;
x = 128;
printf("%d %d\n", x, (unsigned char)x);
return 0;
}
gives me the output of -128 128
Hope this helps!
I was goint through k & r. I was having problem in understanding following lines on page 197(section A6)
Integral conversions: any integer is
converted to a given unsigned type by
finding the smallest non negative
value that is congruent to that
integer,modulo one more than the
largest value that can be represented
in the unsigned type.
Can any body explain this in a bit detail.
Thanks
It means only low value bits will be count and high order bits will be discarded.
For example:
01111111 11111111 11110000 00001111
when converted to a 16 bit unsigned short will be:
11110000 00001111
This is effectively mathematically expressed in:
target_value = value % (target_type_max+1) ( % = modulus operator )
any integer is converted to a given unsigned type by finding the smallest non negative value that is congruent to that integer,modulo one more than the largest value that can be represented in the unsigned type.
Let's take this bit by bit and from backwards:
What is the largest value that can be represented in the unsigned type of width n bits?
2^(n) - 1.
What is one more than this value?
2^n.
How does the conversion take place?
unsigned_val = signed_val % 2^n
Now, the why part: The standard does not mandate what bit representation is used. Hence the jargon. In a two's complement representation -- which is by far the most commonly used -- this conversion does not change the bit pattern (unless there is a a truncation, of course).
Refer to Integral Conversions from the Standard for further details.