I'm trying to convert the 8 bits of a char to the least significant bits of an int. I know that the conversion from char to int is easily done via
int var = (int) var2;
where var2 is of type char (or even without writing the (int) cast).
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Example:
Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
C11 (n1570), § 6.5.4 Cast operators
Preceding an expression by a parenthesized type name converts the value of the
expression to the named type.
Therefore, the remaining bits of var are set to 0.
By the way, the explicit cast is unnecessary. The conversion from char to int is implicit.
C11 (n1570), § 6.3.1.1 Boolean, characters, and integers
If an int can represent all values of the original type (as restricted by the width, for a
bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions.
A C/C++ compiler can choose for the char type to be either signed or unsigned.
If your compiler defines char to be signed, the upper bits will be sign-extended when it is cast to an int. That is, they will either be all zeros or all ones depending on the value of the sign bit of var2. For example, if var2 has the value -1 (in hex, that's 0xff), var will also be -1 after the assignment (represented as 0xffffffff on a 32-bit machine).
If your compiler defines char to be unsigned, the upper bits will be all zero. For example, if var2 has the value 255 (again 0xff), the value of var will be 255 (0x000000ff).
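Here is a small sketch illustrating both cases, using signed char and unsigned char explicitly, since whether plain char is signed is implementation-defined (the variable names are just for illustration):
#include <stdio.h>

int main(void)
{
    signed char   sc = -1;   /* bit pattern 0xFF                    */
    unsigned char uc = 0xFF; /* same bit pattern                    */

    int a = sc;  /* sign-extended: upper bits all ones, value -1    */
    int b = uc;  /* zero-extended: upper bits all zeros, value 255  */

    printf("%d %d\n", a, b); /* prints: -1 255 */
    return 0;
}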
I'm trying to convert the 8 bits of a char to the least significant bits of an int
A char isn't guaranteed to be 8 bits. It might be more. Furthermore, as others have mentioned, it could be a signed integer type. Negative char values will convert to negative int values, in the following code.
int var = (int) var2;
The sign bit is considered to be the most significant, so this code doesn't do what you want it to. Perhaps you mean to convert from char to unsigned char (to make it positive), and then to int (by implicit conversion):
int var = (unsigned char) var2;
If you foresee CHAR_BIT exceeding 8 in your use case scenarios, you might want to consider using the modulo operator to reduce it:
int var = (unsigned char) var2 % 256;
But I wonder, if I write the code above, are the remaining highest significant (32-8=) 24 bits of the int just random or are they set to 0?
Of course an assignment will assign the entire value, not just part of it.
Example: Let var2 be 00001001, if I write the code above, is var then 00000000 00000000 00000000 00001001?
Semantically, yes. The C standard requires that an int be able to store values in the range -32767 to 32767. Your implementation may choose to represent larger ranges, but that's not required. Technically, an int is at least 16 bits. Just keep that in mind.
For values of var2 that are negative, however (e.g. 10000001 in binary notation), the sign bit will be extended; var will end up being 11111111 10000001 (in binary notation, assuming a 16-bit two's-complement int).
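As a rough sketch of the difference between the two assignments discussed above (assuming an 8-bit char and two's complement; the values in the comments are the typical ones, since the signedness of char is implementation-defined):
#include <stdio.h>

int main(void)
{
    char var2 = (char)0x81;  /* bit pattern 10000001; typically -127 if char is signed */

    int direct   = var2;                 /* may be sign-extended to a negative int     */
    int low_bits = (unsigned char)var2;  /* always 0..255, here 0x81 == 129            */

    printf("direct = %d, low_bits = %d\n", direct, low_bits);
    return 0;
}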
For casting a char variable named c to an int:
if c is positive, such as 24, then c = 00011000 and after the conversion it will be filled up with zeros:
00000000 00000000 00000000 00011000
if c is negative, such as -24, then c = 11101000 and after the conversion it will be filled up with ones:
11111111 11111111 11111111 11101000
Related
In C++ primer it says that "if we assign an out of range value to an object of unsigned type the result is the remainder of the value modulo the number of values the target type can hold."
It gives the example:
int main() {
    unsigned char i = -1;
    // As per the book the value of i is 255.
}
Can anybody please explain to me how this works?
the result is the remainder of the value modulo the number of values the target type can hold
Start with "the number of values the target type can hold". For unsigned char, what is this? The range is from 0 to 255, inclusive, so there are a total of 256 values that can be represented (or "held").
In general, the number of values that can be represented in a particular unsigned integer representation is given by 2^n, where n is the number of bits used to store that type.
An unsigned char is an 8-bit type, so 2^8 == 256, just as we already knew.
Now, we need to perform a modulo operation. In your case of assigning -1 to unsigned char, you would have -1 MOD 256 == 255.
In general, the formula is: x MOD 2^n, where x is the value you're attempting to assign and n is the bit width of the type to which you are trying to assign.
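A minimal sketch of that computation, assuming an 8-bit unsigned char:
#include <stdio.h>

int main(void)
{
    unsigned char i = -1;         /* mathematically, -1 mod 256 == 255 */
    printf("%u\n", (unsigned)i);  /* prints 255                        */
    return 0;
}
(Note that the MOD above is the mathematical operation; the C and C++ % operator truncates toward zero, so the expression -1 % 256 evaluates to -1, not 255.)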
More formally, this is laid out in the C++11 language standard (§ 3.9.1/4). It says:
Unsigned integers, declared unsigned, shall obey the laws of arithmetic modulo 2^n where n is the number of bits in the value representation of that particular size of integer.*
* This implies that unsigned arithmetic does not overflow because a result that cannot be represented by the resulting unsigned integer type is reduced modulo the number that is one greater than the largest value that can be represented by the resulting unsigned integer type.
Perhaps an easier way to think about modulo arithmetic (and the description that you'll most commonly see used) is that overflow and underflow wrap around. You started with -1, which underflowed the range of an unsigned char (which is 0–255), so it wrapped around to the maximum representable value (which is 255).
C is equivalent to C++ here, though the standard words it differently:
6.3.1.3 Signed and unsigned integers
1 When a value with integer type is converted to another integer type other than _Bool, if the value can be represented by the new type, it is unchanged.
2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented by the new type until the value is in the range of the new type.
3 Otherwise, the new type is signed and the value cannot be represented in it; either the result is implementation-defined or an implementation-defined signal is raised.
The literal 1 is of type int. For this explanation, let's assume that sizeof(int) == 4 as it most probably is. So then 1 in binary would look like this:
00000000 00000000 00000000 00000001
Now let's apply the unary minus operator to get the -1. We're assuming two's complement is used as it most probably is (look up two's complement for more explanation). We get:
11111111 11111111 11111111 11111111
Note that in the above numbers the first bit is the sign bit.
As you try to assign this number to unsigned char, for which holds sizeof(unsigned char) == 1, the value would be truncated to:
11111111
Now if you convert this to decimal, you'll get 255. Here the first bit is not seen as a sign bit, as the type is unsigned.
In Stroustrup's words:
If the destination type is unsigned, the resulting value is simply as many bits from the source as will fit in the destination (high-order bits are thrown away if necessary). More precisely, the result is the least unsigned integer congruent to the source integer modulo 2 to the nth, where n is the number of bits used to represent the unsigned type.
Excerpt from C++ standard N3936:
For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned integer type: “unsigned char”, “unsigned short int”, “unsigned int”, “unsigned long int”, and “unsigned long long int”, each of which occupies the same amount of storage and has the same alignment requirements (3.11) as the corresponding signed integer type; that is, each signed integer type has the same object representation as its corresponding unsigned integer type.
I was going through the excerpt from C++ Primer myself and I think I have figured out a way to work out those values mathematically (feel free to correct me if I'm wrong :) ). Take the particular code below as an example.
unsigned char c = -4489;
std::cout << +c << std::endl; // will yield 119 as its output
So how does this answer of 119 come out?
Well, take 4489 and divide it by the total number of values an unsigned char can hold, i.e. 2^8 = 256, which will give you 137 as the remainder.
4489 % 256 = 137.
Now just subtract that 137 from 256.
256 - 137 = 119.
That's how we derive the mod value for a negative source. Do try it yourself on other values as well; it has worked accurately for me.
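For what it's worth, here is a small sketch that checks the shortcut (compilers may warn about the out-of-range constant, which is expected):
#include <stdio.h>

int main(void)
{
    unsigned char c = -4489;             /* -4489 + 18*256 == 119          */
    printf("%d\n", c);                   /* prints 119                     */
    printf("%d\n", 256 - (4489 % 256));  /* the shortcut: 256 - 137 == 119 */
    return 0;
}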
int num = 65537;
char p = (char)num;  // p == 1
What's going on here?
Is it p = num % (127+128) - 1, or p = num % 256, or something else?
I need to know why p is equal to 1.
Since 65537 is 00000000 00000001 00000000 00000001 in binary and the char type has only 1 byte, only the last byte is taken for the char value, which is 00000001 = 1.
Short answer: In practice on standard processors, it is 1 because 65537 % 256 == 1. The reason is the one ksmonkey123 explained.
Note: If you were writing 127 + 128 because the bounds of a signed char, which is equivalent to char on typical compilers nowadays, are -128 to +127, please remember that the number of values between -128 and +127 is (127 - (-128) + 1), which also yields 256, so it does not matter whether you use the bounds of signed char (-128 to 127) or unsigned char (0 to 255).
Nitpick: Actually, when assigning a value that cannot be represented in a signed destination variable, the result is implementation-defined (or an implementation-defined signal is raised), so you cannot portably rely on any particular value.
Assigning a positive value that does not fit into an unsigned variable yields "mod range" behaviour, like "% 256" for unsigned chars if char has 8 bits; a negative value assigned to an unsigned variable is likewise reduced modulo 2^N. Assigning an out-of-range value to a signed variable, on the other hand, gives an implementation-defined result, and the implementation has to document which behaviour it uses. All non-embedded C compilers nowadays behave as if a multiple of 2^N, where N is the number of bits of the target type, were added to the value. So "-510" gets to +2 by adding 2*256 to it, and then this +2 is stored in the variable.
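A quick sketch of the -510 example (compilers may warn about the out-of-range constants; the signed case is implementation-defined, as noted, so its comment only describes the common behaviour):
#include <stdio.h>

int main(void)
{
    unsigned char u = -510;  /* well-defined: -510 + 2*256 == 2         */
    signed char   s = -510;  /* implementation-defined; commonly also 2 */
    printf("%d %d\n", u, s); /* typically prints: 2 2                   */
    return 0;
}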
#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
The output is 2!
How does the range of a variable work? What are the ranges of the different data types in the C language?
When assigning to a smaller type, the value is either truncated (i.e. 258 % 256) if the new type is unsigned, or modified in an implementation-defined fashion if the new type is signed.
Otherwise, if the new type is unsigned, the value is converted by
repeatedly adding or subtracting one more than the maximum value that
can be represented in the new type until the value is in the range of
the new type.
Otherwise, the new type is signed and the value cannot be represented
in it; either the result is implementation-defined or an
implementation-defined signal is raised.
So all that fancy "adding or subtracting" means it is assigned as if you said:
ch = i % 256;
char is 8 bits long, while 258 requires nine bits to represent. Converting to char chops off the most significant bit of 258, which is 100000010 in binary, leaving 00000010, which is 2.
When you pass char to printf, it gets promoted to int, which is then picked up by the %d format specifier, and printed as 2.
#include <stdio.h>
int main(void)
{
    int i = 258;
    char ch = i;
    printf("%d", ch);
}
Here i is 00000001 00000010 at the machine level (showing the low 16 bits). ch takes 1 byte, so it takes the last 8 bits, which is 00000010, i.e. 2.
In order to find out how long the various types are in C you should refer to limits.h (or climits in C++). char is not guaranteed to be 8 bits long. It is just:
the smallest addressable unit of the machine that can contain the basic character set. It is an integer type. The actual type can be either signed or unsigned depending on the implementation.
The same sort of vague definitions are given for the other types.
Alternatively, you can use the sizeof operator to find out the size of a type in bytes.
You may not assume exact ranges for the native C data types. The standard places only minimal restrictions, so you can say that an unsigned short can hold at least 65536 different values; the upper limit can differ.
Refer to Wikipedia for further reading.
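As a quick sketch of both suggestions, limits.h and sizeof can be combined like this:
#include <stdio.h>
#include <limits.h>

int main(void)
{
    printf("CHAR_BIT     = %d\n", CHAR_BIT);
    printf("CHAR_MIN/MAX = %d / %d\n", CHAR_MIN, CHAR_MAX);
    printf("INT_MIN/MAX  = %d / %d\n", INT_MIN, INT_MAX);
    printf("sizeof(char) = %zu, sizeof(int) = %zu\n", sizeof(char), sizeof(int));
    return 0;
}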
char is 8 bits, so when you assign an int to a char on a 32-bit machine (where int is 32 bits), the variable i is:
00000000 00000000 00000001 00000010 = 258 (in binary)
When you want a char from this int, only the last 8 bits are kept (char is 8 bits), so you get:
00000010, which means 2 in decimal; this is why you see this output.
This is an overflowing conversion; the result is not portable because char may be signed (implementation-defined result) or unsigned (well-defined "wrap-around" behavior).
You are using a little-endian machine.
Binary representation of 258 is
00000000 00000000 00000001 00000010
While assigning an integer to a char, only 8 bits of data (1 byte) are copied to the char, i.e. the LSB.
Here only 00000010, i.e. 0x02, will be copied to the char.
The same code would give zero on a big-endian machine.
The Standard specifies that hexadecimal constants like 0x8000 (too large to fit in a signed int) are unsigned (just like octal constants), whereas decimal constants like 32768 are signed long. (The exact types assume a 16-bit integer and a 32-bit long.) However, in regular C environments both will have the same representation, in binary 1000 0000 0000 0000.
Is a situation possible where this difference really produces a different outcome? In other words, is a situation possible where this difference matters at all?
Yes, it can matter. If your processor has a 16-bit int and a 32-bit long type, 32768 has the type long (since 32767 is the largest positive value fitting in a signed 16-bit int), whereas 0x8000 (since it is also considered for unsigned int) still fits in a 16-bit unsigned int.
Now consider the following program:
int main(int argc, char *argv[])
{
    volatile long long_dec = ((long)~32768);
    volatile long long_hex = ((long)~0x8000);
    return 0;
}
When 32768 is considered long, the negation will invert 32 bits, resulting in a representation 0xFFFF7FFF with type long; the cast is superfluous.
When 0x8000 is considered unsigned int, the negation will invert 16 bits, resulting in a representation 0x7FFF with type unsigned int; the cast will then zero-extend to a long value of 0x00007FFF.
Look at H&S5, section 2.7.1 page 24ff.
It is best to augment the constants with U, UL or L as appropriate.
On a platform with a 32-bit int and a 64-bit long, a and b in the following code will have different values:
int x = 2;
long a = x * 0x80000000; /* multiplication done in unsigned -> 0 */
long b = x * 2147483648; /* multiplication done in long -> 0x100000000 */
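Here is a runnable version of that snippet, assuming a platform where int is 32 bits and long is 64 bits (for example, typical 64-bit Linux or macOS):
#include <stdio.h>

int main(void)
{
    int x = 2;
    long a = x * 0x80000000;  /* 0x80000000 is unsigned int here: wraps to 0    */
    long b = x * 2147483648;  /* 2147483648 is long here: result is 0x100000000 */
    printf("a = %ld\n", a);   /* prints 0                                       */
    printf("b = %ld\n", b);   /* prints 4294967296                              */
    return 0;
}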
Another example not yet given: compare (with the greater-than or less-than operators) -1 to both 32768 and to 0x8000. Or, for that matter, try comparing each of them for equality with an 'int' variable equal to -32768. A sketch of the comparison follows below.
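The two lines of this sketch only disagree on a platform where int is 16 bits (so that 0x8000 has type unsigned int while 32768 has type long); with a 32-bit int, both constants have type int and both comparisons print 1:
#include <stdio.h>

int main(void)
{
    printf("%d\n", -1 < 32768);  /* 1: -1 compared against a signed long       */
    printf("%d\n", -1 < 0x8000); /* 0 with a 16-bit int: -1 is converted to
                                    unsigned int (65535) before the comparison */
    return 0;
}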
Assuming int is 16 bits and long is 32 bits (which is actually fairly unusual these days; int is more commonly 32 bits):
printf("%ld\n", 32768); // prints "32768"
printf("%ld\n", 0x8000); // has undefined behavior
In most contexts, a numeric expression will be implicitly converted to an appropriate type determined by the context. (That's not always the type you want, though.) This doesn't apply to non-fixed arguments to variadic functions, such as any argument to one of the *printf() functions following the format string.
The difference would show up if you tried to add a value to the 16-bit int: the result could exceed the bounds of the variable, whereas with a 32-bit long you could add any number less than 2^16 to it without overflowing.
I have a question that needs guidance from any expert:
As a value with short type is passed as an argument to printf() function, it'll be automatically promoted to int type, that is why the printf() function will see the value as int type instead of short type.
So basically the short type is 16 bits wide, which is 0000000000000000, while the int type is 32 bits wide, which is 00000000000000000000000000000000.
Let's say I declare a variable call num with short type and initialise it with a value of -32, that means the most significant bits of the short type will be 1, which is 0000000011100000.
When I pass this value to printf(), it'll be converted to int type, so it'll become 00000000000000000000000011100000.
In step 4, when it is converted to int, the most significant bit is 0.
Why, when I use the %hd specifier or even the %d specifier, does it still print a negative value instead of a positive one?
No, short and int are both signed types, so it is promoted by sign extension, not zero padding:
-32 short = 11111111 11100000
-32 int = 11111111 11111111 11111111 11100000
leaving the MSB as 1 i.e. negative.
You could fake the behaviour you're expecting by casting it to unsigned short first, e.g.
printf("%d", (unsigned short)((short)(-32)));
Converting a short to an int basically replicates the most significant bit of the short into the top 16 bits of the int. This is why the int is printed as negative. If you do not want this behaviour, use an unsigned short.
As you say, it is converted, and conversion in this case implies knowledge: the compiler knows how signed short to int conversions work. It does not just append bits in front; it creates a new int with the same value as the short. That's why you get the correct number.