unsigned char rotate - c

I'm a little bit confused as to what an unsigned char is. A signed char is the representation of the char in bit form right? A sample problem has us rotating to the right by n bit positions, the bits of an unsigned char with this solution:
unsigned char rotate(unsigned char x, int n) {
    unsigned char temp = x << (8 - n);
    x = x >> n;
    return (x | temp);
}
If anyone could explain with char examples and their respective bits, it would be greatly appreciated. Thanks so much.

signed char, char and unsigned char are all integer types. For the sake of simplicity I'll assume that CHAR_BIT is 8 and that signed types are 2's complement. So:
signed char is a number from -128 to +127
unsigned char is a number from 0 to 255
char is either the same range as signed char, or the same range as unsigned char, depending on your C implementation.
As far as C is concerned, a character is just a number within the range of the char type (although various character functions like tolower require the value to be cast to an unsigned type on the way in, even if char is signed).
So, signed char and unsigned char are both representation of the character in bit form. For numbers in the range 0 to +127 they both use the same representation (there's only one way to represent positive numbers in binary). For numbers outside that range, the signed representation of a negative number n is the same bits as the unsigned representation of n + 256 (definition of 2's complement).
The reason this code uses unsigned char is that shifting signed values is hazardous: left shift with a negative signed value has undefined behavior, and right shift with a negative signed value has an implementation-defined result. Usually left shift behaves the same as for unsigned values, which is OK, but right shift of a negative value typically inserts bits with value 1 at the left-hand side, a so-called "arithmetic shift", which isn't what's wanted here. Unsigned values always shift in zeros, and it's the shifting in of zeros that lets this code build the two parts of the rotated result and or them together.
So, assuming an input value of x = 254 (11111110), and n = 1, we get:
x << 7 is 0111111100000000
x >> 1 is 01111111
| is 0111111101111111
convert to unsigned char to return is 01111111
If we used a signed type instead of unsigned char, we'd quite possibly get:
x is -2 11111110
x << 7 is 11111111111111111111111100000000 (assuming 32-bit int, since
smaller types are always promoted to int for arithmetic ops)
x >> 1 is implementation-defined, possibly
11111111111111111111111111111111
| is 11111111111111111111111111111111
convert to signed char to return is -1
So the bit-manipulation with the unsigned char gives the correct answer, rotated by 1 bit to move the 0 from the end to the start. Bit-manipulation with the signed char probably gives the wrong result; it might give the right result if negative signed values happen to do a logical right shift, but on really unusual implementations it could do anything.
Pretty much always for bit-manipulation tasks like rotate, you want to use unsigned types. It removes the implementation-dependence (other than on the width of the type), and avoids you having to reason about negative and non-negative values separately.

Declaring a variable as unsigned char tells the compiler to treat the underlying bit pattern as a number from 0 (00000000) to 255 (11111111). Declaring it a char tells the compiler to apply two's complement to the underlying bit pattern and treat it as a number from -128 (10000000) to 127 (01111111).
Consider a 3-bit number. If it is unsigned, you have:
000 = 0
001 = 1
010 = 2
011 = 3
100 = 4
101 = 5
110 = 6
111 = 7
If it is signed you have:
100 = -4
101 = -3
110 = -2
111 = -1
000 = 0
001 = 1
010 = 2
011 = 3
What is neat with respect to arithmetic (as that link mentions) is that you don't have to treat signed binary numbers differently than unsigned ones. You just do the actual binary math without regard to signed or unsigned. But you do have to apply the signed/unsigned interpretation to the inputs and to the output.
In the signed realm you might have:
2 + (-3) = 010 + 101 = 111 = -1
But in the unsigned realm this is:
2 + 5 = 010 + 101 = 111 = 7
So it's all a matter of interpretation since the actual bit patters being added and the bit pattern of the sum are the same in both cases.

An unsigned char is just an 8-bit integer type that can take values between 0 and 255, and a signed char can take values between -128 and 127. In the actual machine code there is no real difference, except one: when you do a right shift on a signed type using >> (be it char, short or int), it is carried out as an arithmetic shift, meaning for negative values (which have a 1 as MSB) a 1 is shifted in instead of a 0, and the above code will not work as expected.
EDIT: Your above code example of rotating an unsigned char by 3 bits for signed and unsigned:
00110101 rotated unsigned and signed is 10100110.
but for a number with a 1 in front you get an arithmetic shift, and thus
11010001 rotated unsigned is 00111010.
11010001 rotated signed is 11111010.

Related

Why unsigned is treated as signed?

I know there was very similar question already answered but I believe it doesn't address my problem.
unsigned char aaa = -10;
unsigned int bbb = (unsigned int)-5;
unsigned int ccc = (unsigned int)20 + (unsigned int)bbb;
printf("%d\n", aaa);
printf("%d\n", ccc);
Above code prints aaa = 246 (which is what I would expect) but ccc = 15 which means unsigned int was treated all the way as signed. I can't find explanation for this even trying probably obsolete typecasting.
unsigned char aaa = -10;
The int value -10 is converted to unsigned char by repeatedly adding UCHAR_MAX + 1 until the result is in the range [0, UCHAR_MAX]. Most probably char has 8 bits on your system, which means UCHAR_MAX = 2**8 - 1 = 255. So -10 + UCHAR_MAX + 1 is 246, which is in range, and aaa = 246.
unsigned int bbb = (unsigned int)-5;
UINT_MAX + 1 is added to -5; assuming int has 32 bits, this results in bbb = 4294967291.
unsigned int ccc = (unsigned int)20 + (unsigned int)bbb;
Unsigned integer arithmetic "wraps around". 20 + 4294967291 = 4294967311, which is greater than UINT_MAX = 2**32 - 1 = 4294967295, so we subtract UINT_MAX + 1 until the result is in the range [0, UINT_MAX]: 4294967311 - (UINT_MAX + 1) = 15.
printf("%d\n", aaa);
The code is most probably fine: on your platform unsigned char is promoted to int before being passed as a variadic function argument. For reference about promotions you can read cppreference's page on implicit conversions. Because %d expects an int and unsigned char is promoted to int, the code is fine. [1]
printf("%d\n", ccc);
This line results in undefined behavior: ccc has the type unsigned int, while the %d printf format specifier expects a signed int. Because your platform uses two's complement to represent numbers, printf just interprets the bits as a signed value, which is 15 anyway.
[1]: There is a theoretical possibility of unsigned char having as many bits as int, so unsigned char will get promoted to unsigned int instead of int, which will result in undefined behavior there too.
C variable types
C is a quite low-level programming language, and it does not preserve variable types after compilation. For example, if one had a signed and an unsigned variable of equal size (say int64_t and uint64_t), after compilation each of them is represented as just an 8-byte piece of memory. All addition/subtraction is performed modulo 2 to the corresponding power (2^64 for 64-bit variables, 2^32 for 32-bit ones).
The only difference that remains after compilation is in comparison: for example, (unsigned) -5 > 1, but -5 < 1. Here is why.
Your -5 is stored modulo 2^32, just like every 32-bit value; -5 appears as 0xfffffffb in actual RAM. The rule is simple: if a variable is signed, its top bit is called the sign bit and indicates whether the value is positive or negative. The top bit of 0xfffffffb is 1, so in a signed comparison the value is negative and less than 1. But compared as an unsigned integer, this value is a huge one, 2^32 - 5. So, in general, the unsigned reading of a negative signed number is greater than the signed value by 2^[number of bits]. You can read more about binary arithmetic here.
So, all that happened is that you got an unsigned number equal to 0xfffffffb + 0x14 modulo 2^32, which is 0x10000000f (mod 2^32), and that is 0xf = 15.
In conclusion, "%d" assumes its argument is signed, but that is not the main reason the answer came out this way. In any case, I advise using %u for unsigned numbers.

What is the reasoning behind char to int conversion output?

For example:
int x = 65535; char y = x; printf("%d\n", y);
This will output -1. Anyway to derive this by hand?
In order to derive this by hand you need to know several implementation-defined aspects of your system - namely
If char is signed or not
If the char is signed, what representation scheme is used
How does your system treat narrowing conversions of values that cannot be represented exactly in the narrow type.
Although the standard allows implementations to decide, a very common approach to narrowing conversions is to truncate the bits that do not fit in the narrower type. Assuming that this is the approach taken by your system, the first part of figuring out the output is to find the last eight bits of the int value being converted. In your case, 65535 is 1111111111111111 in binary, so the last eight bits are all ones.
Now you need to decide the interpretation of 11111111. On your system char is signed, and the system uses two's complement representation of negative values, so this pattern is interpreted as an eight-bit -1 value.
When you call printf, eight-bit value of signed char is promoted to int, so the preserved value is printed.
On systems where char is unsigned by default the same pattern would be interpreted as 255.
65535 is 0xffff
When converting to char, the upper bits are dropped:
0xffff AND 0xff is 0xff
When passing a char to a function, it is expanded to int. As the left-most bit of the char is 1, it is sign-extended, so it becomes 0xffffffff (32 bits).
This is -1, so -1 is printed.
As Dasblinkenlight points out, it matters if the char is signed or unsigned (whether as a default in the implementation or as the declaration unsigned char y). My last lines would read for unsigned char:
When passing an unsigned char to a function, it is expanded to int, but since the value is unsigned, just zeroes are added at the left, so it becomes 0x000000ff (32 bits).
This is 255, so 255 is printed.
65535 when converted to binary is 1111111111111111.
and when you assign this to a character variable it gets trimmed to least significant 8 bits which is 11111111.
This is equivalent to -1 in binary: taking the two's complement of a number gives its negative, and the two's complement of 00000001 is
11111110 + 1 = 11111111, which is -1.
int x = 65535;
The value of 'x', 65535, is 0xffff in hexadecimal (binary 1111 1111 1111 1111), a 2-byte (16-bit) number.
When the value of 'x' is converted to the character 'y', whose size is 1 byte (8 bits), the upper 8 bits of 'x' are cut off, because 'y' only has 1 byte of memory space.
So the value of 'y' is 0xff (binary 1111 1111), which equals -1 in signed integer representation.
And we display the character value of 'y' as an integer with '%d'.

Difference between unsigned int and int

I read about twos complement on wikipedia and on stack overflow, this is what I understood but I'm not sure if it's correct
signed int
the left-most bit is interpreted as -2^31, and this is how we can have negative numbers
unsigned int
the left-most bit is interpreted as +2^31, and this is how we achieve large positive numbers
update
What will the compiler see when we store 3 vs -3?
I thought 3 is always 00000000000000000000000000000011
and -3 is always 11111111111111111111111111111101
example for 3 vs -3 in C:
unsigned int x = -3;
int y = 3;
printf("%d %d\n", x, y); // -3 3
printf("%u %u\n", x, y); // 4294967293 3
printf("%x %x\n", x, y); // fffffffd 3
Two's complement is a way to represent negative integers in binary.
First of all, here's a standard 32-bit integer ranges:
Signed = -(2 ^ 31) to ((2 ^ 31) - 1)
Unsigned = 0 to ((2 ^ 32) - 1)
In two's complement, a negative is represented by inverting the bits of its positive equivalent and adding 1:
10 which is 00001010 becomes -10 which is 11110110 (if the numbers were 8-bit integers).
Also, the binary representation is only important if you plan on using bitwise operators.
If you're doing basic arithmetic, then this is unimportant.
The only time this may give unexpected results outside of the aforementioned cases is taking the absolute value of the most negative signed value, -2^31 (INT_MIN), which has no representable positive counterpart and stays negative.
Your problem has nothing to do with the representation, but with the type.
A negative number stored in an unsigned integer is represented by the same bits; the difference is that it reads as a very large positive number, since the type treats every bit, including the former sign bit, as part of the magnitude.
You should also realize that (2^32 - 5) is the exact same bit pattern as -5 when the value is 32 bits wide.
Therefore, the following holds true:
unsigned int x = 4294967291u; /* 2^32 - 5 */
unsigned int y = -5;          /* converted modulo 2^32 */
if (x == y) {
    printf("Negative values wrap around in unsigned integers on underflow.");
}
else {
    printf("unsigned int is not 32 bits on this platform.");
}
The numbers don't change, just the interpretation of the numbers. For most two's complement processors, add and subtract do the same math, but set a carry / borrow status assuming the numbers are unsigned, and an overflow status assuming the number are signed. For multiply and divide, the result may be different between signed and unsigned numbers (if one or both numbers are negative), so there are separate signed and unsigned versions of multiply and divide.
For 32-bit integers, for both signed and unsigned numbers, the n-th bit is always interpreted as +2^n.
For signed numbers with the 31st bit set, the result is adjusted by -2^32.
Example:
11111111 11111111 11111111 11111111 as unsigned int is interpreted as 2^31 + 2^30 + ... + 2^1 + 2^0. The interpretation of this as a signed int would be the same MINUS 2^32, i.e. 2^31 + 2^30 + ... + 2^1 + 2^0 - 2^32 = -1.
(Well, it can be said that for signed numbers with the 31st bit set, this bit is interpreted as -2^31 instead of +2^31, like you said in the question. I find this way a little less clear.)
Your representation of 3 and -3 is correct: 3 = 0x00000003, -3 + 2^32 = 0xFFFFFFFD.
Yes, you are correct; allow me to explain a bit further for clarification.
The difference between int and unsigned int is purely in how the bits are interpreted. The machine processes signed and unsigned bit patterns the same way; two's complement notation is what makes that possible, and it is worth being comfortable with when dealing with related subjects.
Example:
The number 5 is 0101; inverting the bits gives 1010, and adding 1 gives its two's-complement negative, 1011 = -5.
Which data type you should use depends on the situation: use unsigned values for quantities that can never be negative, and when functions or operators return such values. ALUs handle signed and unsigned variables very similarly.
The rules for writing a number in two's complement are as follows:
If the number is positive, just write it in binary (values up to 2^31 - 1 fit in 32 bits).
If it is 0, use all zeroes.
For negatives, flip all the 1's and 0's of the positive value and add 1.
Example 2 (the beauty of two's complement):
-2 + 2 = 0 is computed as 0010 + 1110 = 10000; the carry out of the top bit overflows and is discarded, leaving the result 0000.

How does C store negative numbers in signed vs unsigned integers?

Here is the example:
#include <stdio.h>
int main()
{
int x=35;
int y=-35;
unsigned int z=35;
unsigned int p=-35;
signed int q=-35;
printf("Int(35d)=%d\n\
Int(-35d)=%d\n\
UInt(35u)=%u\n\
UInt(-35u)=%u\n\
UInt(-35d)=%d\n\
SInt(-35u)=%u\n",x,y,z,p,p,q);
return 0;
}
Output:
Int(35d)=35
Int(-35d)=-35
UInt(35u)=35
UInt(-35u)=4294967261
UInt(-35d)=-35
SInt(-35u)=4294967261
Does it really matter if I declare the value as signed or unsigned int? Because, C actually only cares about how I read the value from memory. Please help me understand this and I hope you prove me wrong.
Representation of signed integers is up to the underlying platform, not the C language itself. The language definition is mostly agnostic with regard to signed integer representations. Two's complement is probably the most common, but there are other representations such as one's complement and signed magnitude.
In a two's complement system, you negate a value by inverting the bits and adding 1. To get from 5 to -5, you'd do:
5 == 0101 => 1010 + 1 == 1011 == -5
To go from -5 back to 5, you follow the same procedure:
-5 == 1011 => 0100 + 1 == 0101 == 5
Does it really matter if I declare the value as signed or unsigned int?
Yes, for the following reasons:
It affects the values you can represent: unsigned integers can represent values from 0 to 2^N - 1, whereas signed integers can represent values between -2^(N-1) and 2^(N-1) - 1 (two's complement).
Overflow is well-defined for unsigned integers; UINT_MAX + 1 will "wrap" back to 0. Overflow is not well-defined for signed integers, and INT_MAX + 1 may "wrap" to INT_MIN, or it may not.
Because of 1 and 2, it affects arithmetic results, especially if you mix signed and unsigned variables in the same expression (in which case the result may not be well defined if there's an overflow).
An unsigned int and a signed int take up the same number of bytes in memory. They can store the same byte values. However the data will be treated differently depending on if it's signed or unsigned.
See http://en.wikipedia.org/wiki/Two%27s_complement for an explanation of the most common way to represent integer values.
Since you can typecast in C you can effectively force the compiler to treat an unsigned int as signed int and vice versa, but beware that it doesn't mean it will do what you think or that the representation will be correct. (Overflowing a signed integer invokes undefined behaviour in C).
(As pointed out in comments, there are other ways to represent integers than two's complement, however two's complement is the most common way on desktop machines.)
Does it really matter if I declare the value as signed or unsigned int?
Yes.
For example, have a look at
#include <stdio.h>
int main()
{
int a = -4;
int b = -3;
unsigned int c = -4;
unsigned int d = -3;
printf("%f\n%f\n%f\n%f\n", 1.0 * a/b, 1.0 * c/d, 1.0*a/d, 1.*c/b);
}
and its output
1.333333
1.000000
-0.000000
-1431655764.000000
which clearly shows that it makes a huge difference if I have the same byte representation interpreted as signed or unsigned.
#include <stdio.h>
int main(){
int x = 35, y = -35;
unsigned int z = 35, p = -35;
signed int q = -35;
printf("x=%d\tx=%u\ty=%d\ty=%u\tz=%d\tz=%u\tp=%d\tp=%u\tq=%d\tq=%u\t",x,x,y,y,z,z,p,p,q,q);
}
the result is:
x=35 x=35 y=-35 y=4294967261 z=35 z=35 p=-35 p=4294967261 q=-35 q=4294967261
The way the int is stored is no different: it is stored in two's-complement form in memory.
In hex, 35 is 0x00000023 and -35 is 0xffffffdd; the stored bits are the same whether you declare the variable signed or unsigned, only the output style differs. %d and %u agree for positive values, but for a negative value the top bit is the sign: printed with %u, 0xffffffdd is 4294967261, while with %d the same 0xffffffdd reads as -0x00000023, i.e. -35.
The most fundamental thing a variable's type defines is the way it is stored (that is, read from and written to) in memory and how the bits are interpreted, so your statement can be considered "valid".
You can also look at the problem using conversions. When you store a signed, negative value in an unsigned variable it gets converted to unsigned. It so happens that this conversion is reversible: signed -35 converts to unsigned 4294967261, which, when you request it, can be converted back to signed -35. That's how 2's complement encoding (see link in other answer) works.

Left shift operator

If I have the following:
char v = 32; // 0010 0000
then I do:
v << 2
the number becomes negative. // 1000 0000 = -128
I read the standard but it is only written:
If E1 has a signed type and nonnegative value, and E1 × 2^E2 is
representable in the result type, then that is the resulting value;
otherwise, the behavior is undefined.
so I don't understand whether there is a rule that when a bit is shifted into the most significant position the number must become negative.
I'm using GCC.
Left shifting it twice gives 1000 0000 in binary, which is 128 in decimal.
If 128 is representable in char, i.e. you're on some machine (with a supporting compiler) that provides a char of size > 8 bits, then 128 is the value you get, since it's representable in such a type.
Otherwise, if the size of a char is just 8 bits as on most common machines, then for a signed character type that uses two's complement for negative values, [-128, 127] is the representable range, and you're in undefined-behaviour land since 128 is not representable as-is in that type.
Signed data primitives like char use two's complement (http://en.wikipedia.org/wiki/Twos_complement) to encode values. What you are probably looking for is unsigned char, which won't encode the value using two's complement (no negatives).
Try using unsigned char instead of char: both are the same size, but unsigned char uses all of its bits for the magnitude (0 to 255) instead of reserving the top bit as a sign bit.
unsigned char v = 32;
v = v << 2; // 128
