C bitwise shift

I suppose sizeof(char) is one byte. Then when I write the following code,
#include <stdio.h>
int main(void)
{
char x = 10;
printf("%d", x<<5);
}
The output is 320
My question is, if char is one byte long and value is 10, it should be:
0000 1010
When I shift by 5, shouldn't it become:
0100 0001
so why is output 320 and not 65?
I am using gcc on Linux and checked that sizeof(char) = 1

In C, operands narrower than int are automatically promoted to int before arithmetic.
Therefore, your char is being promoted to something wider than 8 bits.
So your 0000 1010 is shifted up by 5 bits to get 320 (nothing is shifted off the top).
If you want to rotate, you need to do two shifts and a mask:
unsigned char x = 10;
x = (x << 5) | (x >> 3);  // combine the two shifted halves
x &= 0xff;                // keep only the low 8 bits (already implied by assigning back to an unsigned char)
printf("%d", x);
It's possible to do it faster using inline assembly or if the compiler supports it, intrinsics.
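For reference, here is a minimal sketch of the same rotate wrapped in a helper; the function name rotl8, the use of uint8_t and the variable shift count are my own choices, not part of the answer above.
#include <stdint.h>
#include <stdio.h>
/* Rotate an 8-bit value left by n positions (n taken modulo 8). */
static uint8_t rotl8(uint8_t v, unsigned n)
{
    n &= 7; /* keep the shift count in range */
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}
int main(void)
{
    printf("%d\n", rotl8(10, 5)); /* 0000 1010 rotated left by 5 -> 0100 0001 = 65 */
    return 0;
}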

Mysticial is right. If you store the shifted value back into the char and print it as a character:
char x = 10;
x = x << 5;
printf("%c", x);
It prints "@", which, if you check your ASCII table, is 64.
0000 1010 << 5 = 0001 0100 0000
The top bits no longer fit in the char, so only the low byte 0100 0000 (64) is kept. In your original code there was no such truncation: the char was promoted to an int for the shift, so nothing overflowed and the full number 320 was printed.
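A small sketch to see both behaviours side by side (assuming the usual 8-bit char):
#include <stdio.h>
int main(void)
{
    char x = 10;
    printf("%d\n", x << 5);         /* shift is done as int: prints 320 */
    printf("%d\n", (char)(x << 5)); /* truncated back to 8 bits: typically prints 64 */
    return 0;
}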

Because what you describe is a rotate, not a shift. 0 is always shifted in on left shifts.

Related

Why does data overflow in a `char` data type cause the value to wrap within its range?

I recently came across this question and the answer given by chux - Reinstate Monica.
Quoting lines from their answer, "This is implementation-defined behavior. The assigned value could have been 0 or 1 or 2... Typically, the value is wrapped around ("modded") by adding/subtracting 256 until in range. 100 + 100 -256 --> -56."
Code:
#include <stdio.h>
int main(void)
{
char a = 127;
a++;
printf("%d", a);
return 0;
}
Output: -128
In most C compilers, the char type takes 1 byte. Strictly speaking, I'm assuming a 16-bit system where char takes 1 byte.
When a = 127, its binary representation inside the computer is 0111 1111, increasing it with 1 should yield the value
0111 1111 + 0000 0001 = 1000 0000
which is equal to -0 (considering a signed-number representation where the left-most bit is the sign: 0 = + and 1 = -), so why is the output -128?
Is it because of the "integer promotion rule"? I mean, for the expression a + 1, a gets converted to int (2 bytes) before the + operation, and then its binary representation in memory becomes 1111 1111 1000 0000, which is equal to -128 and would explain the output. But that assumption of mine conflicts with the quoted lines of chux - Reinstate Monica about wrapping the values.
1000 0000 which is equal to -0...
Ones' complement has a -0, but most computers use two's complement which does not.
In two's complement notation the left-most bit has weight -(2^(N-1)); in your case, 1000 0000 with N = 8, the left-most bit contributes -(1 * 2^7) = -128, and that's why the output is -128.
Your char is an 8 bit signed integer in which case 1000 0000 is -128. We can test what 1000 0000 is conveniently using the GNU extension which allows binary constants.
char a = 0b10000000;
printf("%d\n", a); // -128
char, in this implementation, is a signed 8-bit integer. Adding 1 to 127 produces 128, which wraps around to -128 when stored back into the char.
What about integer promotion? Integer promotion happens during the calculation, but the result is then converted back to a char. 128 can't fit in our signed 8-bit char, so it wraps to -128.
Integer promotion is demonstrated by this example.
char a = 30, b = 40;
char c = (a * b);
printf("%d\n", c); // -80
char d = (a * b) / 10;
printf("%d\n", d); // 120
char c = (a * b); is -80, but char d = (a * b) / 10; is 120. Why? Shouldn't it be -8? The answer here is integer promotion. The math is done as native integers, but the result must still be stuffed into an 8-bit char. (30 * 40) is 1200 which is 0100 1011 0000. Then it must be stuffed back into an 8 bit signed integer; that's 1011 0000 or -80.
For the other calculation, (30 * 40) / 10 == 1200 / 10 == 120 which fits just fine.
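As a small check of the wrap-around described in the quoted answer (the helper variable names below are mine, and the "subtract 256 until in range" step is the typical two's-complement behaviour, not something the standard guarantees):
#include <stdio.h>
int main(void)
{
    int product = 30 * 40;                 /* 1200: the multiplication is done as int */
    signed char c = (signed char)product;  /* conversion back to char is implementation-defined */
    int wrapped = product % 256;           /* 1200 % 256 = 176 */
    if (wrapped > 127)
        wrapped -= 256;                    /* 176 - 256 = -80 */
    printf("%d %d\n", c, wrapped);         /* typically prints: -80 -80 */
    return 0;
}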

Right rotation of a 16-bit non-negative number?

I'm working on a method in which I need to perform a right rotation. For instance, I have the binary number 0101110111000111, and after the method is performed, the result should be 1010111011100011. A 16-bit non-negative number is passed as the parameter, and this parameter value will have all the bits moved to the right by 1 bit position and with the low-order bit moved to the high-order position (like the example above).
Here is the code I have written. I converted 0101110111000111 to the decimal value of 24007.
#include <stdlib.h>
#include <stdio.h>
unsigned int rotateRight(unsigned int x);
int main(int argc, char **argv) {
unsigned int n = 24007;
printf("%d\n", rotateRight(n));
return 0;
}
/* Function to right rotate x by 1 bit */
unsigned int rotateRight(unsigned int x) {
return (x >> 1) | (x << (16-1));
}
My expected result should be the value of 44771, because that is the decimal equivalent to 1010111011100011. However, when I run this program, I get 786673379. Could someone explain why this is happening, and how I could improve my rotation function so I can get the correct answer?
(x << (16-1)) shifts the entire 16-bit quantity 15 places to the left and prepends it to x >> 1. Since int on your platform is 32 bits, nothing is truncated, and you get a result wider than 16 bits.
i.e.
x = 0101 1101 1100 0111
x >> 1 = 0010 1110 1110 0011
x << (16-1) = 0010 1110 1110 0011 1000 0000 0000 0000
=> (x >> 1) | (x << (16-1))
= 0010 1110 1110 0011 1010 1110 1110 0011 (binary)
= 786673379 (decimal)
A solution would be:
unsigned int rotateRight(unsigned int x) {
return ((x >> 1) | (x << (16-1))) & 0xffff;
}
i.e. do the calculation you're already doing, but keep only the lowest 16 bits.
Alternatively you could use a type like uint16_t to ensure that larger numbers are automatically truncated, subject to your feelings about implicit type conversions and explicit type conversion syntax.
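A hedged sketch of that alternative; the function name rotateRight16 is my own, and the explicit cast back to uint16_t is what does the truncation (the shifts themselves are still performed as int after promotion):
#include <stdint.h>
#include <stdio.h>
static uint16_t rotateRight16(uint16_t x)
{
    return (uint16_t)((x >> 1) | (x << 15)); /* narrow return type keeps only the low 16 bits */
}
int main(void)
{
    printf("%u\n", rotateRight16(24007)); /* expected: 44771 */
    return 0;
}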

Negative numbers: How can I change the sign bit in a signed int to a 0?

I was thinking this would work, but it does not:
int a = -500;
a = a << 1;
a = (unsigned int)a >> 1;
//printf("%d",a) gives me "2147483148"
My thought was that the left-shift would remove the leftmost sign bit, so right-shifting it as an unsigned int would guarantee that it's a logical shift rather than arithmetic. Why is this incorrect?
Also:
int a = -500;
a = a << 1;
//printf("%d",a) gives me "-1000"
TL;DR: the easiest way is to use the abs function from <stdlib.h>. The rest of the answer involves the representation of negative numbers on a computer.
Negative integers are (almost always) represented in 2's complement form. (see note below)
The method of getting the negative of a number is:
Take the binary representation of the whole number (including leading zeroes for the data type, except the MSB which will serve as the sign bit).
Take the 1's complement of the above number.
Add 1 to the 1's complement.
Prefix a sign bit.
Using 500 as an example,
Take the binary representation of 500: _000 0001 1111 0100 (_ is a placeholder for the sign bit).
Take the 1's-complement / inverse of it: _111 1110 0000 1011
Add 1 to the 1's complement: _111 1110 0000 1011 + 1 = _111 1110 0000 1100. This is the same as 2147483148 that you obtained, when you replaced the sign-bit by zero.
Prefix 0 to show a positive number and 1 for a negative number: 1111 1110 0000 1100. (This will be different from 2147483148 above. The reason you got the above value is because you nuked the MSB).
Inverting the sign is a similar process. You get leading ones if you use 16-bit or 32-bit numbers leading to the large value that you see. The LSB should be the same in each case.
Note: there are machines with 1's complement representation, but they are a minority. The 2's complement is usually preferred because 0 has the same representation, i.e., -0 and 0 are represented as all-zeroes in the 2's complement notation.
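A quick sketch of that negation recipe, assuming a 32-bit unsigned int for the hex value shown (the equality check at the end holds on any conforming implementation, since conversion to unsigned is defined modulo 2^N):
#include <stdio.h>
int main(void)
{
    unsigned int u = 500;
    unsigned int neg = ~u + 1u;                  /* invert all bits, then add 1 */
    printf("%#x\n", neg);                        /* 0xfffffe0c with a 32-bit unsigned int */
    printf("%d\n", neg == (unsigned int)-500);   /* 1: same pattern as -500 converted to unsigned */
    return 0;
}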
Left-shifting negative integers invokes undefined behavior, so you can't do that. You could have used your code if you had written a = (unsigned int)a << 1;. You'd get -500 = 0xFFFFFE0C, which left-shifted by 1 is 0xFFFFFC18.
a = (unsigned int)a >> 1; does indeed guarantee logical shift, so you get 0x7FFFFE0C. This is decimal 2147483148.
But this is needlessly complex. The best and most portable way to change the sign bit is simply a = -a. Any other code or method is questionable.
If you however insist on bit-twiddling, you could also do something like
(int32_t)a & ~(1u << 31)
This is portable to 32 bit systems, since (int32_t) guarantees two's complement, but 1u << 31 assumes 32 bit int type.
Demo:
#include <stdio.h>
#include <stdint.h>
int main (void)
{
int a = -500;
a = (unsigned int)a << 1;
a = (unsigned int)a >> 1;
printf("%.8X = %d\n", a, a);
_Static_assert(sizeof(int)>=4, "Int must be at least 32 bits.");
a = -500;
a = (int32_t)a & ~(1u << 31);
printf("%.8X = %d\n", a, a);
return 0;
}
As you note in your "Also" section, after your first left shift of 1 bit, a DOES reflect -1000 as expected.
The issue is in your cast to unsigned int. As explained above, the negative number is represented as 2's complement, meaning the sign is determined by the left most bit (most significant bit). When cast to an unsigned int, that value no longer represents sign but increases the maximum value your int can take.
Assuming 32-bit ints, the MSB used to represent -2^31 (= -2147483648) and now represents +2^31 (= 2147483648) in an unsigned int, for an increase of 2 * 2147483648 = 4294967296. Add this to your original value of -1000 and you get 4294966296. The right shift divides this by 2 and you arrive at 2147483148.
Hoping this may be helpful: (modified printing func from Print an int in binary representation using C)
#include <stdio.h>
void int2bin(int a, char *buffer, int buf_size) {
buffer += (buf_size - 1);
for (int i = buf_size-1; i >= 0; i--) {
*buffer-- = (a & 1) + '0';
a >>= 1;
}
}
int main() {
int test = -500;
int bufSize = sizeof(int)*8 + 1;
char buf[bufSize];
buf[bufSize-1] = '\0';
int2bin(test, buf, bufSize-1);
printf("%i (%u): %s\n", test, (unsigned int)test, buf);
//Prints: -500 (4294966796): 11111111111111111111111000001100
test = test << 1;
int2bin(test, buf, bufSize-1);
printf("%i (%u): %s\n", test, (unsigned int)test, buf);
//Prints: -1000 (4294966296): 11111111111111111111110000011000
test = 500;
int2bin(test, buf, bufSize-1);
printf("%i (%u): %s\n", test, (unsigned int)test, buf);
//Prints: 500 (500): 00000000000000000000000111110100
return 0;
}

MSB (left-most 1-bit) index in C

How can I get the index of the most significant 1-bit of an unsigned integer (uint16_t)?
Example:
uint16_t x = 240; // 0000 0000 1111 0000
printf("ffs=%d", __builtin_ffs(x)); // ffs=5
There is a function (__builtin_ffs) that returns the index of the least significant 1-bit (LSB) of an unsigned integer.
I want the opposite: a function which returns 8 when applied to the above example.
Remark: I have tried building my own function, but I ran into problems with the data type size, which depends on the compiler.
From the GCC manual at http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Other-Builtins.html:
Built-in Function: int __builtin_clz (unsigned int x)
Returns the number of leading 0-bits in x, starting at the most significant bit position. If x is 0, the result is undefined.
So, highest set bit:
#define ONE_BASED_INDEX_OF_HIGHEST_SET_BIT(x) \
(CHAR_BIT * sizeof 1 - __builtin_clz(x)) // 1-based index!!
beware of x == 0 or x<0 && sizeof(x)<sizeof 0 though.
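For instance, with the example value from the question (and assuming a 32-bit int, so CHAR_BIT * sizeof 1 is 32), a self-contained usage sketch:
#include <limits.h>
#include <stdio.h>
#define ONE_BASED_INDEX_OF_HIGHEST_SET_BIT(x) \
    (CHAR_BIT * sizeof 1 - __builtin_clz(x)) // 1-based index!!
int main(void)
{
    unsigned int x = 240;  /* 0000 0000 1111 0000 */
    printf("msb=%zu\n", ONE_BASED_INDEX_OF_HIGHEST_SET_BIT(x)); /* msb=8 with a 32-bit int */
    return 0;
}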
If I am reading this right (am I? I'm a little rusty on this stuff), you can do it as follows:
int msb = 0;
while(x) { // while there are still bits
x >>= 1; // right-shift the argument
msb++; // each time we right shift the argument, increment msb
}
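A complete, runnable version of that loop as a function (the name msb_index and the uint16_t parameter are my own additions):
#include <stdint.h>
#include <stdio.h>
static int msb_index(uint16_t x)
{
    int msb = 0;
    while (x) {   /* while there are still set bits */
        x >>= 1;  /* drop the lowest bit */
        msb++;    /* count how many shifts until x becomes 0 */
    }
    return msb;   /* 1-based index of the highest set bit; 0 if x == 0 */
}
int main(void)
{
    printf("msb=%d\n", msb_index(240)); /* 0000 0000 1111 0000 -> msb=8 */
    return 0;
}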

C: bit operations on a variable-length bit string

I'm doing some bit operations on a variable-length bit string.
I defined a function setBits(char *res, int x, int y) that should work on the bit string passed via the *res variable, given an x and a y (just to mention: I'm trying to implement something like a Bloom filter using 8 bits per x):
void setBits(char *res, int x, int y)
{
*res |= x << (y * 8);
}
E.g. given the following x-y-vectors {0,0} ; {0,1} ; {1,2} ; {2,3}, I expect a bit string like this (or vice-versa depending whether little- or big-endian, but that isn't important right now):
0000 0010 0000 0001 0000 0000 0000 0000
So the lowest 8 bits should come from {0,0}, the second 8 bits from {0,1}, the next 8 bits come from {1,2} and the last from {2,3}.
Unfortunately, and I don't see the reason for it, setBits always returns only the last result (in this case, the bit string from {2,3}). I debugged the code and realized that *res is always 0 - but why? What am I doing wrong? Is it because I chose char * that it doesn't work, or am I completely missing something very stupid?
Assuming 8-bit chars, the maximum value you can store in *res is 0xff i.e. (1<<8)-1.
Consider what happens when you call setBits for x=1, y=1
x << (y * 8) == 1 << (1 * 8)
== 1 << 8
== 0x100
*res is an 8-bit value so can only store the bottom 8 bits of this calculation. For any non-zero value of y, the bits which can be stored in *res are guaranteed to be 0.
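One possible way to get the layout described in the question (this is my own sketch, not part of the answer: it treats res as an array of bytes and ORs x into the y-th byte, and the 4-byte buffer size is just for the example):
#include <stdio.h>
void setBits(unsigned char *res, int x, int y)
{
    res[y] |= (unsigned char)x; /* place x into the y-th byte instead of shifting past 8 bits */
}
int main(void)
{
    unsigned char bits[4] = {0};
    setBits(bits, 0, 0);
    setBits(bits, 0, 1);
    setBits(bits, 1, 2);
    setBits(bits, 2, 3);
    for (int i = 3; i >= 0; i--)   /* print highest byte first: 02 01 00 00 */
        printf("%02x ", bits[i]);
    printf("\n");
    return 0;
}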
