In C, how to set first eight bits of any sized int in a generic way

How do I set the first (least significant) eight bits of any integer type to all zeroes? Essentially, clear the low byte, as if ANDing just those eight bits with 0x00.
What I need is a generic solution that works on any integer size, without having to create a mask that sets all the higher bits to 1.
In other words:
0xffff should become 0xff00
0xaabbccddeeffffff should become 0xaabbccddeeffff00

With bit shifts:
any_unsigned_integer = any_unsigned_integer >> 8 << 8;
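For example (a quick sketch; assumes a 64-bit unsigned long long):
unsigned long long v = 0xaabbccddeeffffffULL;
v = v >> 8 << 8;   /* v is now 0xaabbccddeeffff00 */
Note this requires an unsigned type: right-shifting a negative signed value is implementation-defined.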

The simplest solution works for all integer types on architectures with 2's complement representation for negative numbers:
val = val & ~0xff;
The reason is that ~0xff evaluates to -256 with type int. Let's consider all possible types for val:
If the type of val is smaller than int, val is promoted to int, the mask operation works as expected, and the result is converted back to the type of val.
If the type of val is signed, -256 is converted to the type of val preserving its value, hence replicating the sign bit, and the mask is applied properly.
If the type of val is unsigned, converting -256 to this type produces the value TYPE_MAX + 1 - 256, which has all bits set except the 8 low bits, again the proper mask for the operation.
Another simple solution, that works for all representations of negative values is this:
val = val ^ (val & 0xff);
It requires storing the value into a variable to avoid multiple evaluation, whereas the first proposal can be applied to any expression with potential side-effects:
return my_function(a, b, c) & ~0xff;
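As a quick demonstration of the conversions described above (a sketch; the long values assume a 64-bit long):
#include <stdio.h>
int main(void)
{
    unsigned short us = 0xffff;
    long sl = -1;                          /* all bits set */
    unsigned long ul = 0xaabbccddeeffffffUL;
    us &= ~0xff;                           /* promoted to int, masked, converted back */
    sl &= ~0xff;                           /* -256: sign bits replicated */
    ul &= ~0xff;                           /* ~0xff converts to ULONG_MAX + 1 - 256 */
    printf("%x %lx %lx\n", us, (unsigned long)sl, ul);
    /* prints: ff00 ffffffffffffff00 aabbccddeeffff00 */
    return 0;
}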

The C bitwise NOT operator ~ inverts all the bits of a given value, so it can be used to build a mask that clears only the lower eight bits:
int val = 123456789;
int other_val = val & ~0xff; // AND with binary 1111 ... 1111 0000 0000
val &= ~0xff; // alternative to change original variable.
If you have a wider (or narrower) type, the 0xff should be of the correct type, for example:
long val = 123456789L;
long other_val = val & ~(long)0xff;
val &= ~(long)0xff; // alternative to change original variable.

One way to do it without creating a mask for the higher bits is to use a combination of the & and ^ operators: x = x ^ (x & 0xFF); (or, using compound assignment: x ^= x & 0xFF;).

A universal solution: no mask, any number of bits
#define RESETB(val, nbits) ((val) ^ ((val) & ((1ULL << (nbits)) - 1)))
or, even better, a version that also handles nbits of 0 or at least the full width of val (note that 1ULL << 64 would be undefined behaviour, so the full-width case uses ~0ULL instead):
#define RESETB(val, nbits) ((val) ^ ((val) & ((nbits) ? ((nbits) >= sizeof(val) * CHAR_BIT ? ~0ULL : ((1ULL << (nbits)) - 1)) : 0)))
Both macros evaluate val more than once, so avoid arguments with side effects; CHAR_BIT comes from <limits.h>.
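For example (a quick sanity check of the corrected macro above; assumes a 64-bit unsigned long long):
#include <limits.h>
#include <stdio.h>
#define RESETB(val, nbits) ((val) ^ ((val) & ((nbits) ? ((nbits) >= sizeof(val) * CHAR_BIT ? ~0ULL : ((1ULL << (nbits)) - 1)) : 0)))
int main(void)
{
    unsigned long long v = 0xaabbccddeeffffffULL;
    printf("%llx\n", RESETB(v, 8));   /* aabbccddeeffff00 */
    printf("%llx\n", RESETB(v, 64));  /* 0: all bits cleared */
    return 0;
}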


Bit Shifting & Manipulation

I'm trying to work with bit manipulation, and am struggling to modify bits directly.
I have something as follows:
unsigned char myBits = 128; // 10000000 in binary
myBits = myBits >> 1; // Right shift, so we get 64, or 01000000 in binary
Now, how would I use bit manipulation to modify the first bit after the right shift (01000000) to a 1 (11000000)?
Most implementations will shift a "1" bit in from the left if the type in question is signed and the value is negative.
So you could either change the type to signed char, or do some casting on the unsigned types:
myBits = (unsigned char)((signed char)myBits >> 1);
You need to bitwise-OR the original value with the shifted value; starting from 10000000, ORing in myBits >> 1 sets the next bit down, giving 11000000:
myBits |= myBits >> 1;
https://godbolt.org/z/dY3eY5dc5
To set the most significant bit (you can change the type to any integer type and it will work):
myBits |= 1ULL << (sizeof(myBits) * CHAR_BIT - 1);
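Putting it together (a minimal sketch; the output comments assume an 8-bit unsigned char):
#include <limits.h>
#include <stdio.h>
int main(void)
{
    unsigned char myBits = 128;                        /* 10000000 */
    myBits |= myBits >> 1;                             /* OR in the shifted value: 11000000 */
    printf("%d\n", myBits);                            /* 192 */

    unsigned char other = 64;                          /* 01000000 */
    other |= 1ULL << (sizeof(other) * CHAR_BIT - 1);   /* set the MSB: 11000000 */
    printf("%d\n", other);                             /* 192 */
    return 0;
}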

arithmetic right shift shifts in 0s when MSB is 1

As an exercise I have to write the following function:
multiply x by 2, saturating to Tmin / Tmax if overflow, using only bit-wise and bit-shift operations.
Now this is my code:
// xor MSB and 2nd MSB. if different, we have an overflow and SHOULD get 0xFFFFFFFF. otherwise we get 0.
int overflowmask = ((x & 0x80000000) ^ ((x & 0x40000000)<<1)) >>31;
// ^ this arithmetic bit shift seems to be wrong
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if overflow, return x*2, otherwise overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
now when overflowmask should be 0xFFFFFFFF, it is 1 instead, which means that the arithmetic bit shift >>31 shifted in 0s instead of 1s (MSB got XORed to 1, then shifted to the bottom).
x is signed and the MSB is 1, so according to C99 an arithmetic right shift should fill in 1s. What am I missing?
EDIT: I just guessed that this code isn't correct. To detect an overflow it suffices for the 2nd MSB to be 1.
However, I still wonder why the bit shift filled in 0s.
EDIT:
Example: x = 0xA0000000
x & 0x80000000 = 0x80000000
x & 0x40000000 = 0
XOR => 0x80000000
>>31 => 0x00000001
EDIT:
Solution:
int msb = x & 0x80000000;
int msb2 = (x & 0x40000000) <<1;
int overflowmask = (msb2 | (msb^msb2)) >>31;
int overflowreplace = (x >>31) ^ 0x7FFFFFFF;
return ((x<<1) & ~overflowmask) | (overflowreplace & overflowmask);
Even on twos-complement machines, the behaviour of right-shift (>>) on negative operands is implementation-defined.
A safer approach is to work with unsigned types and explicitly OR-in the MSB.
While you're at it, you probably also want to use fixed-width types (e.g. uint32_t) rather than failing on platforms that don't meet your expectations.
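For instance, the whole exercise can be written with unsigned fixed-width types (a sketch under those assumptions; double_sat is an illustrative name, the 0u - overflow idiom uses subtraction to build the mask, so adapt it if your exercise forbids arithmetic operators, and the final cast back to int32_t assumes the usual two's-complement conversion):
#include <stdint.h>
int32_t double_sat(int32_t x)
{
    uint32_t ux = (uint32_t)x;
    uint32_t msb  = (ux >> 31) & 1u;                /* sign bit */
    uint32_t msb2 = (ux >> 30) & 1u;                /* bit below the sign bit */
    uint32_t overflow = msb ^ msb2;                 /* doubling overflows iff they differ */
    uint32_t mask = 0u - overflow;                  /* all ones on overflow, else zero */
    uint32_t saturated = (0u - msb) ^ 0x7FFFFFFFu;  /* INT32_MIN pattern if negative, else INT32_MAX */
    return (int32_t)(((ux << 1) & ~mask) | (saturated & mask));
}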
0x80000000 is treated as an unsigned number, which causes everything to be converted to unsigned. You can do this:
// xor MSB and 2nd MSB. if different, we have an overflow and SHOULD get 0xFFFFFFFF. otherwise we get 0.
int overflowmask = ((x & (0x40000000 << 1)) ^ ((x & 0x40000000)<<1)) >>31;
// this gets you Tmin if x < 0 or Tmax if x >= 0
int overflowreplace = ((x>>31)^0x7FFFFFFF);
// if overflow, return x*2, otherwise overflowreplace
return ((x<<1) & ~overflowmask)|(overflowreplace & overflowmask);
Or write the constants as negative decimals, or store all the constants in const int variables to have them guaranteed signed.
Never use bitwise operators on signed types. In the case of right shift on signed integers, it is up to the compiler whether you get an arithmetic or a logical shift.
That's only one of your problems, though. The hex integer constant 0x80000000 does not fit in a signed int, so it actually has type unsigned int. This silently turns your whole expression (x & 0x80000000) ^ ... into unsigned type through "the usual arithmetic conversions". The 0x40000000 expression, on the other hand, is a signed int and works as (the specific compiler) expected.
Solution:
All variables involved must be of type uint32_t.
All hex constants involved must be u suffixed.
To get an arithmetic shift portably, you would have to do
(x >> n) | (0xFFFFFFFFu << (32-n)) when the sign bit is set, or some similar hack.
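Wrapped into a helper (a sketch; assumes 32-bit uint32_t and 0 < n < 32, and asr32 is an illustrative name):
#include <stdint.h>
/* Arithmetic right shift emulated with unsigned operations only. */
uint32_t asr32(uint32_t x, unsigned n)
{
    uint32_t shifted = x >> n;     /* logical shift: zeros come in from the left */
    uint32_t signbit = x >> 31;    /* 1 if the sign bit was set */
    return signbit ? shifted | (0xFFFFFFFFu << (32 - n)) : shifted;
}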

Invert specific bits using bitwise NOT (no XOR)

How can I use bitwise NOT to invert specific bits of a number x? I know this can be done with XOR and a mask, but the question requires using NOT.
I need to invert a group of bits starting at a given position. The function's parameters are the original value, the position at which to start, and the width, i.e. the number of bits I want to invert.
I can use shifts to start from a given position, but how can I ensure that only that number of bits is inverted using the bitwise NOT operator?
Definition of xor: a ^ b <--> (a & ~b) | (~a & b)
unsigned x = 0x0F;
unsigned mask = 0x44; // Selected bits to invert
unsigned selected_x_bits_inverted = (x & ~mask) | (~x & mask);
printf("%02X\n", selected_x_bits_inverted);
// 4B
An approach would be:
First, extract them into y:
y = x & mask
Then, invert y and get only the bits you need:
y = ~y & mask
Clear the bits extracted from x:
x = x & (~mask)
OR those 2 numbers to get the result:
x = x | y
Note that every bit that has to be inverted is 1 in mask. Even if I used other bitwise operators, the actual bit flipping is done by a bitwise not. Also, I don't think it is possible to achieve this result without using some other binary operators.
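Collected into a function, the four steps look like this (a sketch; invert_bits is an illustrative name):
unsigned invert_bits(unsigned x, unsigned mask)
{
    unsigned y = x & mask;   /* extract the selected bits */
    y = ~y & mask;           /* invert them, keeping only those bits */
    x = x & ~mask;           /* clear the selected bits in x */
    return x | y;            /* merge the flipped bits back in */
}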
This function will invert 'width' bits of the number 'num' starting at position 'pos' (assuming pos + width stays below the width of unsigned):
unsigned invert(unsigned num, int pos, int width)
{
    unsigned mask = (~(~0u << width)) << pos; /* 'width' one-bits, shifted to 'pos' */
    return (num & ~mask) | (~num & mask);     /* flip only the masked bits */
}

How can I access a specific group of bits from a variable?

I have a variable with "x" number of bits. How can I extract a specific group of bits and then work on them in C?
You would do this with a series of 2 bitwise logical operations.
[[Terminology MSB (msb) is the most-significant-bit; LSB (lsb) is the least-significant-bit. Assume bits are numbered from lsb==0 to some msb (e.g. 31 on a 32-bit machine). The value of the bit position i represents the coefficient of the 2^i component of the integer.]]
For example if you have int x, and you want to extract some range of bits x[msb..lsb] inclusive, for example a 4-bit field x[7..4] out of the x[31..0] bits, then:
By shifting x right by lsb bits, e.g. x >> lsb, you put the lsb bit of x in the 0th (least significant) bit of the expression, which is where it needs to be.
Now you have to mask off any remaining bits above those designated by msb. The number of such bits is msb-lsb+1. We can form a bit-mask string of '1' bits that long with the expression ~(~0u << (msb-lsb+1)) (the u suffix keeps the shift on an unsigned value, since left-shifting a negative signed value is undefined). For example ~(~0u << (7-4+1)) == ~0b11111111111111111111111111110000 == 0b1111.
Putting it all together, you can extract the bit vector you want into a new integer with this expression:
(x >> lsb) & ~(~0u << (msb-lsb+1))
For example,
unsigned x = 0x89ABCDEF;
int msb = 7;
int lsb = 4;
unsigned result = (x >> lsb) & ~(~0u << (msb-lsb+1));
// == 0x089ABCDE & 0xF
// == 0xE (which is x[7..4])
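For reuse, the same expression can be wrapped up as a helper (a sketch; it uses unsigned arithmetic to avoid the signed-shift pitfalls discussed elsewhere on this page, and assumes 0 <= lsb <= msb with msb below the word width):
unsigned extract_bits(unsigned x, int msb, int lsb)
{
    return (x >> lsb) & ~(~0u << (msb - lsb + 1));
}
/* extract_bits(0x89ABCDEFu, 7, 4) == 0xE */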
Make sense?
Happy hacking!
If you're dealing with a primitive then just use bitwise operations:
int bits = 0x0030;
bool third_bit = bits & 0x0004; // bits & 00000100
bool fifth_bit = bits & 0x0010; // bits & 00010000
If x can be larger than a trivial primitive but its size is known at compile time, then you can use std::bitset<> for the task:
#include<bitset>
#include<string>
// ...
std::bitset<512> b(std::string("001"));
b.set(2, true);
std::cout << b[1] << ' ' << b[2] << '\n';
std::bitset<32> bul(0x0010ul);
If the size of x is not known at compile time then you can use std::vector<unsigned char> and do the bit manipulation at runtime. It's more work, the intent is less obvious than with std::bitset, and it's slower, but that's arguably your best option when x varies at runtime.
#include<vector>
// ...
std::vector<unsigned char> v(256);
v[2] = 1;
bool eighteenth_bit = v[2] & 0x02; // second bit of third byte
Work on bits with the &, |, <<, >> operators.
For example, if you have an integer value of 7 and you want to zero out the 2nd bit:
7 is 111 in binary
(to zero the 2nd bit, AND it with 101, which is 5 in decimal)
111 & 101 = 101 (5)
here's the code:
#include <stdio.h>
int main(void)
{
    int x = 7;
    x = x & 5;    /* clear the 2nd bit */
    printf("x: %d\n", x);
    return 0;
}
You can do the same with other operators like OR, shift left, shift right, etc.
You can use bitfields in a union:
typedef union {
unsigned char value;
struct { unsigned b0:1,b1:1,b2:1,b3:1,b4:1,b5:1,b6:1,b7:1; } b;
struct { unsigned b0:2,b1:2,b2:2,b3:2; } b2;
struct { unsigned b0:4,b1:4; } b4;
} CharBits;
CharBits b={0},a={0};
printf("\n%d",b.value);
b.b.b0=1; printf("\n%d",b.value);
b.b.b1=1; printf("\n%d",b.value);
printf("\n%d",a.value);
a.b4.b1=15; printf("\n%d",a.value); /* <- set the highest 4-bit-group with one statement */

Signed right shift = strange result?

I was helping someone with their homework and ran into this strange issue. The problem is to write a function that reverses the order of the bytes of a signed integer (that's how the function was specified, anyway), and this is the solution I came up with:
int reverse(int x)
{
int reversed = 0;
reversed = (x & (0xFF << 24)) >> 24;
reversed |= (x & (0xFF << 16)) >> 8;
reversed |= (x & (0xFF << 8)) << 8;
reversed |= (x & 0xFF) << 24;
return reversed;
}
If you pass 0xFF000000 to this function, the first assignment will result in 0xFFFFFFFF. I don't really understand what is going on, but I know it has something to do with conversions back and forth between signed and unsigned, or something like that.
If I append ul to 0xFF it works fine, which I assume is because the expression is forced to unsigned and the shift is then done on an unsigned value. The generated code also changes; without the ul suffix it uses sar (shift arithmetic right), but with it, it uses shr as intended.
I would really appreciate it if someone could shed some light on this for me. I'm supposed to know this stuff, and I thought I did, but I'm really not sure what's going on here.
Thanks in advance!
Since x is a signed quantity, the result of (x & (0xFF << 24)) is 0xFF000000, which is also signed and thus a negative number since the top (sign) bit is set. The >> operator on int (a signed value) performs sign extension (Edit: though this behaviour is actually implementation-defined, not guaranteed by the standard) and propagates the sign-bit value of 1 as the value is shifted to the right.
You should rewrite the function as follows to work exclusively on unsigned values (note the u suffix on the constant shifted into the sign-bit position; 0xFF << 24 would overflow a signed int):
unsigned reverse(unsigned x)
{
unsigned int reversed = 0;
reversed = (x & (0xFFu << 24)) >> 24;
reversed |= (x & (0xFF << 16)) >> 8;
reversed |= (x & (0xFF << 8)) << 8;
reversed |= (x & 0xFF) << 24;
return reversed;
}
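A quick check of the unsigned version (a sketch; assumes the reverse() above is in scope and unsigned int is 32 bits):
#include <stdio.h>
int main(void)
{
    printf("%x\n", reverse(0xFF000000u));  /* ff */
    printf("%x\n", reverse(0x12345678u));  /* 78563412 */
    return 0;
}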
From your results we can deduce that you are on a 32-bit machine.
(x & (0xFF << 24)) >> 24
In this expression 0xFF is an int, so 0xFF << 24 is also an int, as is x.
When you perform the bitwise & between two int, the result is also an int and in this case the value is 0xFF000000 which on a 32-bit machine means that the sign bit is set, so you have a negative number.
The result of performing a right-shift on an object of signed type with a negative value is implementation-defined. In your case, a sign-preserving arithmetic shift right is performed.
If you right-shift an unsigned type, then you would get the results that you were expecting for a byte-reversal function. You could achieve this by making either operand of the bitwise & an unsigned type, forcing conversion of both operands to that unsigned type. (This is true on any implementation where a signed int can't hold the full range of positive values of an unsigned int, which is nearly all implementations.)
Right shift on signed types is implementation-defined; in particular, the compiler is free to do an arithmetic or a logical shift as it pleases. This is something you will not notice if the concrete value that you are treating is positive, but as soon as it is negative you may fall into a trap.
Just don't do it, this is not portable.
x is signed, so the highest bit is used for the sign. 0xFF000000, read as a 32-bit two's-complement int, is a negative number. When you do the shift, the result is "sign extended": the binary digit added on the left to replace the former MSB that was shifted right is always the same as the sign of the value. So
0xFF000000 >> 1 == 0xFF800000
0xFF000000 >> 2 == 0xFFC00000
0xFF000000 >> 3 == 0xFFE00000
0xFF000000 >> 4 == 0xFFF00000
If the value being shifted is unsigned, or if the shift is toward the left, the new bit would be 0. It's only in right-shifts of signed values that sign-extension come into play.
If you want it to work the same on all platforms with both signed and unsigned integers, change
(x & (0xFF << 24)) >> 24
into
(x >> 24) & 0xFF
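Applying the same shift-then-mask pattern to the whole function (a sketch; assumes 32-bit unsigned int, and reverse_portable is an illustrative name):
unsigned reverse_portable(unsigned x)
{
    return ((x >> 24) & 0xFFu)        /* byte 3 -> byte 0 */
         | ((x >> 8)  & 0xFF00u)      /* byte 2 -> byte 1 */
         | ((x << 8)  & 0xFF0000u)    /* byte 1 -> byte 2 */
         | ((x << 24) & 0xFF000000u); /* byte 0 -> byte 3 */
}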
If this were Java code you would use '>>>', which is an unsigned right shift; otherwise the shift will sign-extend the value.
