What is going on in this code? From the name and the context it's finding the number of cores on the machine, but how does it work? What's all that bit fiddling for?
static int32
getproccount(void)
{
    uintptr buf[16], t;
    int32 r, cnt, i;
    cnt = 0;
    r = runtime·sched_getaffinity(0, sizeof(buf), buf);
    if(r > 0)
        for(i = 0; i < r/sizeof(buf[0]); i++) {
            t = buf[i];
            t = t - ((t >> 1) & 0x5555555555555555ULL);
            t = (t & 0x3333333333333333ULL) + ((t >> 2) & 0x3333333333333333ULL);
            cnt += (int32)((((t + (t >> 4)) & 0xF0F0F0F0F0F0F0FULL) * 0x101010101010101ULL) >> 56);
        }
    return cnt ? cnt : 1;
}
Note: ignore the · in runtime·sched_getaffinity, think of that line as just an arbitrary library/system call that does what the name and arguments imply. (In this case this specific call comes from the old pre-Go1.4 runtime written in a slight variation of C that deals with ·).
The for loop runs over as many elements as were filled into the buf array (the code treats r as a byte count, hence the division by sizeof(buf[0])). For each of those elements, it calculates the number of bits that are set in that element (the bit fiddling with t does just that) and adds that to cnt. At the end cnt is returned (or 1 if cnt is 0).
Explanation for the bit fiddling:
Bit fiddling line 1
The line t = t - ((t >> 1) & 0x5555555555555555ULL); splits the bits of t into groups of 2 bits and replaces each group with the number of set bits in that group. This works as follows:
Consider t = ... w x y z where w,x,y,z are individual bits. Then
t = ... w x y z
t>>1 = ..... w x y
t>>1 & 0x...55ULL = ... 0 w 0 y
From the above, it is clear why the grouping happens in pairs of bits (for example, y and z get grouped together here). The group y z has the value 2y + z, so t - ((t >> 1) & 0x5555555555555555ULL) turns each 2-bit group y z into (2y + z) - y = y + z. Checking the 4 possibilities (00, 01, 10, 11), the results are (00, 01, 01, 10), which match the number of bits set in each group of 2 bits.
Bit fiddling line 2
Moving on to the next bit fiddling line t = (t & 0x3333333333333333ULL) + ((t >> 2) & 0x3333333333333333ULL);, we can see that it adds up adjacent 2-bit groups in pairs.
t & 0x..33ULL takes the rightmost 2 bits of each group of 4 bits.
(t>>2) & 0x..33ULL takes the leftmost 2 bits of each group of 4 bits and shifts them right by 2.
Since these two groups of 2 bits were the number of bits set in the original number, it has added up the number of bits set in every group of 4 bits. (i.e. now, each group of 4 bits has the number of bits that were set originally in those positions)
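To make the first two steps concrete, here is a minimal sketch in C. The name nibble_counts is made up for illustration, and uint64_t stands in for the runtime's uintptr on a 64-bit machine. It applies only lines 1 and 2, so each 4-bit group of the result should hold the number of bits that were set in that group of the input.

```c
#include <stdint.h>

/* Hypothetical helper: applies only the first two bit-fiddling lines.
   Afterwards each 4-bit group holds the count (0..4) of set bits
   that were in that group of t. */
uint64_t nibble_counts(uint64_t t)
{
    /* Line 1: every 2-bit group becomes the number of set bits in it. */
    t = t - ((t >> 1) & 0x5555555555555555ULL);
    /* Line 2: add neighbouring 2-bit counts into 4-bit counts. */
    t = (t & 0x3333333333333333ULL) + ((t >> 2) & 0x3333333333333333ULL);
    return t;
}
```

For example, nibble_counts(0xF7) gives 0x43: the high nibble 1111 has 4 bits set and the low nibble 0111 has 3.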
Bit fiddling line 3
As for the last bit fiddling line cnt += (int32)((((t + (t >> 4)) & 0xF0F0F0F0F0F0F0FULL) * 0x101010101010101ULL) >> 56);, we can break it down into a few parts for easier understanding.
(int32)(
    (
        (
            (
                t + (t >> 4)
            ) & 0xF0F0F0F0F0F0F0FULL
        ) * 0x101010101010101ULL
    ) >> 56
)
Currently, each group of 4 bits stores the number of bits that were originally set in that group. Shifting the number right by 4 and adding it to itself adds each pair of adjacent groups together. ANDing with 0x..0F0FULL keeps only the right 4 bits of each group of 8 bits (a count of at most 8 fits in 4 bits). Hence, at the end of (t + (t >> 4)) & 0x...0F0FULL, each group of 8 bits contains the number of bits that were set at those locations in the original number. Let's call this number s = (t + (t >> 4)) & 0x...0F0FULL for simplicity.
We now have to do (s * 0x...0101ULL) >> 56. Notice that t and thus s are 64 bits in size, so there are 8 groups of 8 bits. Multiplying by 0x...0101ULL (which has just the rightmost bit of each group turned on) adds up shifted copies of s, so the leftmost group becomes the sum of all the groups. By shifting right by 56 (i.e. 64 - 8), we move this leftmost group into the rightmost position. This means that the rightmost group of 8 bits now has the sum of all the groups of s. But s's groups were the number of set bits at those locations in the number; therefore, after the >> 56 operation, we have the total number of set bits in the original number. The (int32) is just a cast to a smaller datatype (which is still large enough to store this number).
Note: A bit being set means that the bit is equal to 1.
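Putting the three lines together, a standalone sketch in standard C might look like the following. The names popcount64 and count_cpus are illustrative and not part of the Go runtime, and the affinity mask is assumed to have been filled in by the caller rather than by a real sched_getaffinity call.

```c
#include <stddef.h>
#include <stdint.h>

/* Count the set bits of one 64-bit word with the same three steps
   that getproccount uses. */
unsigned popcount64(uint64_t t)
{
    t = t - ((t >> 1) & 0x5555555555555555ULL);                           /* 2-bit counts */
    t = (t & 0x3333333333333333ULL) + ((t >> 2) & 0x3333333333333333ULL); /* 4-bit counts */
    t = (t + (t >> 4)) & 0x0F0F0F0F0F0F0F0FULL;                           /* 8-bit counts */
    return (unsigned)((t * 0x0101010101010101ULL) >> 56);                 /* sum of the 8 bytes */
}

/* Hypothetical stand-in for the loop in getproccount: sums the set bits
   of an affinity mask that the caller has already filled in. */
int count_cpus(const uint64_t *mask, size_t nwords)
{
    int cnt = 0;
    for (size_t i = 0; i < nwords; i++)
        cnt += (int)popcount64(mask[i]);
    return cnt ? cnt : 1;   /* never report zero CPUs, as in the original */
}
```

With a mask whose first word is 0xF and whose other words are 0, count_cpus reports 4; with an all-zero mask it falls back to 1.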
Edit: I wish SO let me accept 2 answers because neither is complete without the other. I suggest reading both!
I am trying to come up with a fast implementation of a function that given an unsigned 32-bit integer x returns the sum of 2^trailing_zeros(i) for i=1..x-1, where trailing_zeros is the count trailing zeros operation which is defined as returning the 0 bits after the least significant 1 bit. This seems like the kind of problem that should lend itself to a clever bit manipulation implementation that takes the same number of instructions regardless of the input, but I haven't been able to derive it.
Mathematically, 2^trailing_zeros(i) is equivalent to the largest factor of 2 that exactly divides i. So we are summing those largest factors for 1..x-1.
i | 1 2 3 4 5 6 7 8 9 10
-----------------------------------------------------------------------
2^trailing_zeroes(i) | 1 2 1 4 1 2 1 8 1 2
-----------------------------------------------------------------------
Sum (desired value) | 0 1 3 4 8 9 11 12 20 21
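As a baseline for checking faster versions, the definition can be written directly in C. This is only a slow reference sketch; the names lowest_set_bit and sum_reference are made up here. It relies on the fact that i & -i isolates the lowest set bit of i, which is exactly 2^trailing_zeros(i) for i > 0.

```c
#include <stdint.h>

/* 2^trailing_zeros(i) for i > 0: i & -i isolates the lowest set bit.
   The negation is written as 0u - i to stay in unsigned arithmetic. */
uint32_t lowest_set_bit(uint32_t i)
{
    return i & (0u - i);
}

/* Reference (slow) definition: sum of 2^trailing_zeros(i) for i = 1..x-1.
   A 64-bit accumulator is used so the sum cannot wrap. */
uint64_t sum_reference(uint32_t x)
{
    uint64_t sum = 0;
    for (uint32_t i = 1; i < x; i++)
        sum += lowest_set_bit(i);
    return sum;
}
```

The values it produces for x = 1..10 agree with the "Sum (desired value)" row of the table: 0, 1, 3, 4, 8, 9, 11, 12, 20, 21.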
It is a little easier to see the structure of 2^trailing_zeroes(i) if we 'plot' the values -- horizontal position increasing from left to right corresponding to i and vertical position increasing from top to bottom corresponding to trailing_zeroes(i).
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 4
8 8 8 8 8 8 8 8 8 8 8 8 8 8 8 8
16 16 16 16 16 16 16 16
32 32 32 32
64 64
Here it is easier to see the pattern that 2's are always 4 apart, 8's are always 16 apart, etc. However, each pattern starts at a different time -- 8's don't begin until i=8, 16 doesn't begin until i=16, etc. If you don't take into account that the patterns don't start right away you can come up with formulas that don't work -- for example you might think to determine the number of 8's going into the total you should just compute floor(x/16) but i=25 is far enough to the right to include both of the first two 8s.
The best solution I have come up with so far is:
Set n = floor(log2(x)). This can be computed quickly using the count leading zeros operation. This tells us the highest power of two that is going to be involved in the sum.
Set sum = 0
for i = 0..n
sum += floor((x - 2^i) / 2^(i+1))*2^i + 2^i
The way this works is that, for each power, it calculates the horizontal distance on the plot between x and the first appearance of that power, e.g. the distance between x and the first 8 is (x-8), and then it divides by the distance between repeating instances of that power, e.g. floor((x-8)/16), which gives us how many more times that power appeared. Multiplying by the power gives the sum for that power, e.g. floor((x-8)/16)*8. Then we add one instance of the given power because that calculation excludes the very first time that power appears.
In practice this implementation should be pretty fast because the division/floor can be done by right bit shift and powers of two can be done with 1 bit-shifted to the left. However it seems like it should still be possible to do better. This implementation will loop more for larger inputs, up to 32 times (it's O(log2(n))); ideally we want O(1) without a gigantic lookup table using up all the CPU cache. I've been eyeing the BMI/BMI2 intrinsics but I don't see an obvious way to apply them.
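For reference, a rough C sketch of this loop using only shifts could look like this. The name sum_by_powers is hypothetical. Like the z(x) prototype in the script, it starts at i = 0 so the 1s are counted as well, and it applies the formula to x-1 because the formula counts the terms up to and including its argument. The loop condition replaces the floor(log2(x)) computation, which is an assumption made here for portability rather than speed.

```c
#include <stdint.h>

/* Sum of 2^trailing_zeros(i) for i = 1..x-1, one loop iteration
   per power of two. 64-bit arithmetic avoids shifting by 32. */
uint64_t sum_by_powers(uint32_t x)
{
    if (x < 2)
        return 0;                       /* empty range */
    uint64_t m = (uint64_t)x - 1;       /* the formula counts 1..m inclusive */
    uint64_t sum = 0;
    for (unsigned i = 0; (1ULL << i) <= m; i++) {
        uint64_t power = 1ULL << i;
        /* floor((m - 2^i) / 2^(i+1)) further appearances, times 2^i,
           plus the first appearance of 2^i itself */
        sum += (((m - power) >> (i + 1)) << i) + power;
    }
    return sum;
}
```

For x = 10 this adds 5 + 4 + 4 + 8 = 21, matching the table.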
Although my goal is to implement this in a compiled language like C++ or Rust with real bit shifting and intrinsics, I've been prototyping in Python. Included below is my script that includes the implementation I described, z(x), and the code for generating the plot, tower(x).
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from math import pow, floor, log, ceil
def leading_zeros(x):
    return len(bin(x).split('b')[-1].split('1')[-1])
def f(x):
    s = 0
    for c, i in enumerate(range(1,x)):
        a = pow(2, len(bin(i).split('b')[-1].split('1')[-1]))
        s += a
    return s
def g(x): return sum([pow(2,i)*floor((x+pow(2,i)-1)/pow(2,i+1)) for i in range(0,32)])
def h(x):
    s = 0
    extra = 0
    extra_s = 0
    for i in range(0,32):
        num = (x+pow(2,i)-1)
        den = pow(2,i+1)
        fraction = num/den
        floored = floor(num/den)
        power = pow(2,i)
        product = power*floored
        if product == 0:
            break
        s += product
        extra += (fraction - floored)
        extra_s += power*fraction
        #print(f"i={i} s={s} num={num} den={den} fraction={fraction} floored={floored} power={power} product={product} extra={extra} extra_s={extra_s}")
    return s
def z(x):
    upper_bound = floor(log(x,2)) if x > 0 else 0
    s = 0
    for i in range(upper_bound+1):
        num = (x - pow(2,i))
        den = pow(2,i+1)
        fraction = num/den
        floored = floor(fraction)
        added = pow(2,i)
        s += floored * added
        s += added
        print(f"i={i} s={s} upper_bound={upper_bound} num={num} den={den} floored={floored} added={added}")
    return s
    # return sum([floor((x - pow(2,i))/pow(2,i+1) + pow(2,i)) for i in range(floor(log(x, 2)))])
def tower(x):
    table = [[" " for i in range(x)] for j in range(ceil(log(x,2)))]
    for i in range(1,x):
        p = leading_zeros(i)
        table[p][i] = 2**p
    for row in table:
        for col in row:
            print(col,end='')
        print()
# h(9000)
for i in range(1,16):
    tower(i)
    print((i, f(i), g(i), h(i), z(i-1)))
Based on the method of Eric Postpischil, here is a way to do it without a loop.
Note that every bit is being multiplied by its position, and the results are summed (sort of, except there is also a factor of 0.5 in it, let's put that aside for now). Let's call those values that are being added up "the partial products" just to call them something, it's not really accurate to call them that, I can't come up with anything better. If we transpose that a little bit, then it's built up like this: the lowest bit of every partial product is the lowest bit of the position of every bit multiplied by that bit. Single-bit-products are bitwise-AND, and the values of the lowest bits of the positions are 0,1,0,1 etc, so it works out to x & 0xAAAAAAAA, the second bit of every partial product is x & 0xCCCCCCCC (and has a "weight" of 2, so this must be multiplied by 2) etc.
Then the whole thing needs to be shifted right by 1, to account for the factor of 0.5
So in total:
unsigned CountCumulativeTrailingZeros(unsigned x)
{
    --x;
    unsigned sum = x;
    sum += (x >> 1) & 0x55555555;
    sum += x & 0xCCCCCCCC;
    sum += (x & 0xF0F0F0F0) << 1;
    sum += (x & 0xFF00FF00) << 2;
    sum += (x & 0xFFFF0000) << 3;
    return sum;
}
For an additional explanation, here is a more visual example. Let's temporarily drop the factor of 0.5 again, it doesn't fundamentally change the algorithm but adds some complication.
First I write above every bit of v (some example value), the position of that bit in binary (p0 is the least significant bit of the position, p1 the second bit etc). Read the ps vertically, every column is a number:
p0: 10101010101010101010101010101010
p1: 11001100110011001100110011001100
p2: 11110000111100001111000011110000
p3: 11111111000000001111111100000000
p4: 11111111111111110000000000000000
v : 00000000100001000000001000000000
So for example bit 9 is set, and it has (reading from bottom to top) 01001 above it (9 in binary).
What we want to do (why this works has been explained by Eric's answer), is take the indexes of the bits that are set, shift them to their corresponding positions, and add them. In this case, they are already at their own positions (by construction, the numbers were written at their own positions), so there is no shift, but they still need to be filtered so only the numbers that correspond to set bits survive. This is what I meant by the "single bit products": take a bit of v and multiply it by the corresponding bits of p0, p1, etc.
You can look at that as multiplying the bit value by its index as well so 2^bit * bit as mentioned in the comments. That is not how it is done here, but that is effectively what is done.
Back to the example, applying bitwise-AND results in these partial products:
pp0: 00000000100000000000001000000000
pp1: 00000000100001000000000000000000
pp2: 00000000100000000000000000000000
pp3: 00000000000000000000001000000000
pp4: 00000000100001000000000000000000
v : 00000000100001000000001000000000
The only values that are left are 01001, 10010, 10111, and they are at their corresponding positions (so, already shifted to where they need to go).
Those values must be added, while keeping them at their positions. They don't need to be extracted from the strange form which they are in; addition is freely reorderable (associative and commutative), so it's OK to add all the least significant bits of the partial products to the sum first, then all the second bits, and so on. But they have to be added with the right "weight": after all, a set bit in pp0 corresponds to a 1 at that position, but a set bit in pp1 really corresponds to a 2 at that position (since it's the second bit of the number that it is part of). So pp0 is used directly, but pp1 is shifted left by 1, pp2 is shifted left by 2, etc.
Then the factor of 0.5 must still be accounted for, which I did mostly by shifting over the bits of the partial products by one less than what their weight would imply. pp0 was shifted left by zero, so it must be shifted right by 1 now. This could be done with less complication by just putting return sum >> 1; at the end, but that would reduce the range of values that the function can handle before running into integer wrapping modulo 2^32 (also it would cost an extra operation, and doing it the weird way does not).
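One way to gain confidence in the loop-free version is to compare it against the plain definition for small inputs. The sketch below restates the function with fixed-width types (an assumption that unsigned is 32 bits) under the made-up name cumulative_masks, next to a naive reference; the helper names are illustrative only.

```c
#include <stdint.h>

/* The loop-free version, restated with fixed-width types. */
uint32_t cumulative_masks(uint32_t x)
{
    --x;
    uint32_t sum = x;
    sum += (x >> 1) & 0x55555555u;   /* pp0, shifted right by 1 */
    sum += x & 0xCCCCCCCCu;          /* pp1, weight 2 halved to 1 */
    sum += (x & 0xF0F0F0F0u) << 1;   /* pp2, weight 4 halved to 2 */
    sum += (x & 0xFF00FF00u) << 2;   /* pp3, weight 8 halved to 4 */
    sum += (x & 0xFFFF0000u) << 3;   /* pp4, weight 16 halved to 8 */
    return sum;
}

/* Slow reference: add the lowest set bit of every i in 1..x-1. */
uint32_t cumulative_naive(uint32_t x)
{
    uint32_t sum = 0;
    for (uint32_t i = 1; i < x; i++)
        sum += i & (0u - i);
    return sum;
}

/* Returns 1 if both versions agree for every x in 1..limit, else 0. */
int masks_match_naive(uint32_t limit)
{
    for (uint32_t x = 1; x <= limit; x++)
        if (cumulative_masks(x) != cumulative_naive(x))
            return 0;
    return 1;
}
```

Calling masks_match_naive(2000) compares the two for the first 2000 inputs; the reference is quadratic overall, so the limit should stay small.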
Observe that if we count from 1 to x instead of to x−1, we have a pattern:
x  | sum | sum/x
-----------------
1  | 1   | 1
2  | 3   | 1.5
4  | 8   | 2
8  | 20  | 2.5
16 | 48  | 3
So we can easily calculate the sum for any power of two p as p • (1 + ½b), where b is the power (equivalently, the number of the bit that is set or the log2 of p). We can see this by induction: If the sum from 1 to 2^b is 2^b•(1+½b) (which it is for b=0), then the sum from 1 to 2^(b+1) reprises the individual term contributions twice except that the last term adds 2^(b+1) instead of 2^b, so the sum is 2•2^b•(1+½b) − 2^b + 2^(b+1) = 2^(b+1)•(1+½b) + ½•2^(b+1) = 2^(b+1)•(1+½(b+1)).
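The closed form for powers of two is easy to check numerically. In this sketch the names sum_upto and sum_pow2_closed are made up for illustration; the closed form p • (1 + ½b) is written as p + p*b/2 so that it stays in integer arithmetic.

```c
#include <stdint.h>

/* Sum of 2^trailing_zeros(i) for i = 1..p, computed the slow way. */
uint64_t sum_upto(uint64_t p)
{
    uint64_t sum = 0;
    for (uint64_t i = 1; i <= p; i++)
        sum += i & (~i + 1);   /* lowest set bit of i */
    return sum;
}

/* Closed form for p = 2^b: p * (1 + b/2), written without fractions.
   p * b is even for every b, so the division by 2 is exact. */
uint64_t sum_pow2_closed(unsigned b)
{
    uint64_t p = 1ULL << b;
    return p + (p * b) / 2;
}
```

For b = 0..4 the closed form gives 1, 3, 8, 20, 48, the same values as in the table.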
Further, between any two powers of two, the lower bits reprise the previous partial sums. Thus, for any x, we can compute the cumulative number of trailing zeros by summing the sums for the set bits in it. Recalling that this provides the sum for numbers from 1 to x, we adjust to get the desired sum from 1 to x−1 by subtracting one from x before computation:
#include <limits.h>   /* for CHAR_BIT */
unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = 0;
    for (unsigned bit = 0; bit < sizeof x * CHAR_BIT; ++bit)
        sum += (x & 1u << bit) * (1 + bit * .5);
    return sum;
}
We can terminate the loop when x is exhausted:
unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = 0;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) * (1 + bit * .5);
    return sum;
}
As harold points out, we can factor out the 1, as summing the value of each bit of x equals x:
unsigned CountCumulative(unsigned x)
{
    --x;
    unsigned sum = x;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) * bit * .5;
    return sum;
}
Then eliminate the floating-point:
unsigned CountCumulative(unsigned x)
{
    unsigned sum = --x;
    for (unsigned bit = 0; x; ++bit, x >>= 1)
        sum += ((x & 1) << bit) / 2 * bit;
    return sum;
}
Note that when bit is zero, ((x & 1) << bit) / 2 will lose the fraction, but this is irrelevant as * bit makes the contribution zero anyway. For all other values of bit, (x & 1) << bit is even, so the division does not lose anything.
This will overflow unsigned at some point, so one might want to use a wider type for the calculations.
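As an illustration of the wider-type suggestion, here is a sketch of the same integer loop accumulating in 64 bits. The name count_cumulative64 and the guard for x = 0 are additions made here, not part of the code above.

```c
#include <stdint.h>

/* Same loop as the integer-only version, but accumulating in 64 bits
   so the sum cannot wrap for any 32-bit x. */
uint64_t count_cumulative64(uint32_t x)
{
    if (x == 0)
        return 0;                   /* nothing to sum; avoids wrapping x - 1 */
    uint64_t v = (uint64_t)x - 1;
    uint64_t sum = v;               /* the factored-out "1" part */
    for (unsigned bit = 0; v; ++bit, v >>= 1)
        sum += ((v & 1) << bit) / 2 * bit;
    return sum;
}
```

For x = 0x80000001 the result is 35433480192, which no longer fits in 32 bits.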
More Code Golf
Another way to add half the values of the bits of x repeatedly depending on their bit position is to shift x (to halve its bit values) and then add that repeatedly while removing successive bits from low to high:
unsigned CountCumulative(unsigned x)
{
    unsigned sum = --x;
    for (unsigned bit = 0; x >>= 1; ++bit)
        sum += x << bit;
    return sum;
}
I don't understand the exercise 2-9, in K&R C programming language,
chapter 2, 2.10:
Exercise 2-9. In a two's complement number system, x &= (x-1) deletes the rightmost 1-bit in x . Explain why. Use this observation to write a faster version of bitcount .
the bitcount function is:
/* bitcount: count 1 bits in x */
int bitcount(unsigned x)
{
    int b;
    for (b = 0; x != 0; x >>= 1)
        if (x & 01)
            b++;
    return b;
}
The function checks whether the rightmost bit is a 1-bit and then shifts that bit out.
I can't understand why x&(x-1) deletes the right most 1-bit?
For example, suppose x is 1010 and x-1 is 1001 in binary, and x&(x-1) would be 1011, so the rightmost bit would be there and would be one, where am I wrong?
Also, the exercise mentioned two's complement, does it have something to do with this question?
Thanks a lot!!!
First, you need to believe that K&R are correct.
Second, you may have misunderstood the wording.
Let me clarify it for you. The rightmost 1-bit does not mean the rightmost bit, but the rightmost bit that is 1 in the binary form.
Let's arbitrarily assume that x is xxxxxxx1000 (each x can be 0 or 1). Then, counting from right to left, the fourth bit is the "rightmost 1-bit". On the basis of this understanding, let's continue with the problem.
Why can x &= (x-1) delete the rightmost 1-bit?
In a two's complement number system, -1 is represented by a bit pattern of all 1s.
So x-1 is actually x+(-1), which is xxxxxxx1000+11111111111. Here comes the tricky point.
To the right of the rightmost 1-bit, every 0 becomes 1, and the rightmost 1-bit itself becomes 0 and sends a carry of 1 to the left. This carry keeps propagating all the way to the leftmost bit, where it overflows and is discarded; meanwhile every 'x' bit stays unchanged, because 'x' + '1' + '1' (carry) yields 'x' again.
Then x & (x-1) will delete the rightmost 1-bit.
Hope you can understand it now.
Thanks.
Here is a simple way to explain it. Let's arbitrarily assume that number Y is xxxxxxx1000 (x can be 0 or 1).
xxxxxxx1000 - 1 = xxxxxxx0111
xxxxxxx1000 & xxxxxxx0111 = xxxxxxx0000 (See, the "rightmost 1" is gone.)
So the number of repetitions of Y &= (Y-1) before Y becomes 0 will be the total number of 1's in Y.
Why does x & (x-1) delete the rightmost 1-bit? Just try and see:
If the rightmost bit is 1, x has a binary representation of a...b1 and x-1 is a...b0, so the bitwise AND gives a...b0, because common bits are left unchanged by the AND and 1 & 0 is 0.
Otherwise x has a binary representation of a...b10...0; x-1 is a...b01...1, and for the same reason as above x & (x-1) will be a...b00...0, again clearing the rightmost 1-bit.
So instead of scanning all bits to find which ones are 0 and which ones are 1, you just iterate the operation x = x & (x-1) until x is 0: the number of steps will be the number of 1 bits. It is more efficient than the naive implementation because on average you will need only about half the number of steps.
Example of code:
int bitcount(unsigned int x) {
    int nb = 0;
    while (x != 0) {
        x &= x-1;
        nb++;
    }
    return nb;
}
I know I'm already very late (≈ 3.5 years), but your example has a mistake.
x = 1010 = 10
x - 1 = 1001 = 9
1010 & 1001 = 1000
So as you can see, it deleted the rightmost 1-bit in 10.
7 = 111
6 = 110
5 = 101
4 = 100
3 = 011
2 = 010
1 = 001
0 = 000
Observe that at the position of the rightmost 1 in any number, the bit at that same position in that number minus one is 0. Thus ANDing x with x-1 will reset (i.e. set to 0) the rightmost 1-bit.
7 & 6 = 111 & 110 = 110 = 6
6 & 5 = 110 & 101 = 100 = 4
5 & 4 = 101 & 100 = 100 = 4
4 & 3 = 100 & 011 = 000 = 0
3 & 2 = 011 & 010 = 010 = 2
2 & 1 = 010 & 001 = 000 = 0
1 & 0 = 001 & 000 = 000 = 0
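The table can be reproduced with a couple of lines of C. This is just an illustrative sketch; the names clear_lowest_one and steps_to_zero are made up here.

```c
/* Clears the rightmost 1-bit of x (returns 0 when x is already 0). */
unsigned clear_lowest_one(unsigned x)
{
    return x & (x - 1);
}

/* Number of times the rightmost 1-bit can be cleared before x reaches 0,
   which is the number of 1-bits in x. */
int steps_to_zero(unsigned x)
{
    int steps = 0;
    while (x != 0) {
        x = clear_lowest_one(x);
        steps++;
    }
    return steps;
}
```

For instance, clear_lowest_one(10) is 8 (1010 becomes 1000), clear_lowest_one(4) is 0, and steps_to_zero(10) is 2 because 10 has two 1-bits.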
I'm supposed to match the worded descriptions to the bitwise operations. W is one less than the total number of bits in a's and b's data type. So if a is 32 bits long, W is 31. Here are the worded descriptions:
1. One’s complement of a
2. a.
3. a&b.
4. a * 7.
5. a / 4 .
6. (a<0)?1:-1.
and here are the bitwise descriptions:
a. ~(~a | (b ^ (MIN_INT + MAX_INT)))
b. ((a^b)&~b)|(~(a^b)&b)
c. 1+(a<<3)+~a
d. (a<<4)+(a<<2)+(a<<1)
e. ((a<0)?(a+3):a)>>2
f. a ^ (MIN_INT + MAX_INT)
g. ~((a|(~a+1))>>W)&1
h. ~((a >> W) << 1)
i. a >> 2
i. a >> 2
I have a few of them solved namely:
a. ~(~a | (b ^ (MIN_INT + MAX_INT))) = a & b
b. ((a^b)&~b)|(~(a^b)&b) = a
c. 1+(a<<3)+~a = 7 * a
d. (a<<4)+(a<<2)+(a<<1) = 16*a + 4*a + 2*a = 22*a
e. ((a<0)?(a+3):a)>>2 = (a<0)?(a/4 + 3/4) : a/4 = a/4 + ((a<0)?3/4:0)
f. a ^ (MIN_INT + MAX_INT) = ~a
i. a >> 2 = a/4
So basically all I need help with are g and h
g. ~((a|(~a+1))>>W)&1
h. ~((a >> W) << 1)
If you wouldn't mind could you also provide an explanation if you could?
I think this is what is going on with g:
g. ~((a|(~a+1))>>W)&1 = ~((a|(two's complement of a) >>W)&1
= ~((a|sign of two's complement of a) &1 = ~(-a)&1
but this could be 1 or 0 so I don't think I did this right.
and for this one:
h. ~((a >> W) << 1) = ~((sign of a) << 1) = ~((sign of a)*2)
and I don't know where to go from there...
Thank you for your help!!!
For g, consider that (a|~a) sets all bits to 1, so:
~((a|~a) >> W) & 1
~(all_ones >> W) & 1
~1 & 1
0
The only way adding 1 to ~a could possibly affect this result is if the addition flipped the most significant bit of ~a (due to the right shift by W). That can only happen if a is 0 or 2^W. In the latter case, we will get the same result as above because the top bit of (a|X) will always be set. However, when a is 0 ~a+1 (0's twos complement) is also 0 and the final result of the entire expression will instead be 1.
Therefore, g is 1 when a is zero, otherwise it is 0 (i.e. - g is equivalent to the C expression a == 0). That seemingly doesn't match any of your worded descriptions. Indeed, I don't see how any expression (X & 1) possibly matches any of your worded descriptions. None of your worded descriptions matches an expression that evaluates to only 0 or 1 (for all values of a, b).
For h, consider that if a is negative, then its top most bit is set. Because a is signed, right shifting it 31 positions drags the sign bit across all 32 bits of a. Then left shifting it one position sets the least significant bit to 0. Complementing that yields 1. If a is non-negative, then its top most bit is 0 and right shifting that 31 positions yields 0. Left shifting that 1 position still yields 0. Complementing that yields all bits set, which is the 2's complement rep of -1. Therefore, h is equivalent to (a < 0 ? 1 : -1) or #6 of your worded descriptions.
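Both conclusions can be checked with a small C sketch. It assumes 32-bit values (so W is 31) and works on bit patterns in unsigned arithmetic, because shifting negative signed values is not portable C; the names expr_g and expr_h are made up, and the arithmetic shift a >> W in h is emulated with a comparison.

```c
#include <stdint.h>

enum { W = 31 };   /* one less than the width of a 32-bit int */

/* Expression g evaluated on the bit pattern of a. A logical shift is used
   here; since only the lowest bit survives the final & 1, the result is
   the same as with an arithmetic shift. */
uint32_t expr_g(int32_t a)
{
    uint32_t ua = (uint32_t)a;
    return ~((ua | (~ua + 1u)) >> W) & 1u;
}

/* Expression h as a bit pattern. sign_fill is what the arithmetic shift
   a >> W would produce: all ones for negative a, zero otherwise. */
uint32_t expr_h(int32_t a)
{
    uint32_t sign_fill = (a < 0) ? 0xFFFFFFFFu : 0u;
    return ~(sign_fill << 1);
}
```

expr_g returns 1 only for a == 0, and expr_h returns the bit pattern of 1 for negative a and of -1 (all bits set) otherwise.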
Okay, I know this is a pretty mean task that has given me nightmares, but maybe I'll crack it thanks to one of you.
I want to check whether a number is between 0 and 10 using bitwise operators. That's the catch: it is between 0 and 10, and not, for example, between 0 and 2, 0 and 4, 0 and 8 and so on.
Reference for number/binary representation with 0-4 bits. (little endian)
0 0
1 1
2 10
3 11
4 100
5 101
6 110
7 111
8 1000
9 1001
10 1010
11 1011
12 1100
13 1101
14 1110
15 1111
Trying to figure out something like
if(((var & 4) >> var) + (var & 10))
I attempt to solve it with bitwise operators only (no addition).
The expression below will evaluate to nonzero if the number (v) is outside the inclusive range 0 - 10:
(v & (~0xFU)) |
( ((v >> 3) & 1U) & ((v >> 2) & 1U) ) |
( ((v >> 3) & 1U) & ((v >> 1) & 1U) & (v & 1U) )
The first line is nonzero if the number is above 15 (any higher bit than the first four is set). The second line is nonzero if in the low 4 bits it is between 12 and 15 inclusive. The third line is nonzero if in the low 4 bits the number is either 11 or 15.
It was not clear in the question, but if the number to test is limited between the 0 - 15 inclusive range (only low 4 bits), then something nicer is possible here:
((~(v >> 3)) & 1U) |
( ((~(v >> 2)) & 1U) & (( ~v ) & 1U) ) |
( ((~(v >> 2)) & 1U) & ((~(v >> 1)) & 1U) )
First line is 1 if the number is between 0 and 7 inclusive. Second line is 1 if the number is one of 0, 2, 8 or 10. Third line is 1 if the number is one of 0, 1, 8 or 9. So OR combined the expression is 1 if the number is between 0 and 10 inclusive. Relating this solution, you may also check out the Karnaugh map, which can assist in generating these (and can also be used to prove there is no simpler solution here).
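Both expressions can be wrapped in functions and checked exhaustively over a small range. This is only a sketch; the function names are made up, and v is assumed to be an unsigned int.

```c
/* Nonzero when v is outside 0..10, using only bitwise operators. */
unsigned out_of_range_0_10(unsigned v)
{
    return (v & ~0xFU)                                        /* any bit above the low four */
         | (((v >> 3) & 1U) & ((v >> 2) & 1U))                /* 12..15 */
         | (((v >> 3) & 1U) & ((v >> 1) & 1U) & (v & 1U));    /* 11 or 15 */
}

/* 1 when v is in 0..10, assuming v is already known to be in 0..15. */
unsigned in_range_4bit(unsigned v)
{
    return ((~(v >> 3)) & 1U)                                 /* 0..7 */
         | (((~(v >> 2)) & 1U) & ((~v) & 1U))                 /* 0, 2, 8, 10 */
         | (((~(v >> 2)) & 1U) & ((~(v >> 1)) & 1U));         /* 0, 1, 8, 9 */
}
```

Looping v from 0 to 15 and comparing in_range_4bit(v) with (v <= 10) is a quick way to confirm the three terms cover exactly the intended values.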
I don't think I could get any closer strictly using only bitwise operators in a reasonable manner. However, if you can use addition it becomes a lot simpler, as Pat's solution shows.
Assuming that addition is allowed, then:
(v & ~0xf) | ((v+5) & ~0xf)
is non-zero if v is out-of-range. The first term tests if v is outside the range 0..15, and the second shifts the unwanted 11, 12, 13, 14, 15 outside the 0..15 range.
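As a quick sanity check, the expression can be wrapped in a function (the name out_of_range_add is made up here, and v is assumed to be unsigned so that v + 5 wraps rather than overflows):

```c
/* Nonzero when v is outside 0..10; uses one addition. */
unsigned out_of_range_add(unsigned v)
{
    /* First term: v has a bit above the low four, so v > 15.
       Second term: v + 5 pushes 11..15 up to 16..20, outside 0..15. */
    return (v & ~0xFu) | ((v + 5u) & ~0xFu);
}
```

Values close to the maximum, where v + 5 wraps around to a small number, are still caught by the first term.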
When addition is allowed and the range is 0..15, a simple solution is
(v - 11) & ~7
which is nonzero when v is in the range 0..10. Using shifts instead, you can use
(1<<10) >> v
which is also nonzero if the input is in the range 0..10. If the input range is unrestricted and the shift count is modulo 32, like on most CPUs, you can use
((1<<11) << ~v) | (v & ~15)
which is nonzero if the input is not in the range (the opposite is difficult since already v == 0 is difficult with only bitops). If other arithmetic operations are allowed, then
v / 11
can be used, which is also nonzero if the input is not in the range.
bool b1 = CheckCycleStateWithinRange(cycleState, 0b0, 0b1010); // Note *: 0b0 = 0 and 1010 = 10
bool CheckCycleStateWithinRange(int cycleState, int minRange, int maxRange) const
{
    return ((IsGreaterThanEqual(cycleState, minRange) && IsLessThanEqual(cycleState, maxRange)) ? true : false );
}
int IsGreaterThanEqual(int cycleState, int limit) const
{
    return ((limit + (~cycleState + 1)) >> 31 & 1) | (!(cycleState ^ limit));
}
int IsLessThanEqual(int cycleState, int limit) const
{
    return !((limit + (~cycleState + 1)) >> 31 & 1) | (!(cycleState ^ limit));
}
I wanted to replace bit/bits (more than one) in a 32/64 bit data field without affecting other bits. Say for example:
I have a 64-bit register where bits 5 and 6 can take values 0, 1, 2, and 3.
5:6
---
0 0
0 1
1 0
1 1
Now, when I read the register, I get, say, the value 0x146 (0001 0 10 0 0110). Now I want to change the value at bit positions 5 and 6 to 01 (right now it is 10, which is 2 in decimal, and I want to replace it with 1, i.e. 01) without other bits getting affected, and write back the register with only bits 5 and 6 modified (so it becomes 0x126 after changing).
I tried doing this:
reg_data = 0x146
reg_data |= 1 << shift // In this case, 'shift' is 5
If I do this, the value at bit positions 5 and 6 will become 11 (0x3), not 01 (0x1) which I wanted.
How do I go about doing read, modify, and write?
How do I replace only certain bit/bits in a 32/64 bit fields without affecting the whole data of the field using C?
Setting a bit is okay, but more than one bit, I am finding it little difficult.
Use a bitmask. It is sort of like:
new_value = 0, 1, 2 or 3 // (this is the value you will set in)
bit_mask = (3<<5) // (mask of the bits you want to set)
reg_data = (reg_data & (~bit_mask)) | (new_value<<5)
This preserves the old bits and OR's in the new ones.
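Generalised into a function, the same idea might look like the sketch below. The name replace_field and its parameters are illustrative, and a 64-bit register is assumed as in the question.

```c
#include <stdint.h>

/* Replace `width` bits of reg, starting at bit `shift`, with the low bits
   of value. Assumes 0 < width < 64 and shift + width <= 64. */
uint64_t replace_field(uint64_t reg, uint64_t value, unsigned shift, unsigned width)
{
    uint64_t mask = ((1ULL << width) - 1) << shift;   /* ones over the field only */
    return (reg & ~mask) | ((value << shift) & mask); /* clear the field, then OR in the new bits */
}
```

With the register value from the question, replace_field(0x146, 1, 5, 2) yields 0x126. Masking the shifted value keeps an oversized value from spilling into neighbouring bits.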
reg_data &= ~( (1 << shift1) | (1 << shift2) );
reg_data |= ( (1 << shift1) | (1 << shift2) );
The first line clears the two bits at (shift1, shift2) and the second line sets them.
Here is a generic process which acts on an arbitrarily long char array, treating it as one long bitfield, and addresses each bit position individually:
#define set_bit(arr,x) ((arr[(x)>>3]) |= (0x01 << ((x) & 0x07)))
#define clear_bit(arr,x) (arr[(x)>>3] &= ~(0x01 << ((x) & 0x07)))
#define get_bit(arr,x) (((arr[(x)>>3]) & (0x01 << ((x) & 0x07))) != 0)
It simply takes the index, uses the lower three bits of the index to identify one of eight bit positions inside each element of the char array, and uses the remaining upper bits to select the array element in which the bit denoted by x occurs.
To set a bit, you need to OR the target word with another word that has 1 in that specific bit position and 0 in all other positions. The 0's in the other positions ensure that the existing bits of the target stay as they are during the OR, and the 1 in the specific position ensures that the target gets a 1 in that position. If we have mask = 0x02 = 00000010 (1 byte) then we can OR this to any word to set that bit position:
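A short usage sketch of these macros on an unsigned char array follows. The macros are repeated so the snippet stands alone, and the helper count_set is made up for illustration.

```c
#define set_bit(arr,x)   ((arr[(x)>>3]) |= (0x01 << ((x) & 0x07)))
#define clear_bit(arr,x) (arr[(x)>>3] &= ~(0x01 << ((x) & 0x07)))
#define get_bit(arr,x)   (((arr[(x)>>3]) & (0x01 << ((x) & 0x07))) != 0)

/* Hypothetical demo helper: counts how many of the first nbits bits are set. */
unsigned count_set(const unsigned char *bits, unsigned nbits)
{
    unsigned n = 0;
    for (unsigned i = 0; i < nbits; i++)
        n += get_bit(bits, i);   /* get_bit yields 0 or 1 */
    return n;
}
```

With a zeroed 4-byte array, set_bit(bits, 9) changes only bits[1] (to 0x02), since 9 >> 3 is 1 and 9 & 7 is 1.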
target = 1 0 1 1 0 1 0 0
OR + + + + + + + +
mask 0 0 0 0 0 0 1 0
---------------
answer 1 0 1 1 0 1 1 0
To clear a bit, you need to AND the target word with another word that has 0 in that specific bit position and 1 in all others. The 1's in all other bit positions ensure that during the AND the target preserves its 0's and 1's as they were in those locations, and the 0 in the bit position to be cleared sets that bit position to 0 in the target word. If we have the same mask = 0x02, then we can prepare this mask for clearing by ~mask:
mask = 0 0 0 0 0 0 1 0
~mask = 1 1 1 1 1 1 0 1
AND . . . . . . . .
target 1 0 1 1 0 1 1 0
---------------
answer 1 0 1 1 0 1 0 0
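The two diagrams correspond to the following one-liners in C, shown here on a single byte (the function names set_bits and clear_bits are illustrative only):

```c
#include <stdint.h>

/* OR with the mask: bits that are 1 in mask become 1, the rest are untouched. */
uint8_t set_bits(uint8_t target, uint8_t mask)
{
    return (uint8_t)(target | mask);
}

/* AND with the inverted mask: bits that are 1 in mask become 0, the rest are untouched. */
uint8_t clear_bits(uint8_t target, uint8_t mask)
{
    return (uint8_t)(target & ~mask);
}
```

With target = 0xB4 (10110100) and mask = 0x02, set_bits gives 0xB6 (10110110), and clear_bits applied to that result gives 0xB4 back.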
Apply a mask against the bitfield to maintain the bits that you do not want to change. This will also clear out the bits that you will be changing.
Ensure that you have a bitfield that contains only the bits that you want to set/clear.
Either use the or operator to "or" the two bitfields, or just simply add them.
For instance, if you wanted to only change bits 2 thru 5 based on input of 0 thru 15.
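byte newVal = (byte)(value & 0x0F);   // keep only the low 4 bits of the input (0 thru 15)
newVal = (byte)(newVal << 2);         // move them up to bits 2 thru 5
oldVal = (byte)(oldVal & 0xC3);       // 0xC3 = 11000011 clears bits 2 thru 5
oldVal = (byte)(oldVal + newVal);     // the cleared field is all zeros, so + acts like |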
The question was about how to implement it in C, but as all searches for "replace bits" lead to here, I will supply my implementation in VB.NET.
It has been unit tested. For those who are wondering what the ToBinaryString extension looks like: Convert.ToString(value,2)
''' <summary>
''' Replace the bits in the enumValue with the bits in the bits parameter, starting from the position that corresponds to 2 to the power of the position parameter.
''' </summary>
''' <param name="enumValue">The integer value to place the bits in.</param>
''' <param name="bits">The bits to place. It must be smaller or equal to 2 to the power of the position parameter.</param>
'''<param name="length">The number of bits that the bits should replace.</param>
''' <param name="position">The exponent of 2 where the bits must be placed.</param>
''' <returns></returns>
''' <remarks></remarks>'
<Extension>
Public Function PlaceBits(enumValue As Integer, bits As Integer, length As Integer, position As Integer) As Integer
If position > 31 Then
Throw New ArgumentOutOfRangeException(String.Format("The position {0} is out of range for a 32 bit integer.",
position))
End If
Dim positionToPlace = 2 << position
If bits > positionToPlace Then
Throw New ArgumentOutOfRangeException(String.Format("The bits {0} must be smaller than or equal to {1}.",
bits, positionToPlace))
End If
'Create a bitmask (a series of ones for the bits to retain and a series of zeroes for bits to discard).'
Dim mask As Integer = (1 << length) - 1
'Use for debugging.'
'Dim maskAsBinaryString = mask.ToBinaryString'
'Shift the mask to left to the desired position'
Dim leftShift = position - length + 1
mask <<= leftShift
'Use for debugging.'
'Dim shiftedMaskAsBinaryString = mask.ToBinaryString'
'Shift the bits to left to the desired position.'
Dim shiftedBits = bits << leftShift
'Use for debugging.'
'Dim shiftedBitsAsBinaryString = shiftedBits.ToBinaryString'
'First clear (And Not) the bits to replace, then set (Or) them.'
Dim result = (enumValue And Not mask) Or shiftedBits
'Use for debugging.'
'Dim resultAsBinaryString = result.ToBinaryString'
Return result
End Function
You'll need to do that one bit at a time. Use the or, like you're currently doing, to set a bit to one, and use the following to set something to 0:
reg_data &= ~ (1 << shift)
You can use this dynamic logic for any number of bits and in any bit field.
Basically, the bit sequence of a number has three parts -
MSB_SIDE | CHANGED_PART | LSB_SIDE
The CHANGED_PART can be moved up to the extreme MSB or LSB side.
The steps to replace a number of bit(s) are as follows -
Take only the MSB_SIDE part and replace all the remaining bits with 0.
Update the new bit sequence by adding your desired bit sequence in particular position.
Update the entire bit sequence with LSB_SIDE of the original bit sequence.
org_no = 0x53513C;
upd_no = 0x333;
start_pos = 0x6, bit_len = 0xA;
temp_no = 0x0;
temp_no = org_no & (0xFFFFFFFF << (bit_len + start_pos)); // This is step 1
temp_no |= upd_no << start_pos; // This is step 2
org_no = temp_no | (org_no & ~(0xFFFFFFFF << start_pos)); // This is step 3
Note: The mask 0xFFFFFFFF assumes a 32-bit value. You can change it according to your requirement.
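Wrapped into a function, the three steps could be sketched like this. The name replace_bits is made up, and the sketch assumes 32-bit values with start_pos + bit_len below 32 and an upd_no that fits in bit_len bits.

```c
#include <stdint.h>

/* Replace bit_len bits of org_no, starting at start_pos, with upd_no.
   Assumes start_pos + bit_len < 32 and that upd_no fits in bit_len bits. */
uint32_t replace_bits(uint32_t org_no, uint32_t upd_no, unsigned start_pos, unsigned bit_len)
{
    uint32_t temp_no = org_no & (0xFFFFFFFFu << (bit_len + start_pos)); /* step 1: keep MSB_SIDE */
    temp_no |= upd_no << start_pos;                                     /* step 2: place the new bits */
    return temp_no | (org_no & ~(0xFFFFFFFFu << start_pos));            /* step 3: restore LSB_SIDE */
}
```

With the values from the example, replace_bits(0x53513C, 0x333, 6, 10) gives 0x53CCFC: the upper part 0x530000 and the low six bits 0x3C are kept, and 0x333 << 6 = 0xCCC0 fills the middle.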