Bit Twiddling in C - Counting Bits


I want to count the bits that are set in an extremely large bit-vector (i.e. 100,000 bits).
What I am currently doing is using a pointer to char (i.e. char *cPtr) to point to the beginning of the array of bits. I then:
1. look at each element of the array (i.e. cPtr[x]),
2. convert it to an integer (i.e. (int) cPtr[x])
3. use a 256 element look-up table to see how many bits are set in the given byte (i.e. cPtr[x]).
It occurs to me that if I use a short int pointer instead (i.e. short int * sPtr), then I will only need half as many look-ups, but with a 65536 element look-up table, which will have its own cost in memory usage.
I'm wondering what is the optimal number of bits to examine each time. Also, if that number is not the size of some preset type, how can I walk down my bit-vector and set a pointer to be ANY arbitrary number of bits past the starting location of the bit array.
I know there are other ways to count bits, but for now I want to be certain I can optimize this method before making comparisons to other methods.

You can count them using bitwise operations:
char c = cPtr[x];
int num = ((c & 0x01) >> 0) +
((c & 0x02) >> 1) +
((c & 0x04) >> 2) +
((c & 0x08) >> 3) +
((c & 0x10) >> 4) +
((c & 0x20) >> 5) +
((c & 0x40) >> 6) +
((c & 0x80) >> 7);
It might seem a little long, but it doesn't require many memory accesses, so overall it seems pretty cheap to me.
You can make it even cheaper by reading an int at a time, but then you will probably need to address an alignment issue.

I'm wondering what is the optimal number of bits to examine each time
The only way to find out is to test. See this question for a discussion of the fastest way to count 32 bits at a time.
Also, if that number is not the size of some preset type, how can I
walk down my bit-vector and set a pointer to be ANY arbitrary number
of bits past the starting location of the bit array.
You can't set a pointer to an arbitrary bit. Most machines have byte-addressing, some can only address words.
You can construct a word starting with an arbitrary bit like so:
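uint32_t wordAtBit(const uint32_t* array, size_t bit)
{
size_t idx = bit>>5;
unsigned shift = bit&31;
uint32_t word = array[idx] >> shift;
if (shift == 0)
return word; /* already aligned; shifting by 32 would be undefined */
return word | (array[idx+1] << (32 - shift));
}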

This should be quite fast (taken from Wikipedia):
static unsigned char wordbits[65536]; /* wordbits[n] = bitcount of n, for n between 0 and 65535; fill once before use */
static int popcount(uint32_t i)
{
return (wordbits[i&0xFFFF] + wordbits[i>>16]);
}
In this way, you can check 32 bits per iteration.
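The snippet above only describes the table contents. One way to fill such a table at startup, sketched here under the assumption that a one-time initialisation call is acceptable, uses the recurrence bits(n) = bits(n >> 1) + (n & 1). The names wordbits16, init_wordbits, popcount32_lut and count_bits_lut are hypothetical and not part of the original answer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table name: wordbits16[n] holds the number of set bits in n. */
static unsigned char wordbits16[65536];

/* Fill the table once: n has the bits of n >> 1 plus its own lowest bit. */
static void init_wordbits(void)
{
    wordbits16[0] = 0;
    for (uint32_t n = 1; n < 65536; n++)
        wordbits16[n] = (unsigned char)(wordbits16[n >> 1] + (n & 1u));
}

/* Two 16-bit lookups per 32-bit word. */
static int popcount32_lut(uint32_t i)
{
    return wordbits16[i & 0xFFFFu] + wordbits16[i >> 16];
}

/* Count the set bits of a bit-vector stored as an array of 32-bit words. */
static size_t count_bits_lut(const uint32_t *words, size_t nwords)
{
    size_t total = 0;
    for (size_t k = 0; k < nwords; k++)
        total += (size_t)popcount32_lut(words[k]);
    return total;
}
```

init_wordbits() has to run before the first lookup. Whether the 64 KB table beats a 256-entry one depends on cache behaviour, so it is worth timing both.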

I am a bit late to the party, but there are much faster approaches than the ones that have been suggested so far. The reason is that many modern architectures offer hardware instructions to count the number of bits in various ways (leading zeroes, leading ones, trailing zeroes or ones, counting the number of bits set to 1, etc...). Counting the number of bits set to 1 is called the Hamming weight, also commonly called population count, or just popcount.
As a matter of fact, x86 CPUs have a POPCNT instruction, introduced alongside the SSE4.2 instruction set. The very latest CPU architecture from Intel (nicknamed Haswell) offers even more hardware support for bit manipulation with the BMI1 and BMI2 extensions - maybe there is something else to use there!
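As a rough sketch of how the hardware instruction can be reached from C, assuming GCC or Clang: both provide the builtins __builtin_popcount and __builtin_popcountll, which are compiler extensions rather than standard C. The function name count_bits_builtin is hypothetical. The memcpy into a uint64_t sidesteps the alignment issue mentioned earlier.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Count set bits in a byte buffer, 64 bits at a time.
 * Assumes GCC or Clang: the __builtin_popcount* functions are not standard C. */
static size_t count_bits_builtin(const unsigned char *buf, size_t nbytes)
{
    size_t total = 0;
    size_t i = 0;

    for (; i + 8 <= nbytes; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w);      /* safe for unaligned buffers */
        total += (size_t)__builtin_popcountll(w);
    }
    for (; i < nbytes; i++)                 /* leftover tail bytes */
        total += (size_t)__builtin_popcount(buf[i]);

    return total;
}
```

The compiler only emits POPCNT when the target allows it (for example with -mpopcnt or -march=native on GCC and Clang). Otherwise the builtin falls back to a software routine.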

Related

How do you compare only certain bits in data type?

I'm trying to learn a bit about emulation and I'm trying to think of how I can decode opcodes. Each opcode is a short data type, 16 bits. I'd like to be able to compare only specific sets of 4 bits. For example: there are multiple opcodes that start with 00, such as 0x00E0.
I'd like to be able to compare each of these values in either bit or hexadecimal form. I was thinking maybe something along the lines of bit shifting to bump everything else off, so that the bits I don't care about would zero out. That may cause issues for the center bits and will require additional steps. What kind of solutions do you guys use for a problem like this?

Use a bit mask, which has the bits set that you care about. Then use the & operator to zero out everything that you don't care about. For instance, say we want to compare the lowest four bits in a and b:
uint16_t mask = 0x000f;
if ((a & mask) == (b & mask)) {
// lowest 4 bits are equal
}

This is simple bit manipulation. You can mask the relevant bits with
int x = opcode & 0x00f0;
and compare the resulting value
if (x == 0x00e0) {
/* do something */
}

You can easily create a mask of "nbits" bits, shift it left by "pos" bits, and do the comparison:
uint32_t mask = ~((~0u) << nbits); /* requires nbits < 32 */
if( (num & (mask << pos)) == 0x00e0 ) {
/* Do something */
}
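For opcode decoding it is often more convenient to shift the field down first and compare against the small value. A minimal sketch, assuming fields are numbered from the least significant bit (extract_bits is a hypothetical helper, not something from the answers above):

```c
#include <assert.h>
#include <stdint.h>

/* Return the nbits-wide field of value that starts at bit position pos
 * (bit 0 is the least significant bit). Hypothetical helper. */
static uint32_t extract_bits(uint32_t value, unsigned pos, unsigned nbits)
{
    assert(nbits >= 1 && nbits <= 32 && pos < 32);
    /* Shifting a 32-bit value by 32 is undefined, so handle that case apart. */
    uint32_t mask = (nbits == 32) ? 0xFFFFFFFFu : ((1u << nbits) - 1u);
    return (value >> pos) & mask;
}
```

With a 16-bit opcode such as 0x00E0, extract_bits(opcode, 12, 4) gives the top nibble (0x0) and extract_bits(opcode, 4, 4) gives 0xE, so each nibble can be used directly in a switch statement.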

Applications of bitwise operators in C and their efficiency? [duplicate]

I am new to bitwise operators.
I understand how the logic functions work to get the final result. For example, when you bitwise AND two numbers, the final result is going to be the AND of those two numbers (1 & 0 = 0; 1 & 1 = 1; 0 & 0 = 0). Same with OR, XOR, and NOT.
What I don't understand is their application. I tried looking everywhere and most of them just explain how bitwise operations work. Of all the bitwise operators I only understand the application of shift operators (multiplication and division). I also came across masking. I understand that masking is done using bitwise AND but what exactly is its purpose and where and how can I use it?
Can you elaborate on how I can use masking? Are there similar uses for OR and XOR?

The low-level use case for the bitwise operators is to perform base 2 math. There is the well known trick to test if a number is a power of 2:
if (x > 0 && (x & (x - 1)) == 0) { /* x > 0: otherwise 0 would pass the test */
printf("%d is a power of 2\n", x);
}
But it can also serve a higher-level function: set manipulation. You can think of a collection of bits as a set. To explain, let each bit in a byte represent one of 8 distinct items, say the planets in our solar system (Pluto is no longer considered a planet, so 8 bits are enough!):
#define Mercury (1 << 0)
#define Venus (1 << 1)
#define Earth (1 << 2)
#define Mars (1 << 3)
#define Jupiter (1 << 4)
#define Saturn (1 << 5)
#define Uranus (1 << 6)
#define Neptune (1 << 7)
Then, we can form a collection of planets (a subset) using |:
unsigned char Giants = (Jupiter|Saturn|Uranus|Neptune);
unsigned char Visited = (Venus|Earth|Mars);
unsigned char BeyondTheBelt = (Jupiter|Saturn|Uranus|Neptune);
unsigned char All = (Mercury|Venus|Earth|Mars|Jupiter|Saturn|Uranus|Neptune);
Now, you can use a & to test if two sets have an intersection:
if (Visited & Giants) {
puts("we might be giants");
}
The ^ operation is often used to see what is different between two sets (the union of the sets minus their intersection):
if (Giants ^ BeyondTheBelt) {
puts("there are non-giants out there");
}
So, think of | as union, & as intersection, and ^ as union minus the intersection.
Once you buy into the idea of bits representing a set, then the bitwise operations are naturally there to help manipulate those sets.
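The set view can be wrapped in a few small helpers. This is only a sketch: the type planet_set, the enum constants and the function names are hypothetical, chosen to mirror the planet example.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint8_t planet_set;   /* hypothetical type: one bit per planet */

enum {
    MERCURY = 1 << 0, VENUS  = 1 << 1, EARTH  = 1 << 2, MARS    = 1 << 3,
    JUPITER = 1 << 4, SATURN = 1 << 5, URANUS = 1 << 6, NEPTUNE = 1 << 7
};

/* | is union, & is intersection, ^ is symmetric difference. */
static planet_set set_union(planet_set a, planet_set b)        { return a | b; }
static planet_set set_intersection(planet_set a, planet_set b) { return a & b; }
static planet_set set_symdiff(planet_set a, planet_set b)      { return a ^ b; }

/* Items in a that are not in b. */
static planet_set set_difference(planet_set a, planet_set b)   { return a & (planet_set)~b; }

/* Membership and subset tests. */
static bool set_contains(planet_set s, planet_set item)        { return (s & item) != 0; }
static bool set_is_subset(planet_set a, planet_set b)          { return (a & b) == a; }
```

Each operation is a single instruction on the whole set, which is why this representation is attractive for small, fixed universes of items.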

One application of bitwise ANDs is checking if a single bit is set in a byte. This is useful in networked communication, where protocol headers attempt to pack as much information into the smallest area as is possible in an effort to reduce overhead.
For example, the IPv4 header uses the first 3 bits of the 7th byte (byte offset 6) to tell whether the given IP packet can be fragmented, and if so whether to expect more fragments of the given packet to follow. If each of these flags occupied a whole byte instead, each IP packet would be 21 bits larger than necessary. This translates to a huge amount of unnecessary data through the internet every day.
To retrieve these 3 bits, a bitwise AND can be used alongside a bit mask to determine if they are set.
unsigned char mymask = 0x40;
if ((ipheader[6] & mymask) == mymask)
//the second bit (Don't Fragment) of the 7th byte of the IP header is set; ipheader is assumed to be an unsigned char pointer to the start of the header

Small sets, as has been mentioned. You can do a surprisingly large number of operations quickly, intersection and union and (symmetric) difference are obviously trivial, but for example you can also efficiently:
get the lowest item in the set with x & -x
remove the lowest item from the set with x & (x - 1)
add all items smaller than the smallest present item
add all items higher than the smallest present item
calculate their cardinality (though the algorithm is nontrivial)
permute the set in some ways, that is, change the indexes of the items (not all permutations are equally efficient)
calculate the lexicographically next set that contains as many items (Gosper's Hack)
1 and 2 and their variations can be used to build efficient graph algorithms on small graphs, for example see algorithm R in The Art of Computer Programming 4A.
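A sketch of the first four tricks and Gosper's Hack on a 32-bit set. The function names are hypothetical. Unsigned arithmetic is used throughout, so "-x" is written as 0u - x to keep the wrap-around well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Isolate the lowest item: x & -x. */
static uint32_t lowest_item(uint32_t x)          { return x & (0u - x); }

/* Remove the lowest item: x & (x - 1). */
static uint32_t remove_lowest(uint32_t x)        { return x & (x - 1u); }

/* Add every item below the lowest present one: x | (x - 1). */
static uint32_t add_all_below_lowest(uint32_t x) { return x | (x - 1u); }

/* Add every item above the lowest present one: x | -x. */
static uint32_t add_all_above_lowest(uint32_t x) { return x | (0u - x); }

/* Gosper's Hack: the next larger value with the same number of set bits.
 * x must be nonzero; the result wraps if no larger 32-bit value exists. */
static uint32_t next_same_cardinality(uint32_t x)
{
    assert(x != 0);
    uint32_t c = x & (0u - x);   /* lowest set bit */
    uint32_t r = x + c;          /* ripple the lowest run of ones upward */
    return (((r ^ x) >> 2) / c) | r;
}
```

Starting from 0x3 (items 0 and 1), repeated calls to next_same_cardinality walk through 0x5, 0x6, 0x9 and so on, visiting every 2-element subset in increasing numeric order.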
Other applications of bitwise operations include, but are not limited to,
Bitboards, important in many board games. Chess without bitboards is like Christmas without Santa. Not only is it a space-efficient representation, you can do non-trivial computations directly with the bitboard (see Hyperbola Quintessence)
sideways heaps, and their application in finding the Nearest Common Ancestor and computing Range Minimum Queries.
efficient cycle-detection (Gosper's Loop Detection, found in HAKMEM)
adding offsets to Z-curve addresses without deconstructing and reconstructing them (see Tesseral Arithmetic)
These uses are more powerful, but also advanced, rare, and very specific. They show, however, that bitwise operations are not just a cute toy left over from the old low-level days.

Example 1
If you have 10 booleans that "work together" you can simplify your code a lot.
int B1 = 0x01;
int B2 = 0x02;
int B10 = 0x200; // the 10th bit is 1 << 9
int someValue = get_a_value_from_somewhere();
if ((someValue & (B1 | B10)) == (B1 | B10)) {
// B1 and B10 are both set
}
Example 2
Interfacing with hardware. An address on the hardware may need bit-level access to control the interface, e.g. an overflow bit on a buffer or a status byte that can tell you the status of 8 different things. Using bit masking you can get down to the actual bit of info you need.
if (status_reg & 0x80) { // status_reg holds the byte read from the hardware ("register" is a reserved word in C)
// top bit in the byte is set which may have special meaning.
}
This is really just a specialized case of example 1.

Bitwise operators are particularly useful in systems with limited resources as each bit can encode a boolean. Using many chars for flags is wasteful as each takes one byte of space (when they could be storing 8 flags each).
Commonly microcontrollers have C interfaces for their IO ports in which each bit controls 1 of 8 ports. Without bitwise operators these would be quite difficult to control.
Regarding masking, it is common to use both & and |:
x & 0x0F //ensures the 4 high bits are 0
x | 0x0F //ensures the 4 low bits are 1

In microcontroller applications, you can use bitwise operators to switch between ports. If we would like to turn on a single port while turning off the rest, then the following code can be used.
void main()
{
unsigned char ON = 1;
TRISB=0;
PORTB=0;
while(1){
PORTB = ON;
delay_ms(200);
ON = ON << 1;
if(ON == 0) ON=1;
}
}

String to very long sequence of length less than 1 byte

I can't work out how to solve the following problem. Assume I have a string or an array of integer-type variables (uchar, char, integer, whatever). Each of these data types is 1 byte long or more.
I would like to read from such an array, but read pieces that are smaller than 1 byte, e.g. 3 bits (values 0-7). I tried to do a loop like
cout << ( (tab[index] >> lshift & lmask) | (tab[index+offset] >> rshift & rmask) );
but working out how to set these variables is beyond me. What is the methodology for solving such a problem?
Sorry if this question has been asked before, but searching gave no answer.

I am sure this is not the best solution, as there are some inefficiencies in the code that could be eliminated, but I think the idea is workable. I only tested it briefly:
#include <stdint.h>
#include <stdio.h>
void bits(const uint8_t * src, int arrayLength, int nBitCount) {
int idxByte = 0; // byte index
int idxBitsShift = 7; // bit index: start at the high bit
// walk through the array, computing bit sets
while (idxByte < arrayLength) {
// compute a single bit set, most significant bit first
int nValue = 0;
for (int i=0; i<nBitCount; i++) {
// append the current bit to the low end of nValue
nValue = (nValue << 1) | ((src[idxByte] >> idxBitsShift) & 1);
if ((--idxBitsShift) < 0) {
idxBitsShift=7; // wrap to the high bit of the next byte
if (++idxByte >= arrayLength)
break;
}
}
// print it
printf("%d ", nValue);
}
}
int main() {
uint8_t a[] = {0xFF, 0x80, 0x04};
bits(a, 3, 3);
}
The thing with collecting bits across byte boundaries is a bit of a PITA, so I avoided all that by doing this a bit at a time, and then collecting the bits together in the nValue. You could have smarter code that does this three (or however many) bits at a time, but as far as I am concerned, with problems like this it is usually best to start with a simple solution (unless you already know how to do a better one) and then do something more complicated.

In short, the way the data is arranged in memory strictly depends on:
the endianness
the standard used for computation/representation (usually IEEE 754)
the type of the given variable
Now, you can't "disassemble" a data structure with this rationale without destroying its meaning. Simply put, if you subdivide your variable into "bitfields", you are just picturing an undefined value.
In computer science there are data structures, or information structured in blocks, like many hashing algorithms and hash results, but a numerical value is not stored like that, and you are supposed to know what you are doing to prevent any data loss.
Another thing to note is that your definition of "pieces that are smaller than 1 byte" doesn't make much sense. It is also highly intrusive: you are losing abstraction here, and you can also do something bad.

Here's the best method I could come up with for setting individual bits of a variable:
Assume we need to set the first four bits of variable1 (a char or other byte long variable) to 1010
variable1 &= 0b00001111; //Zero the first four bits
variable1 |= 0b10100000; //Set them to 1010, its important that any unaffected bits be zero
This could be extended to whatever bits desired by placing zeros in the first number corresponding to the bits which you wish to set (the first four in the example's case), and placing zeros in the second number corresponding to the bits which you wish to remain neutral in the second number (the last four in the example's case). The second number could also be derived by bit-shifting your desired value by the appropriate number of places (which would have been four in the example's case).
In response to your comment this can be modified as follows to accommodate for increased variability:
For this operation we will need two shifts, assuming you wish to be able to modify non-starting and non-ending bits. There are two sets of unaffected bits in this case: the first set (from the left) and the second set. If you wish to modify four bits, skipping the first bit from the left (1 [these four bits] 111 for a single byte), the first shift would be 7 and the second shift would be 5.
variable1 &= ( ( 0b11111111 << shift1 ) | 0b11111111 >> shift2 );
Next the value we wish to assign needs to be shifted and or'ed in.
However, we will need a third shift to account for how many bits we want to set.
This shift (we'll call it shift3) is shift1 minus the number of bits we wish to modify (as previously mentioned 4).
variable1 |= ( value << shift3 );
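The clear-then-set pattern above can be folded into one helper. A minimal sketch, assuming bit positions are counted from the least significant bit (unlike the left-to-right counting used above); write_field is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Overwrite the nbits-wide field of byte that starts at bit position pos
 * (bit 0 is the least significant bit) with value. Hypothetical helper. */
static uint8_t write_field(uint8_t byte, unsigned pos, unsigned nbits, uint8_t value)
{
    assert(nbits >= 1 && pos + nbits <= 8);
    /* Ones exactly where the field lives, zeros elsewhere. */
    uint8_t mask = (uint8_t)(((1u << nbits) - 1u) << pos);
    /* Clear the field, then OR in the new value; the final & mask keeps
     * stray high bits of value from leaking into neighbouring bits. */
    return (uint8_t)((byte & ~mask) | ((value << pos) & mask));
}
```

In these terms, the first example (set the top four bits to 1010) is write_field(variable1, 4, 4, 0xA), and the second (four bits after skipping the leftmost one) is write_field(variable1, 3, 4, value).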

optimized byte array shifter

I'm sure this has been asked before, but I need to implement a shift operator on a byte array of variable length. I've looked around a bit but I have not found any standard way of doing it. I came up with an implementation which works, but I'm not sure how efficient it is. Does anyone know of a standard way to shift an array, or at least have any recommendation on how to boost the performance of my implementation?
char* baLeftShift(const char* array, size_t size, signed int displacement,char* result)
{
memcpy(result,array,size);
short shiftBuffer = 0;
char carryFlag = 0;
char* byte;
if(displacement > 0)
{
for(;displacement--;)
{
for(byte=&(result[size - 1]);((unsigned int)(byte))>=((unsigned int)(result));byte--)
{
shiftBuffer = *byte;
shiftBuffer <<= 1;
*byte = ((carryFlag) | ((char)(shiftBuffer)));
carryFlag = ((char*)(&shiftBuffer))[1];
}
}
}
else
{
unsigned int offset = ((unsigned int)(result)) + size;
displacement = -displacement;
for(;displacement--;)
{
for(byte=(char*)result;((unsigned int)(byte)) < offset;byte++)
{
shiftBuffer = *byte;
shiftBuffer <<= 7;
*byte = ((carryFlag) | ((char*)(&shiftBuffer))[1]);
carryFlag = ((char)(shiftBuffer));
}
}
}
return result;
}

If I can just add to what @dwelch is saying, you could try this.
Just move the bytes to their final locations. Then you are left with a shift count such as 3, for example, if each byte still needs to be left-shifted 3 bits into the next higher byte. (This assumes in your mind's eye the bytes are laid out in ascending order from right to left.)
Then rotate each byte to the left by 3. A lookup table might be faster than individually doing an actual rotate. Then, in each byte, the 3 bits to be shifted are now in the right-hand end of the byte.
Now make a mask M, which is (1<<3)-1, which is simply the low order 3 bits turned on.
Now, in order, from high order byte to low order byte, do this:
c[i] ^= M & (c[i] ^ c[i-1])
That will copy bits to c[i] from c[i-1] under the mask M.
For the last byte, just use a 0 in place of c[i-1].
For right shifts, same idea.
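A sketch of the rotate-and-merge step for a left shift of 1 to 7 bits, assuming the whole-byte move has already been done and that c[0] is the least significant byte (the "ascending from right to left" picture above). The function name is hypothetical, and the rotate is done directly rather than through a lookup table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift an n-byte little-endian number (c[0] least significant) left by
 * k bits, 1 <= k <= 7, using rotate plus masked merge. */
static void shift_left_small(uint8_t *c, size_t n, unsigned k)
{
    assert(k >= 1 && k <= 7);
    if (n == 0)
        return;

    const uint8_t M = (uint8_t)((1u << k) - 1u);   /* low k bits set */

    /* Rotate every byte left by k: the bits that must move into the next
     * higher byte end up in the low k positions. */
    for (size_t i = 0; i < n; i++)
        c[i] = (uint8_t)((c[i] << k) | (c[i] >> (8 - k)));

    /* From the high byte down, copy the low k bits of c[i-1] into c[i]. */
    for (size_t i = n - 1; i > 0; i--)
        c[i] ^= M & (c[i] ^ c[i - 1]);

    /* Lowest byte: same merge with 0 in place of c[i-1]. */
    c[0] &= (uint8_t)~M;
}
```

The bits that rotate out of the top byte are simply discarded by the merge, which matches ordinary shift semantics.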

My first suggestion would be to eliminate the for loops around the displacement. You should be able to do the necessary shifts without the for(;displacement--;) loops. For displacements of magnitude greater than 7, things get a little trickier because your inner loop bounds will change and your source offset is no longer 1. i.e. your input buffer offset becomes magnitude / 8 and your shift becomes magnitude % 8.
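As an illustration of that suggestion, here is a single-pass left shift that splits the displacement into a byte offset and a bit offset. It is a sketch under a few assumptions: index 0 holds the most significant byte (as in the question's code), the source and destination buffers do not overlap, and the function name is hypothetical. A right shift would mirror it.

```c
#include <stddef.h>
#include <stdint.h>

/* Shift the size-byte array src left by bits bits into dst.
 * src[0] is the most significant byte; src and dst must not overlap. */
static void shift_left_bytes(const uint8_t *src, uint8_t *dst,
                             size_t size, size_t bits)
{
    size_t byte_off = bits / 8;               /* whole bytes to move */
    unsigned bit_off = (unsigned)(bits % 8);  /* remaining 0..7 bits */

    for (size_t i = 0; i < size; i++) {
        size_t s = i + byte_off;              /* source byte feeding dst[i] */
        uint8_t hi = (s < size) ? src[s] : 0;
        uint8_t lo = (s + 1 < size) ? src[s + 1] : 0;
        /* The operands are promoted to int, so lo >> 8 (when bit_off is 0)
         * is well defined and yields 0. */
        dst[i] = (uint8_t)((hi << bit_off) | (lo >> (8 - bit_off)));
    }
}
```

Each output byte is computed once from at most two input bytes, so the cost no longer depends on the size of the displacement.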

It does look inefficient and perhaps this is what Nathan was referring to.
Assuming a char is 8 bits where this code is running, there are two things to do. First, move the whole bytes: for example, if your input array is 0x00,0x00,0x12,0x34 and you shift left 8 bits, then you get 0x00 0x12 0x34 0x00; there is no reason to do that in a loop 8 times, one bit at a time. So start by shifting the whole chars in the array by (displacement>>3) locations and pad the holes created with zeros, with some sort of for(ra=(displacement>>3);ra>3)] = array[ra]; for(ra-=(displacement>>3);ra>(7-(displacement&7))). A good compiler will precompute (displacement>>3), displacement&7 and 7-(displacement&7), and a good processor will have enough registers to keep all of those values. You might help the compiler by making separate variables for each of those items, but depending on the compiler and how you are using it, that could make it worse too.
The bottom line, though, is: time the code. Perform a thousand 1-bit shifts, then a thousand 2-bit shifts, and so on, and time the whole thing; then try a different algorithm, time it the same way, and see whether the optimizations make a difference, for better or worse. If you know ahead of time that this code will only ever be used for shifts of fewer than 8 bits, adjust the timing test accordingly.
Your use of the carry flag implies that you are aware that many processors have instructions specifically for chaining arbitrarily long shifts using the standard register length, one bit at a time: rotate through carry, basically. The C language does not support this directly. For chaining single-bit shifts you could consider assembler, which would likely outperform the C code. A hybrid is also possible: move the bytes, then if the number of bits to shift (displacement&7) is less than maybe 4, use the assembler, else use a C loop. Again, the timing tests will tell you where the optimizations are.

Large bit arrays in C

Our OS professor mentioned that for assigning a process id to a new process, the kernel incrementally searches for the first zero bit in a array of size equivalent to the maximum number of processes(~32,768 by default), where an allocated process id has 1 stored in it.
As far as I know, there is no bit data type in C. Obviously, there's something I'm missing here.
Is there any such special construct from which we can build up a bit array? How is this done exactly?
More importantly, what are the operations that can be performed on such an array?

Bit arrays are simply byte arrays where you use bitwise operators to read the individual bits.
Suppose you have a 1-byte char variable. This contains 8 bits. You can test if the lowest bit is true by performing a bitwise AND operation with the value 1, e.g.
char a = /*something*/;
if (a & 1) {
/* lowest bit is true */
}
Notice that this is a single ampersand. It is completely different from the logical AND operator &&. This works because a & 1 will "mask out" all bits except the first, and so a & 1 will be nonzero if and only if the lowest bit of a is 1. Similarly, you can check if the second lowest bit is true by ANDing it with 2, and the third by ANDing with 4, etc, for continuing powers of two.
So a 32,768-element bit array would be represented as a 4096-element byte array, where the first byte holds bits 0-7, the second byte holds bits 8-15, etc. To perform the check, the code would select the byte from the array containing the bit that it wanted to check, and then use a bitwise operation to read the bit value from the byte.
As far as what the operations are, like any other data type, you can read values and write values. I explained how to read values above, and I'll explain how to write values below, but if you're really interested in understanding bitwise operations, read the link I provided in the first sentence.
How you write a bit depends on if you want to write a 0 or a 1. To write a 1-bit into a byte a, you perform the opposite of an AND operation: an OR operation, e.g.
char a = /*something*/;
a = a | 1; /* or a |= 1 */
After this, the lowest bit of a will be set to 1 whether it was set before or not. Again, you could write this into the second position by replacing 1 with 2, or into the third with 4, and so on for powers of two.
Finally, to write a zero bit, you AND with the inverse of the position you want to write to, e.g.
char a = /*something*/;
a = a & ~1; /* or a &= ~1 */
Now, the lowest bit of a is set to 0, regardless of its previous value. This works because ~1 will have all bits other than the lowest set to 1, and the lowest set to zero. This "masks out" the lowest bit to zero, and leaves the remaining bits of a alone.

A struct can assign members bit-sizes, but that's the extent of a "bit-type" in 'C'.
struct int_sized_struct {
int foo:4;
int bar:4;
int baz:24;
};
The rest of it is done with bitwise operations. For example, searching that PID bitmap can be done with:
extern uint32_t *process_bitmap;
uint32_t *p = process_bitmap;
uint32_t bit_offset = 0;
uint32_t bit_test;
/* Scan pid bitmap 32 entries per cycle. */
while ((*p & 0xffffffff) == 0xffffffff) {
p++;
}
/* Scan the 32-bit int block that has an open slot for the open PID */
bit_test = 0x80000000;
while ((*p & bit_test) == bit_test) {
bit_test >>= 1;
bit_offset++;
}
pid = (p - process_bitmap)*32 + bit_offset; /* each uint32_t word covers 32 PIDs */
This is roughly 32x faster than doing a simple for loop scanning an array with one byte per PID. (Actually, greater than 32x, since more of the bitmap will stay in CPU cache.)
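The scan above runs off the end of the bitmap when every slot is taken. A bounded variant might look like the sketch below. It assumes bit 0 of word 0 is slot 0 (least significant bit first, unlike the MSB-first scan above), and find_first_zero is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the index of the first clear bit in a bitmap of nwords 32-bit
 * words, or -1 if every bit is set. Bit b of word w is index w * 32 + b. */
static long find_first_zero(const uint32_t *bitmap, size_t nwords)
{
    for (size_t w = 0; w < nwords; w++) {
        if (bitmap[w] == 0xFFFFFFFFu)
            continue;                          /* all 32 slots taken */
        for (unsigned b = 0; b < 32; b++) {
            if ((bitmap[w] & (UINT32_C(1) << b)) == 0)
                return (long)(w * 32 + b);
        }
    }
    return -1;                                 /* bitmap is full */
}
```

For a 32,768-entry PID table this is called with nwords equal to 1024, and the word-at-a-time skip keeps the common case cheap.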

see http://graphics.stanford.edu/~seander/bithacks.html

There is no bit type in C, but bit manipulation is fairly straightforward. Some processors have bit-specific instructions that the code below should optimize nicely for; even without them it should be pretty fast. Using an array of 32-bit words instead of bytes may or may not be faster. Inlining instead of calling functions would also help performance.
If you have the memory to burn, just use a whole byte (or a whole 32-bit number, etc.) to store one bit; this can greatly improve performance at the cost of memory used.
unsigned char data[SIZE];
unsigned char get_bit ( unsigned int offset )
{
//TODO: limit check offset
if(data[offset>>3]&(1<<(offset&7))) return(1);
else return(0);
}
void set_bit ( unsigned int offset, unsigned char bit )
{
//TODO: limit check offset
if(bit) data[offset>>3]|=1<<(offset&7);
else data[offset>>3]&=~(1<<(offset&7));
}
