Explanation of an algorithm to set, clear and test a single bit - c

Hey, in the Programming Pearls book there is source code for setting, clearing and testing the bit at a given index in an array of ints that is actually a set representation.
The code is the following:
#include<stdio.h>
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1+ N/BITSPERWORD];
void set(int i)
{
a[i>>SHIFT] |= (1<<(i & MASK));
}
void clr(int i)
{
a[i>>SHIFT] &= ~(1<<(i & MASK));
}
int test(int i)
{
return a[i>>SHIFT] & (1<<(i & MASK));
}
Could somebody explain to me the reason for the SHIFT and MASK defines, and what their purpose is in the code?
I've already read the previous related question.

VonC posted a good answer about bitmasks in general. Here's some information that's more specific to the code you posted.
Given an integer representing a bit, we work out which member of the array holds that bit. That is: bits 0 to 31 live in a[0], bits 32 to 63 live in a[1], etc. i>>SHIFT is simply i / 32 (for non-negative i), which gives the index of the member of a that the bit lives in. With an optimising compiler, the shift and the division will probably compile to the same code.
Obviously, now we've found out which member of a that bitflag lives in, we need to ensure that we set the correct bit in that integer. This is what 1 << i does. However, we need to ensure that we don't try to access the 33rd bit in a 32-bit integer, so the shift operation is constrained by using 1 << (i & 0x1F). The magic here is that 0x1F is 31, so we'll never left-shift the bit represented by i more than 31 places (otherwise it should have gone in the next member of a).
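To make the split concrete, here is a minimal sketch that computes where a given bit number lands. The helper names word_index and bit_offset are made up for this illustration and do not appear in the book; the sketch assumes 32-bit words and a non-negative i.

```c
#define SHIFT 5     /* log2(32): i >> SHIFT equals i / 32 for non-negative i */
#define MASK  0x1F  /* 31: i & MASK equals i % 32 for non-negative i */

/* Hypothetical helper: which array element holds bit i. */
int word_index(int i)
{
    return i >> SHIFT;
}

/* Hypothetical helper: position of bit i inside that element (0..31). */
int bit_offset(int i)
{
    return i & MASK;
}
```

For example, word_index(100) is 3 and bit_offset(100) is 4, so bit 100 of the set is bit 4 of a[3].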

From Here (General answer to get this thread started)
A bit mask is a value (which may be stored in a variable) that enables you to isolate a specific set of bits within an integer type.
Normally the mask will have the bits you are interested in set to 1 and all the other bits set to 0. The mask then allows you to isolate the value of the bits, clear all the bits, set all the bits, or assign a new value to the bits.
Masks (particularly multi-bit ones) often have an associated shift value, which is the amount the masked bits need shifting right so that the least significant masked bit ends up in the least significant bit of the type.
For example, using a 16-bit short data type, suppose you wanted to be able to mask bits 3, 4 and 5 (LSB is number 0). Your mask and shift would look something like
#define MASK 0x0038
#define SHIFT 3
Masks are often assigned in hexadecimal because it is easier to work with bits in the data type in that base as opposed to decimal. Historically octal has also been used for bit masks.
If I have a variable, var, that contains data that the mask is relevant to then I can isolate the bits like this
var & MASK
I can isolate all the other bits like this
var & ~MASK
I can clear the bits like this
var &= ~MASK;
I can clear all the other bits like this
var &= MASK;
I can set all the bits like this
var |= MASK;
I can set all the other bits like this
var |= ~MASK;
I can extract the numeric value of the bits like this
(var & MASK) >> SHIFT
I can assign a new value to the bits like this
var &= ~MASK;
var |= (newValue << SHIFT) & MASK;
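The last two operations are the ones most often wrapped in helpers. The sketch below does that for the 3-bit field in bits 3 to 5; get_field and set_field are names invented for this example, and uint16_t stands in for the 16-bit short mentioned above.

```c
#include <stdint.h>

#define MASK  0x0038u  /* bits 3, 4 and 5 */
#define SHIFT 3

/* Read the 3-bit field as a number in the range 0..7. */
unsigned get_field(uint16_t var)
{
    return (var & MASK) >> SHIFT;
}

/* Return var with the field replaced by new_value; all other bits are kept.
   Bits of new_value that do not fit in the field are discarded by the mask. */
uint16_t set_field(uint16_t var, unsigned new_value)
{
    var &= (uint16_t)~MASK;                          /* clear the field */
    var |= (uint16_t)((new_value << SHIFT) & MASK);  /* insert new value */
    return var;
}
```

With these, set_field(0xFFFF, 2) gives 0xFFD7, and get_field(0xFFD7) gives back 2.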

When you want to set a bit inside the array, you have to
seek to the right array index and
set the appropriate bit inside this array item.
There are BITSPERWORD (=32) bits in one array item, which means that the index i has to be split into two parts:
the rightmost 5 bits serve as an index into the array item and
the rest of the bits (the leftmost 27) serve as an index into the array.
You get:
the leftmost 27 bits by discarding the rightmost five, which is exactly what i>>SHIFT does, and
the rightmost five bits by masking out everything but the rightmost five bits, which is what i & MASK does.
I guess you understand the rest.

Bitwise operation and the leading paragraphs of Mask are a concise explanation, and contain some pointers for further study.
Think of an 8-bit byte as a set of elements from an 8-member universe. A member is IN the set when the corresponding bit is set. Setting a bit more than once doesn't modify set membership (a bit can have only 2 states). The bitwise operators in C provide access to bits by masking and shifting.

The code stores N bits in an array, where each element of the array holds BITSPERWORD (32) bits.
Thus if you're trying to access bit i, you need to calculate the index of the array element that stores it (i/32), which is what i>>SHIFT does.
Then you need to access that bit within the array element you just found.
(i & MASK) gives the bit position within the array element (word).
(1<<(i & MASK)) is a value with only the bit at that position set.
Now you can set/clear/test that bit in a[i>>SHIFT] using (1<<(i & MASK)).
You may also think of i as a 32-bit number in which bits 5~31 are the index of the array element that stores the bit, and bits 0~4 are the bit position within the word.
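Putting the pieces together, a minimal sketch of the bit vector might look as follows. It differs from the book's listing in three ways that are choices made for this example only: the array is unsigned (so that setting bit 31 of a word is well defined), N is reduced to 1000, and the functions carry a bit_ prefix. It assumes unsigned int is 32 bits wide and that 0 <= i <= N.

```c
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 1000  /* reduced from the book's 10000000 for this sketch */

unsigned int a[1 + N / BITSPERWORD];

/* Add i to the set: OR in a 1 at the right position of the right word. */
void bit_set(int i)
{
    a[i >> SHIFT] |= (1u << (i & MASK));
}

/* Remove i from the set: AND with a mask that is 0 only at that position. */
void bit_clr(int i)
{
    a[i >> SHIFT] &= ~(1u << (i & MASK));
}

/* Return 1 if i is in the set, 0 otherwise. */
int bit_test(int i)
{
    return (a[i >> SHIFT] & (1u << (i & MASK))) != 0;
}
```

After bit_set(37), the word a[1] holds 0x20, because 37 >> 5 is 1 and 37 & 0x1F is 5.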

Related

Change 4 middle bits of a byte in C

I'm trying to change the 4 middle bits of a byte to correspond to the High nibble of another byte:
Suppose we start with:
In = 0bABCDEFGH
Out = 0bXXXXXXXX // Some random byte
I want:
Out = 0bXXABCDXX
Leaving whatever other bits were in Out's extremes unchanged.
How can I do this?
Note: The 'X' represents any bit, 0 or 1, just to distinguish what came from the input.
I got to:
(0b00111100 & (IN>>2)) = 0b00ABCD00
, which filters the high nibble and centers it but then what? How can I move it to Out?
simple:
out &= 0b11000011;
out |= (in >> 2 & 0b00111100);
out &= 0b11000011 sets out to 0bxx0000xx, preserving the 2 most significant bits and the 2 least significant bits. in >> 2 shifts the input right by 2, giving us 0bYYABCDEF; YY could be 00 or 11 depending on what A is (and on whether in is signed). To get rid of YY and EF we do & 0b00111100.
As pointed out by @JB, 0b is not standard notation, thus you should use something else, most preferably hex 0x notation. See this for more info.
Thus using hex this would be:
out &= 0xC3;
out |= (in >> 2 & 0x3C);
Here is a conversion table:
`0xf` is `0b1111`
`0x3` is `0b0011`
`0xc` is `0b1100`
Assuming in and out are unsigned char, and that CHAR_BIT == 8:
out = (out & 0xC3) | ((in >> 2) & 0x3C);
i.e. 4 operations in total.
There are multiple alternatives. From a high-level perspective, you could
force the four middle bits of Out off, prepare a mask from In as shown in your question, and combine Out and mask via bitwise OR (|)
force the four middle bits of Out off, prepare a mask from In as shown in your question, and combine Out and mask via bitwise EXCLUSIVE OR (^)
force the four middle bits of Out on, prepare a mask from In similarly to how you do now, but with the outer bits on, and combine Out and mask via bitwise AND (&)
use a series of shifts, masks, and addition or bitwise OR operations to build up the wanted result section by section
Forcing bits off is achieved by bitwise AND with a mask that has 0s at (only) the positions you want to turn off.
Forcing bits on is achieved by bitwise OR with a mask that has 1s at (only) the positions you want to turn on.
You already seem to have a handle on shifting, though you do need to be careful there if you happen to be shifting objects of signed types. Prefer to use unsigned types for bit manipulation wherever possible.
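The first three alternatives can be sketched as below. The function names merge_or, merge_xor and merge_and are invented for this example, and the code assumes 8-bit unsigned char values.

```c
/* Variant 1: clear the middle bits of out, then OR in the prepared bits. */
unsigned char merge_or(unsigned char out, unsigned char in)
{
    return (unsigned char)((out & 0xC3u) | ((in >> 2) & 0x3Cu));
}

/* Variant 2: XOR also works, because the cleared middle bits are all 0
   and the prepared value is 0 outside the middle bits. */
unsigned char merge_xor(unsigned char out, unsigned char in)
{
    return (unsigned char)((out & 0xC3u) ^ ((in >> 2) & 0x3Cu));
}

/* Variant 3: force the middle bits of out on, then AND with a value whose
   outer bits are all 1, so that out's outer bits pass through unchanged. */
unsigned char merge_and(unsigned char out, unsigned char in)
{
    return (unsigned char)((out | 0x3Cu) & ((in >> 2) | 0xC3u));
}
```

All three give the same result; for instance, with out = 0x81 and in = 0xA5 each returns 0xA9 (binary 10 1010 01).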

Setting and clearing bits in C

I have some trouble with a C program that does some bit manipulation. In the program, I use an unsigned long long int variable to represent a 64 bit map, each bit representing a position on the map. I need to be able to update these bits (positions) i.e. setting or clearing a bit.
To clear and set a bit, I do (0 is the least significant position):
map &= ~(1 << pos) // clear bit in position 'pos'
map |= (1 << pos) // set bit in position 'pos'
The problem is that when I perform these operations, all the bits in the map which are to the left of pos get set to 0 (while I want only the bit in position pos to change).
What am I doing wrong?
The problem is that those shifts are done using the type int, which on most modern 64-bit systems is still 32 bits wide. You need to use the same type as map, i.e. unsigned long long:
1ull << pos
Note the ull which tells the compiler that the 1 is not an int but an unsigned long long.
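As a sketch of the fix, the two helpers below do the shift in unsigned long long, so positions 32 to 63 work as well. The names set_bit64 and clear_bit64 are made up for this example, and both assume 0 <= pos <= 63.

```c
/* Return map with bit pos set. The constant 1ull is already 64 bits wide,
   so the shift cannot run off the end of a 32-bit int. */
unsigned long long set_bit64(unsigned long long map, unsigned pos)
{
    return map | (1ull << pos);
}

/* Return map with bit pos cleared; every other bit is left as it was. */
unsigned long long clear_bit64(unsigned long long map, unsigned pos)
{
    return map & ~(1ull << pos);
}
```

For example, set_bit64(0, 40) yields 0x10000000000, a value that a plain int shift could not produce.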

How do you compare only certain bits in data type?

I'm trying to learn a bit about emulation and I'm trying to think of how I can decode opcodes. Each opcode is a short data type, 16 bits. I'd like to be able to compare only specific sets of 4 bits. For example: there are multiple opcodes that start with 00, such as 0x00E0.
I'd like to be able to compare each of these values in either bit or hexadecimal form. I was thinking maybe something along the lines of bit shifting to bump everything else off, so that the bits I don't care about would zero out. That may cause issues for the center bits and will require additional steps. What kind of solutions do you guys use for a problem like this?
Use a bit mask, which has the bits set that you care about. Then use the & operator to zero out everything that you don't care about. For instance, say we want to compare the lowest four bits in a and b:
uint16_t mask = 0x000f;
if ((a & mask) == (b & mask)) {
// lowest 4 bits are equal
}
This is simple bit manipulation. You can mask the relevant bits with
int x = opcode & 0x00f0;
and compare the resulting value
if (x == 0x00e0) {
/* do something */
}
You can easily create a mask of nbits bits, shift it left by pos bits, and do the comparison:
uint32_t mask = ~(~0u << nbits);
if ((num & (mask << pos)) == 0x00e0) {
/* Do something */
}
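For decoding opcodes it is often more convenient to shift the field down first, so that it can be compared against a small number instead of a value that is still in its original position. A minimal sketch, with extract_bits as a hypothetical helper name and the stated limits on its arguments as assumptions:

```c
#include <stdint.h>

/* Extract nbits bits of value, starting at bit pos (bit 0 is the LSB).
   Assumes 0 < nbits < 32 and pos + nbits <= 32. */
uint32_t extract_bits(uint32_t value, unsigned pos, unsigned nbits)
{
    uint32_t mask = ~(~(uint32_t)0 << nbits);  /* the nbits lowest bits set */
    return (value >> pos) & mask;
}
```

With this helper, extract_bits(0x00E0, 4, 4) is 0xE and extract_bits(0x00E0, 12, 4) is 0, so each 4-bit group of a 16-bit opcode can be tested on its own.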

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x3F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is easy to read once you get the hang of it.
You can read more about bitwise operations in C here.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63 which is 111111 in binary, so that last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdarg.h>
#include <stdio.h>
int mask_for_n_bits(int n)
{
int mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1 << i;
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, an 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, an 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.
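The three rules can be checked with one mask and one value. In this sketch the function names keep_bits, force_bits_on and flip_bits are invented for the example; each simply applies one operator.

```c
/* AND mask: bits where m is 0 become 0, bits where m is 1 are kept. */
unsigned keep_bits(unsigned v, unsigned m)
{
    return v & m;
}

/* OR mask: bits where m is 1 become 1, bits where m is 0 are kept. */
unsigned force_bits_on(unsigned v, unsigned m)
{
    return v | m;
}

/* XOR mask: bits where m is 1 are inverted, bits where m is 0 are kept. */
unsigned flip_bits(unsigned v, unsigned m)
{
    return v ^ m;
}
```

Taking v = 0xA5 (1010 0101) and m = 0x3F (0011 1111): the AND gives 0x25 (0010 0101), the OR gives 0xBF (1011 1111), and the XOR gives 0x9A (1001 1010).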

bitwise indexing in C?

I'm trying to implement a data compression idea I've had, and since I'm imagining running it against a large corpus of test data, I had thought to code it in C (I mostly have experience in scripting languages like Ruby and Tcl.)
Looking through the O'Reilly 'cow' books on C, I realize that I can't simply index the bits of a simple 'char' or 'int' type variable as I'd like to to do bitwise comparisons and operators.
Am I correct in this perception? Is it reasonable for me to use an enumerated type for representing a bit (and make an array of these, and writing functions to convert to and from char)? If so, is such a type and functions defined in a standard library already somewhere? Are there other (better?) approaches? Is there some example code somewhere that someone could point me to?
Thanks -
Following on from what Kyle has said, you can use a macro to do the hard work for you.
It is possible.
To set the nth bit, use OR:
x |= (1 << 5); // sets the 6th-from-right
To clear a bit, use AND:
x &= ~(1 << 5); // clears 6th-from-right
To flip a bit, use XOR:
x ^= (1 << 5); // flips 6th-from-right
Or...
#define GetBit(var, bit) (((var) & (1 << (bit))) != 0) // Returns true / false if bit is set
#define SetBit(var, bit) ((var) |= (1 << (bit)))
#define FlipBit(var, bit) ((var) ^= (1 << (bit)))
Then you can use it in code like:
int myVar = 0;
SetBit(myVar, 5);
if (GetBit(myVar, 5))
{
// Do something
}
It is possible.
To set the nth bit, use OR:
x |= (1 << 5); // sets bit 5 (counting from 0 at the right)
To clear a bit, use AND:
x &= ~(1 << 5); // clears bit 5
To flip a bit, use XOR:
x ^= (1 << 5); // flips bit 5
To get the value of a bit use shift and AND:
(x & (1 << 5)) >> 5 // gets the value (0 or 1) of bit 5
note: the shift right 5 is to ensure the value is either 0 or 1. If you're just interested in 0/not 0, you can get by without the shift.
Have a look at the answers to this question.
Theory
There is no C syntax for accessing or setting the n-th bit of a built-in datatype (e.g. a 'char'). However, you can access bits using a bitwise AND operation, and set bits using a bitwise OR operation.
As an example, say that you have a variable that holds 1101 and you want to check the 2nd bit from the left. Simply perform a bitwise AND with 0100:
1101
0100
---- AND
0100
If the result is non-zero, then the 2nd bit must have been set; otherwise it was not set.
If you want to set the 3rd bit from the left, then perform a bitwise OR with 0010:
1101
0010
---- OR
1111
You can use the C operators & (for bitwise AND) and | (for bitwise OR) to perform these tasks; the logical operators && and || will not work, because they only test whether their operands are nonzero. You will need to construct the bit access patterns (the 0100 and 0010 in the above examples) yourself. The trick is to remember that the least significant bit (LSB) counts 1s, the next LSB counts 2s, then 4s etc. So, the bit access pattern for the n-th LSB (starting at 0) is simply the value of 2^n. The easiest way to compute this in C is to shift the binary value 0001 (in this four bit example) to the left by the required number of places. As this value is always equal to 1 in unsigned integer-like quantities, this is just '1 << n'
Example
unsigned char myVal = 0x65; /* in hex; this is 01100101 in binary. */
/* Q: is the 3-rd least significant bit set (again, the LSB is the 0th bit)? */
unsigned char pattern = 1;
pattern <<= 3; /* Shift pattern left by three places.*/
if (myVal & pattern) {printf("Yes!\n");} /* Perform the test with bitwise AND. */
/* Set the most significant bit. */
myVal |= (char)(1<<7);
This example hasn't been tested, but should serve to illustrate the general idea.
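For comparison, here is a small sketch of the same test and set steps as functions. The names bit_is_set and with_bit_set are invented for this example; both use the bitwise operators & and |, and assume 0 <= n <= 7.

```c
/* Return 1 if bit n of val is set (bit 0 is the LSB), 0 otherwise. */
int bit_is_set(unsigned char val, unsigned n)
{
    return (val & (1u << n)) != 0;
}

/* Return val with bit n set; all other bits are unchanged. */
unsigned char with_bit_set(unsigned char val, unsigned n)
{
    return (unsigned char)(val | (1u << n));
}
```

For 0x65 (01100101), bit_is_set(0x65, 3) is 0 and bit_is_set(0x65, 2) is 1, while with_bit_set(0x65, 7) gives 0xE5. Note that the logical expression 0x65 && 8 evaluates to 1 regardless of which bits are set, which is why && cannot be used for this test.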
To query state of bit with specific index:
int index_state = variable & ( 1 << bit_index );
To set a bit:
variable |= 1 << bit_index;
To reset (clear) a bit:
variable &= ~( 1 << bit_index );
Try using bitfields. Be careful: the implementation can vary by compiler.
http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
If you want to index a bit you could:
bit = (ch & 0x80) >> 7;
This gets the MSB of a char ch. You could even leave out the right shift and do a test on 0:
bit = ch & 0x80;
If the bit is set, the result will be nonzero.
Obviously, you need to change the mask to get different bits (NB: 0x80 is the bit mask, if it is unclear). It is possible to define numerous masks, e.g.
#define BIT_0 0x1 // or 1 << 0
#define BIT_1 0x2 // or 1 << 1
#define BIT_2 0x4 // or 1 << 2
#define BIT_3 0x8 // or 1 << 3
etc...
This gives you:
bit = ch & BIT_1;
You can use these definitions in the above code to successfully index a bit within either a macro or a function.
To set a bit:
ch |= BIT_2;
To clear a bit:
ch &= ~BIT_3;
To toggle a bit:
ch ^= BIT_4;
This help?
Individual bits can be indexed as follows.
Define a struct like this one:
struct
{
unsigned bit0 : 1;
unsigned bit1 : 1;
unsigned bit2 : 1;
unsigned bit3 : 1;
unsigned reserved : 28;
} bitPattern;
Now if I want to know the individual bit values of a var named "value", do the following:
CopyMemory( &bitPattern, &value, sizeof(value) ); /* Win32 macro; memcpy() is the portable equivalent */
To see if bit 2 is high or low:
int state = bitPattern.bit2;
Hope this helps.
There is a C++ standard library container for bits: std::vector<bool>. It is specialised in the library to be space efficient. There is also a boost dynamic_bitset class.
These will let you perform operations on a set of boolean values, using one bit per value of underlying storage.
Boost dynamic bitset documentation
For the STL documentation, see your compiler documentation.
Of course, you can also address the individual bits in other integral types by hand. If you do that, you should use unsigned types so that you don't get implementation-defined behaviour if you decide to do a right shift on a value with the high bit set. However, it sounds like you want the containers.
To the commenter who claimed this takes 32x more space than necessary: boost::dynamic_bitset and vector<bool> are specialised to use one bit per entry, and so there is not a space penalty, assuming that you actually want more than the number of bits in a primitive type. These classes allow you to address individual bits in a large container with efficient underlying storage. If you just want (say) 32 bits, by all means, use an int. If you want some large number of bits, you can use a library container.

Resources