Bitwise indexing in C?

I'm trying to implement a data compression idea I've had, and since I'm imagining running it against a large corpus of test data, I had thought to code it in C (I mostly have experience in scripting languages like Ruby and Tcl.)
Looking through the O'Reilly 'cow' books on C, I realize that I can't simply index the bits of a simple 'char' or 'int' type variable the way I'd like to in order to do bitwise comparisons and operations.
Am I correct in this perception? Is it reasonable for me to use an enumerated type for representing a bit (and make an array of these, and write functions to convert to and from char)? If so, are such a type and such functions already defined in a standard library somewhere? Are there other (better?) approaches? Is there some example code somewhere that someone could point me to?
Thanks -

Following on from what Kyle has said, you can use a macro to do the hard work for you.
It is possible.
To set the nth bit, use OR:
x |= (1 << 5); // sets 6th-from-right
To clear a bit, use AND:
x &= ~(1 << 5); // clears 6th-from-right
To flip a bit, use XOR:
x ^= (1 << 5); // flips 6th-from-right
Or...
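// Arguments are parenthesized so calls such as GetBit(a | b, n + 1) expand correctly; 1u avoids signed overflow at bit 31.
#define GetBit(var, bit) (((var) & (1u << (bit))) != 0) // Returns true / false if bit is set
#define SetBit(var, bit) ((var) |= (1u << (bit)))
#define ClearBit(var, bit) ((var) &= ~(1u << (bit)))
#define FlipBit(var, bit) ((var) ^= (1u << (bit)))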
Then you can use it in code like:
int myVar = 0;
SetBit(myVar, 5);
if (GetBit(myVar, 5))
{
// Do something
}
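If you prefer functions to macros, the same four operations can be written as small inline functions. This is a sketch, not a standard API: get_bit, set_bit, clear_bit and toggle_bit are hypothetical names, and bit is assumed to be less than the number of bits in unsigned int.

```c
#include <stdbool.h>

/* Hypothetical helpers mirroring GetBit/SetBit/ClearBit/FlipBit.
   bit must be smaller than the width of unsigned int. */
static inline bool get_bit(unsigned value, unsigned bit)
{
    return (value >> bit) & 1u;            /* move the bit down to position 0 */
}

static inline unsigned set_bit(unsigned value, unsigned bit)
{
    return value | (1u << bit);            /* OR forces the bit to 1 */
}

static inline unsigned clear_bit(unsigned value, unsigned bit)
{
    return value & ~(1u << bit);           /* AND with the complement forces it to 0 */
}

static inline unsigned toggle_bit(unsigned value, unsigned bit)
{
    return value ^ (1u << bit);            /* XOR inverts just that bit */
}
```

Unlike the macros, these evaluate each argument exactly once and are type-checked, at the cost of returning a new value instead of modifying a variable in place.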

It is possible.
To set the nth bit (counting from bit 0 on the right), use OR:
x |= (1 << 5); // sets bit 5, the 6th from the right
To clear a bit, use AND with the complement:
x &= ~(1 << 5); // clears bit 5
To flip a bit, use XOR:
x ^= (1 << 5); // flips bit 5
To get the value of a bit, use AND and shift:
(x & (1 << 5)) >> 5 // gets the value (0 or 1) of bit 5
Note: the right shift by 5 ensures the value is either 0 or 1. If you're only interested in zero/non-zero, you can skip the shift.


Theory
There is no C syntax for accessing or setting the n-th bit of a built-in datatype (e.g. a 'char'). However, you can access bits using a bitwise AND operation, and set bits using a bitwise OR operation.
As an example, say that you have a variable that holds 1101 and you want to check the 2nd bit from the left. Simply perform a bitwise AND with 0100:
1101
0100
---- AND
0100
If the result is non-zero, then the 2nd bit must have been set; otherwise it was not set.
If you want to set the 3rd bit from the left, then perform a bitwise OR with 0010:
1101
0010
---- OR
1111
You can use the C operators & (for bitwise AND) and | (for bitwise OR) to perform these tasks. Do not confuse them with && and ||, which are logical operators: they treat each whole operand as simply true or false instead of working bit by bit. You will need to construct the bit access patterns (the 0100 and 0010 in the above examples) yourself. The trick is to remember that the least significant bit (LSB) counts 1s, the next LSB counts 2s, then 4s etc. So, the bit access pattern for the n-th LSB (starting at 0) is simply the value of 2^n. The easiest way to compute this in C is to shift the binary value 0001 (in this four-bit example) to the left by the required number of places. As that value is simply the integer 1, this is just '1 << n'.
Example
#include <stdio.h>
int main(void)
{
unsigned char myVal = 0x65; /* in hex; this is 01100101 in binary. */
/* Q: is the 3rd least significant bit set (again, the LSB is the 0th bit)? */
unsigned char pattern = 1;
pattern <<= 3; /* Shift pattern left by three places: 00001000. */
if (myVal & pattern) {printf("Yes!\n");} /* Perform the test with bitwise &, not logical &&. */
else {printf("No!\n");} /* Bit 3 of 01100101 is 0, so this branch runs. */
/* Set the most significant bit. */
myVal |= (unsigned char)(1u << 7);
return 0;
}
This example hasn't been tested, but should serve to illustrate the general idea.
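To reproduce four-bit tables like the ones above for any value, it can help to print a value's bits most significant first. The sketch below assumes a hypothetical helper named to_binary; buf must hold width + 1 characters, and width must not exceed the number of bits in unsigned int.

```c
/* Hypothetical helper: write the low `width` bits of value into buf as
   '0'/'1' characters, most significant bit first, then a terminating '\0'. */
void to_binary(unsigned value, unsigned width, char *buf)
{
    for (unsigned k = 0; k < width; ++k) {
        unsigned bit = width - 1 - k;               /* bit number shown at position k */
        buf[k] = ((value >> bit) & 1u) ? '1' : '0'; /* shift down, then test bit 0 */
    }
    buf[width] = '\0';
}
```

For example, to_binary(0x65, 8, buf) fills buf with "01100101", and to_binary(0xD, 4, buf) gives "1101".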

To query the state of the bit with a specific index:
int index_state = variable & ( 1 << bit_index ); // non-zero if the bit is set
To set a bit:
variable |= 1 << bit_index;
To reset (clear) a bit:
variable &= ~( 1 << bit_index );

Try using bitfields. Be careful: how bitfields are laid out in memory (bit order, padding) is implementation-defined and can vary between compilers.
http://publications.gbdirect.co.uk/c_book/chapter6/bitfields.html
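A minimal bitfield sketch, assuming a hypothetical status word with three one-bit flags. Each named member is a single bit, so reading or writing it needs no masks or shifts; the compiler generates them. Because the layout inside the underlying unsigned int is implementation-defined, this is best used for in-memory flags rather than for matching a file format or hardware register bit for bit.

```c
#include <stdbool.h>

/* Hypothetical status word: each named member occupies a single bit. */
struct status {
    unsigned ready    : 1;
    unsigned error    : 1;
    unsigned overflow : 1;
    unsigned reserved : 5;   /* unused bits, kept for illustration */
};

/* Reading a field is ordinary member access; the compiler does the masking. */
bool is_ready(const struct status *s)
{
    return s->ready != 0;
}
```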

If you want to index a bit you could:
bit = (c & 0x80) >> 7;
which gets the MSB of an 8-bit unsigned char c. You could even leave out the right shift and do a test against 0:
bit = c & 0x80;
If the bit is set, the result will be non-zero.
Obviously, you need to change the mask to get different bits (NB: 0x80 is the bit mask, in case that is unclear). It is possible to define numerous masks, e.g.
#define BIT_0 0x1 // or 1 << 0
#define BIT_1 0x2 // or 1 << 1
#define BIT_2 0x4 // or 1 << 2
#define BIT_3 0x8 // or 1 << 3
etc...
This gives you:
bit = c & BIT_1;
You can use these definitions in the above code to successfully index a bit within either a macro or a function.
To set a bit:
c |= BIT_2;
To clear a bit:
c &= ~BIT_3;
To toggle a bit:
c ^= BIT_4;
Does this help?

Individual bits can be indexed as follows.
Define a struct like this one:
struct
{
unsigned bit0 : 1;
unsigned bit1 : 1;
unsigned bit2 : 1;
unsigned bit3 : 1;
unsigned reserved : 28;
} bitPattern;
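Now if I want to know the individual bit values of a variable named "value" (assumed here to be a 32-bit unsigned int, matching the struct's 32 bits), copy it into the struct:
memcpy( &bitPattern, &value, sizeof(bitPattern) ); // from <string.h>; on Windows, CopyMemory does the same
To see if bit 2 is high or low:
int state = bitPattern.bit2; // assumes the compiler allocates bitfields from the least significant bit (implementation-defined)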
Hope this helps.

In C++ (rather than C), there is a standard library container for bits: std::vector<bool>. It is specialised in the library to be space efficient. There is also a Boost dynamic_bitset class.
These will let you perform operations on a set of boolean values, using one bit per value of underlying storage.
Boost dynamic bitset documentation
For the STL documentation, see your compiler documentation.
Of course, you can also address the individual bits in other integral types by hand. If you do that, you should use unsigned types so that you don't get implementation-defined results if you decide to do a right shift on a value with the high bit set. However, it sounds like you want the containers.
To the commenter who claimed this takes 32x more space than necessary: boost::dynamic_bitset and vector<bool> are specialised to use one bit per entry, and so there is not a space penalty, assuming that you actually want more than the number of bits in a primitive type. These classes allow you to address individual bits in a large container with efficient underlying storage. If you just want (say) 32 bits, by all means, use an int. If you want some large number of bits, you can use a library container.
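C itself has no standard bit container, so in plain C the usual approach is a hand-rolled bit array: an array of bytes plus index arithmetic. The sketch below is a minimal, hypothetical version (the bitarray type and its functions are illustrative names); it packs CHAR_BIT bits per byte and does no bounds checking, so callers must keep i below nbits.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal bit array: nbits bits packed into unsigned chars. */
typedef struct {
    size_t nbits;
    unsigned char *bytes;
} bitarray;

bool bitarray_init(bitarray *ba, size_t nbits)
{
    size_t nbytes = (nbits + CHAR_BIT - 1) / CHAR_BIT;   /* round up to whole bytes */
    ba->bytes = calloc(nbytes ? nbytes : 1, 1);          /* calloc zeroes every bit */
    ba->nbits = nbits;
    return ba->bytes != NULL;
}

void bitarray_free(bitarray *ba)
{
    free(ba->bytes);
    ba->bytes = NULL;
}

/* Byte i / CHAR_BIT holds bit i, at position i % CHAR_BIT inside that byte. */
void bitarray_set(bitarray *ba, size_t i)
{
    ba->bytes[i / CHAR_BIT] |= (unsigned char)(1u << (i % CHAR_BIT));
}

void bitarray_clear(bitarray *ba, size_t i)
{
    ba->bytes[i / CHAR_BIT] &= (unsigned char)~(1u << (i % CHAR_BIT));
}

bool bitarray_test(const bitarray *ba, size_t i)
{
    return (ba->bytes[i / CHAR_BIT] >> (i % CHAR_BIT)) & 1u;
}
```

Using bytes rather than ints keeps the arithmetic independent of sizeof(int); a word-sized array works the same way with a different divisor.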

Related

How to pad or extend the most significant bit (bit 23) into bits 24 through 31

I want to know how I could extend the most significant bit (bit 23) into bits 24 through 31. How could I do that in C code? I am using C code to program Nios II.
I was thinking of using bit shifting operations, but I don't know in detail how they could achieve the above; any link or resource is much appreciated.
Thank you in advance.
As Carl said, right-shifting a negative value is implementation-defined. You can use other bitwise operators that will always work:
if (0 != (0x00800000 & x)) //test if bit 23 is set
{
x |= 0xFF000000; //set bits 24-31
}
else
{
x &= 0x00FFFFFF; //clear bits 24-31
}
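The C right-shift operator has implementation-defined behaviour when right-shifting a negative signed value. Since Nios II has an arithmetic right-shift instruction, you can likely simply do the following, with x a signed 32-bit int:
x = (x << 8) >> 8;
Strictly speaking, the left shift overflows when bit 23 is set, which ISO C treats as undefined behaviour, so this relies on your compiler. Double-check the output assembly to be sure it uses an instruction from the sra family.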
A variation on @IronMensan's answer, which relies on the reasonable assumption that the integer being modified is 32 bits.
The following only affects bits 24-31, even if the integer is wider.
#define Mask2431 (0xFF000000)
#define Bit23 (0x800000)
some_int |= Mask2431;
if (!(some_int & Bit23))
some_int ^= Mask2431;
The following affects bit 24 and all higher bits, even when the integer is wider than 32 bits:
#define Mask24 (0xFFFFFF)
#define Bit23 (0x800000)
some_int &= Mask24;
if (some_int & Bit23)
some_int = ~some_int ^ Mask24;
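A branch-free alternative, not taken from the answers above, is the well-known "XOR then subtract" sign-extension trick: flipping bit 23 and then subtracting 2^23 maps 24-bit two's-complement values onto their signed equivalents. The sketch assumes a hypothetical function name, sign_extend_24, and returns an int32_t so that no implementation-defined right shift is involved.

```c
#include <stdint.h>

/* Hypothetical helper: interpret bits 0-23 of x as a signed 24-bit value. */
int32_t sign_extend_24(uint32_t x)
{
    x &= 0x00FFFFFFu;   /* keep only bits 0-23 */
    /* Flip bit 23, then subtract its weight (2^23): values that had bit 23 set
       end up negative, the rest are unchanged. No shifts, no branches. */
    return (int32_t)(x ^ 0x00800000u) - (int32_t)0x00800000;
}
```

For example, 0x800000 becomes -8388608, 0xFFFFFF becomes -1 and 0x7FFFFF stays 8388607.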

C: Most efficient way to set all bits in a range within a variable

Let's take int as an example:
int SetBitWithinRange(const unsigned from, const unsigned to)
{
//To be implemented
}
SetBitWithinRange is supposed to return an int in which all and only the bits starting at bit from to bit to are set, when from is smaller than to and both are in the range of 0 to 32.
e.g.:
int i = SetBitWithinRange(2,4) will result in i having the value of 0b00...01100
Here are some ways. First, some variants of "set n bits, then shift by from". I'll answer in C# though, I'm more familiar with it than I am with C. Should be easy to convert.
uint nbits = 0xFFFFFFFFu >> -(to - from);
return nbits << from;
Downside: can't handle an empty range, ie the case where to <= from.
uint nbits = ~(0xFFFFFFFFu << (to - from));
return nbits << from;
Upside: can handle the case where to = from in which case it will set no bits.
Downside: can't handle the full range, ie setting all bits.
It should be obvious how these work.
Alternatively, you can use the "subtract two powers of two" trick,
(1u << to) - (1u << from)
Downside: to can not be 32, so you can never set the top bit.
Works like this:
01000000
  ^^^^^^ "to" zeroes
     100
      ^^ "from" zeroes
-------- -
00111100
To the right of the 1 in the "from" part, it's just zeroes being subtracted from zeroes. Then at the 1 in the "from" part, you will either subtract from a 1 (if to == from) and get 0 as a result, or you'll subtract a 1 from a 0 and borrow all the way to the 1 in the to part, which will be reset.
All true bitwise methods that have been proposed at the time of writing have one of those downsides, which raises the question: can it be done without downsides?
The answer is, unfortunately, disappointing. It can be done without downsides, but only by
cheating (ie using non-bitwise elements), or
more operations than would be nice, or
non-standard operations
To give an example of 1, you can just pick any of the previous methods and add a special case (with an if or ternary operator) to work around their downside.
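Option 1 (a special case around the shift) might look like this in C. It is a sketch assuming the half-open range [from, to) from the question's example (SetBitWithinRange(2,4) gives 0b1100) and 0 <= from <= to <= 32; range_mask is a hypothetical name. The special cases exist only because shifting a 32-bit value by 32 is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper: bits from..to-1 set, for 0 <= from <= to <= 32. */
uint32_t range_mask(unsigned from, unsigned to)
{
    unsigned width = to - from;                       /* number of bits to set */
    /* Special case: a full 32-bit width cannot be built with a 32-place shift. */
    uint32_t low = (width >= 32) ? UINT32_C(0xFFFFFFFF)
                                 : (UINT32_C(1) << width) - 1u;
    /* Special case: from == 32 means an empty range; shifting by 32 is undefined. */
    return (from >= 32) ? 0u : low << from;
}
```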
To give an example of 2: (not tested)
uint uppermask = (((uint)to >> 5) ^ 1) << to;
return uppermask - (1u << from);
The uppermask either takes a 1 and shifts it left by to (as usual), or it takes a 0 and shifts it left (by an amount that doesn't matter, since it's 0 that's being shifted), if to == 32. But it's kind of weird and uses more operations.
To give an example of 3, shifts that give zero when you shift by the operand size or more would solve this very easily. Unfortunately, that kind of shift isn't too common.
A common way to do this somewhat efficiently would be this:
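#include <stdint.h>
/* n must be 1..32 (n == 0 would shift by 32, which is undefined) and offset must be below 32. */
uint32_t set_bits_32 (uint32_t data, uint8_t offset, uint8_t n)
{
uint32_t mask = 0xFFFFFFFF >> (32-n); /* n low bits set */
return data | (mask << offset);
}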
I'd go with something like that:
int answer = 0;
unsigned i = from;
for (; i <= to; ++i)
answer |= (1u << i); /* sets bits from..to inclusive; 1u avoids signed overflow at bit 31 */
return answer;
Easy to implement & readable.
I think that the fastest way would be to pre-calculate all possible values (from (0, 0) to (32, 32), if you know that you'll use this only for 32-bit integers). In fact there are 33 * 33 = 1089 of them.
Then you'll end up with O(1) solution:
answer = precalcTable[from][to];
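Building such a table once at startup could look like the sketch below. It assumes the half-open range [from, to) from the question's example and a 33 x 33 table of uint32_t (about 4 KB); build_table is a hypothetical name, and entries where from >= to are left as 0.

```c
#include <stdint.h>

/* Hypothetical lookup table: precalcTable[from][to] has bits from..to-1 set. */
static uint32_t precalcTable[33][33];

void build_table(void)
{
    for (unsigned from = 0; from <= 32; ++from) {
        for (unsigned to = 0; to <= 32; ++to) {
            uint32_t mask = 0;
            for (unsigned b = from; b < to; ++b)   /* b never exceeds 31 */
                mask |= UINT32_C(1) << b;
            precalcTable[from][to] = mask;
        }
    }
}
```

Whether a table lookup beats a couple of shifts depends on cache behaviour, so it is worth measuring before committing to it.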
OK, I'm taking up the gauntlet that @JohnZwinck has thrown towards me.
How about:
return (to < 32 ? (1u << to) : 0u) - (1u << from);
Of course this is without fully checking for validity of from and to.
Edited according to @JosephQuinsey's comments.
maybe: (( 1 << to ) - (1 << from)) | (1 << to)
This will also set the to and from bits as requested
Here's my answer. (updated)
#include <limits.h> /* UINT_MAX, CHAR_BIT */
unsigned int SetBits(int from, int to)
{
return (UINT_MAX >> (CHAR_BIT*sizeof(int)-to)) & (UINT_MAX << (from-1));
}
SetBits(9,16); ==> 0b 1111 1111 0000 0000
SetBits(1,1); ==> 0b 0000 0001 // Just Bit #1
SetBits(5,5); ==> 0b 0001 0000 // Just Bit #5
SetBits(1,4); ==> 0b 0000 1111 // Bits #1, #2, #3, and #4 (low 4 bits)
SetBits(1,32); ==> 0b 1111 1111 1111 1111 1111 1111 1111 1111 // All Bits (with a 32-bit int)
However, SetBits(0,0); does NOT work for turning all bits off.
My assumptions:
Bits are 1-based, starting from the right.
Bytes are 8-bits.
Ints can be any size (16, 32 or 64 bit). sizeof(int) is used.
No checking is done on from or to; caller must pass proper values.
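It can also be done this way; pow could be replaced by shift operations.
#include <math.h>
unsigned int SetBitWithinRange(const unsigned from, const unsigned to)
{
unsigned int i = 0;
i = pow(2, (to-from))-1; /* 2^(to-from) - 1: the low (to-from) bits set */
i = i << from; /* move them up so they start at bit from */
return i;
}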

How to create mask with least significant bits set to 1 in C

Can someone please explain this function to me?
A mask with the least significant n bits set to 1.
Ex:
n = 6 --> 0x2F, n = 17 --> 0x1FFFF // I don't get these at all, especially how n = 6 --> 0x2F
Also, what is a mask?
The usual way is to take a 1, and shift it left n bits. That will give you something like: 00100000. Then subtract one from that, which will clear the bit that's set, and set all the less significant bits, so in this case we'd get: 00011111.
A mask is normally used with bitwise operations, especially and. You'd use the mask above to get the 5 least significant bits by themselves, isolated from anything else that might be present. This is especially common when dealing with hardware that will often have a single hardware register containing bits representing a number of entirely separate, unrelated quantities and/or flags.
A mask is a common term for an integer value that is bit-wise ANDed, ORed, XORed, etc with another integer value.
For example, if you want to extract the 8 least significant bits of an int variable, you do variable & 0xFF. 0xFF is a mask.
Likewise if you want to set bits 0 and 8, you do variable | 0x101, where 0x101 is a mask.
Or if you want to invert the same bits, you do variable ^ 0x101, where 0x101 is a mask.
To generate a mask for your case you should exploit the simple mathematical fact that if you add 1 to your mask (the mask having all its least significant bits set to 1 and the rest to 0), you get a value that is a power of 2.
So, if you generate the closest power of 2, then you can subtract 1 from it to get the mask.
Positive powers of 2 are easily generated with the left shift << operator in C.
Hence, 1 << n yields 2^n. In binary it's 10...0 with n 0s.
(1 << n) - 1 will produce a mask with n lowest bits set to 1.
Now, you need to watch out for overflows in left shifts. In C (and in C++) you can't legally shift a variable left by as many bit positions as the variable has, so if ints are 32-bit, 1<<32 results in undefined behavior. Signed integer overflows should also be avoided, so you should use unsigned values, e.g. 1u << 31.
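One way to avoid both overflows is to start from an all-ones unsigned value and shift it right, guarding the one shift amount that would equal the full width. A sketch with a hypothetical name, low_mask, assuming unsigned int has no padding bits (so CHAR_BIT * sizeof(unsigned int) is its width, as on mainstream platforms):

```c
#include <limits.h>

/* Hypothetical helper: mask with the n lowest bits set, 0 <= n <= width. */
unsigned int low_mask(unsigned n)
{
    const unsigned width = CHAR_BIT * sizeof(unsigned int);
    if (n == 0)
        return 0u;                     /* UINT_MAX >> width would be undefined */
    return UINT_MAX >> (width - n);    /* shift count is 0..width-1, always valid */
}
```

For n = 6 this gives 0x3F and for n = 17 it gives 0x1FFFF.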
For both correctness and performance, the best way to accomplish this has changed since this question was asked back in 2012 due to the advent of BMI instructions in modern x86 processors, specifically BLSMSK.
Here's a good way of approaching this problem, while retaining backwards compatibility with older processors.
This method is correct, whereas the current top answers produce undefined behavior in edge cases.
Clang and GCC, when allowed to optimize using BMI instructions, will condense gen_mask() to just two ops. With supporting hardware, be sure to add compiler flags for BMI instructions:
-mbmi -mbmi2
#include <inttypes.h>
#include <stdio.h>
uint64_t gen_mask(const uint_fast8_t msb) {
const uint64_t src = (uint64_t)1 << msb;
return (src - 1) ^ src;
}
int main() {
uint_fast8_t msb;
for (msb = 0; msb < 64; ++msb) {
printf("%016" PRIx64 "\n", gen_mask(msb));
}
return 0;
}
First, for those who only want the code to create the mask:
uint64_t bits = 6;
uint64_t mask = ((uint64_t)1 << bits) - 1;
// Results in 0b111111 (or 0x3F)
Thanks to @Benni who asked about using bits = 64. If you need the code to support this value as well, you can use:
uint64_t bits = 6;
uint64_t mask = (bits < 64)
? ((uint64_t)1 << bits) - 1
: (uint64_t)0 - 1;
For those who want to know what a mask is:
A mask is usually a name for a value that we use to manipulate other values using bitwise operations such as AND, OR, XOR, etc.
Short masks are usually represented in binary, where we can explicitly see all the bits that are set to 1.
Longer masks are usually represented in hexadecimal, which is really easy to read once you get the hang of it.
I believe your first example should be 0x3f.
0x3f is hexadecimal notation for the number 63, which is 111111 in binary, so the last 6 bits (the least significant 6 bits) are set to 1.
The following little C program will calculate the correct mask:
#include <stdio.h>
unsigned mask_for_n_bits(int n)
{
unsigned mask = 0;
for (int i = 0; i < n; ++i)
mask |= 1u << i; /* 1u: shifting a signed 1 into bit 31 would overflow */
return mask;
}
int main (int argc, char const *argv[])
{
printf("6: 0x%x\n17: 0x%x\n", mask_for_n_bits(6), mask_for_n_bits(17));
return 0;
}
0x2F is 0010 1111 in binary - this should be 0x3f, which is 0011 1111 in binary and which has the 6 least-significant bits set.
Similarly, 0x1FFFF is 0001 1111 1111 1111 1111 in binary, which has the 17 least-significant bits set.
A "mask" is a value that is intended to be combined with another value using a bitwise operator like &, | or ^ to individually set, unset, flip or leave unchanged the bits in that other value.
For example, if you combine the mask 0x3F with some value n using the & operator, the result will have zeroes in all but the 6 least significant bits, and those 6 bits will be copied unchanged from the value n.
In the case of an & mask, a binary 0 in the mask means "unconditionally set the result bit to 0" and a 1 means "set the result bit to the input value bit". For an | mask, a 0 in the mask sets the result bit to the input bit and a 1 unconditionally sets the result bit to 1, and for an ^ mask, a 0 sets the result bit to the input bit and a 1 sets the result bit to the complement of the input bit.
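The three rules can be seen side by side by applying one mask, 0x3F, to one value with each operator. keep_bits, force_bits and flip_bits are hypothetical names used only for illustration.

```c
/* One mask, three operators. In each comment, "1" and "0" refer to mask bits. */
unsigned keep_bits(unsigned value, unsigned mask)  { return value & mask; } /* 1 = copy,   0 = clear */
unsigned force_bits(unsigned value, unsigned mask) { return value | mask; } /* 1 = set,    0 = copy  */
unsigned flip_bits(unsigned value, unsigned mask)  { return value ^ mask; } /* 1 = invert, 0 = copy  */
```

With value 0xA5 (1010 0101) and mask 0x3F (0011 1111), & gives 0x25 (0010 0101), | gives 0xBF (1011 1111) and ^ gives 0x9A (1001 1010).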

Explain this Function

Can someone explain to me the reason why someone would want to use bitwise comparison?
example:
#include <stdio.h>
int f(int x) {
return x & (x-1);
}
int main(){
printf("F(10) = %d\n", f(10)); // prints F(10) = 8
}
This is what I really want to know: "Why check for common set bits"
x is any positive number.
Bitwise operations are used for three reasons:
You can use the least possible space to store information
You can compare/modify an entire register (e.g. 32, 64, or 128 bits depending on your processor) in a single CPU instruction, usually taking a single clock cycle. That means you can do a lot of work (of certain types) blindingly fast compared to regular arithmetic.
It's cool, fun and interesting. Programmers like these things, and they can often be the differentiator when there is no difference between techniques in terms of efficiency/performance.
You can use this for all kinds of very handy things. For example, in my database I can store a lot of true/false information about my customers in a tiny space (a single byte can store 8 different true/false facts) and then use '&' operations to query their status:
Is my customer Male and Single and a Smoker?
if ((customerFlags & (maleFlag | singleFlag | smokerFlag)) ==
(maleFlag | singleFlag | smokerFlag))
Is my customer (any combination of) Male Or Single Or a Smoker?
if ((customerFlags & (maleFlag | singleFlag | smokerFlag)) != 0)
Is my customer not Male and not Single and not a Smoker?
if ((customerFlags & (maleFlag | singleFlag | smokerFlag)) == 0)
(The outer parentheses matter: == and != bind more tightly than &.)
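Wrapping the two common queries in functions keeps the parentheses in one place. This sketch assumes hypothetical flag values (one bit each) and hypothetical helper names has_all and has_any; "none of these flags" is simply !has_any.

```c
#include <stdbool.h>

/* Hypothetical flag values, one bit each. */
enum {
    maleFlag   = 1 << 0,
    singleFlag = 1 << 1,
    smokerFlag = 1 << 2
};

/* True if every flag in `wanted` is set in `flags`. */
bool has_all(unsigned flags, unsigned wanted)
{
    return (flags & wanted) == wanted;
}

/* True if at least one flag in `wanted` is set in `flags`. */
bool has_any(unsigned flags, unsigned wanted)
{
    return (flags & wanted) != 0;
}
```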
Aside from just "checking for common bits", you can also do:
Certain arithmetic, e.g. value & 15 gives the same result as value % 16 for unsigned (or non-negative) values and can be faster. This only works when the divisor is a power of two, but if you can use it, it can be a great optimisation.
Data packing/unpacking. e.g. a colour is often expressed as a 32-bit integer that contains Alpha, Red, Green and Blue byte values. The Red value might be extracted with an expression like red = (value >> 16) & 255; (shift the value down 16 bit positions and then carve off the bottom byte)
Data manipulation and swizzling. Some clever tricks can be achieved with bitwise operations. For example, swapping two integer values without needing to use a third temporary variable, or converting ARGB colour values into another format (e.g. RGBA or BGRA)
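The packing, unpacking and swizzling ideas can be sketched for a colour stored as 0xAARRGGBB (alpha in the top byte). This layout and the names pack_argb, red_of and argb_to_rgba are assumptions for illustration; other libraries order the channels differently.

```c
#include <stdint.h>

/* Hypothetical layout: 0xAARRGGBB. Cast before shifting so that a << 24
   cannot overflow a signed int after integer promotion. */
uint32_t pack_argb(uint8_t a, uint8_t r, uint8_t g, uint8_t b)
{
    return ((uint32_t)a << 24) | ((uint32_t)r << 16) | ((uint32_t)g << 8) | b;
}

/* Shift the red byte down to the bottom, then keep only that byte. */
uint8_t red_of(uint32_t argb)
{
    return (argb >> 16) & 0xFFu;
}

/* Swizzle ARGB to RGBA: move the alpha byte from the top to the bottom. */
uint32_t argb_to_rgba(uint32_t argb)
{
    return (uint32_t)(argb << 8) | (argb >> 24);
}
```

For example, pack_argb(0x11, 0x22, 0x33, 0x44) is 0x11223344, its red byte is 0x22, and argb_to_rgba turns it into 0x22334411.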
The Ur-example is "testing if a number is even or odd":
unsigned int number = ...;
bool isOdd = (0 != (number & 1));
More complex uses include bitmasks (multiple boolean values in a single integer, each one taking up one bit of space) and encryption/hashing (which frequently involve bit shifting, XOR, etc.)
The example you've given is kinda odd, but I'll use bitwise comparisons all the time in embedded code.
I'll often have code that looks like the following:
volatile uint32_t *flags = (volatile uint32_t *)0x000A000; // memory-mapped register; the cast is required
bool flagA = *flags & 0x1;
bool flagB = *flags & 0x2;
bool flagC = *flags & 0x4;
It's not a bitwise comparison. It doesn't return a boolean.
Bitwise operators are used to read and modify individual bits of a number.
n & 0x8 // Peek at bit3
n |= 0x8 // Set bit3
n &= ~0x8 // Clear bit3
n ^= 0x8 // Toggle bit3
Bits are used in order to save space. 8 chars take a lot more memory than 8 bits in a char.
The following example gets the range of an IP subnet, given an IP address in the subnet and the subnet mask.
uint32_t mask = (((((255u << 8) | 255) << 8) | 255) << 8) | 255;
uint32_t ip = (((((192u << 8) | 168) << 8) | 3) << 8) | 4;
uint32_t first = ip & mask;
uint32_t last = ip | ~mask;
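The same two expressions can be wrapped in functions and tried with a /24 mask (255.255.255.0), chosen here purely for illustration. ipv4, subnet_first and subnet_last are hypothetical names; the octets are cast to uint32_t before shifting so that 192 << 24 cannot overflow a signed int.

```c
#include <stdint.h>

/* Hypothetical helper: four octets, most significant first, into one 32-bit value. */
uint32_t ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) | ((uint32_t)c << 8) | d;
}

/* AND keeps the network part and zeroes the host part: the first address. */
uint32_t subnet_first(uint32_t ip, uint32_t mask) { return ip & mask; }

/* OR with the inverted mask sets every host bit: the last address. */
uint32_t subnet_last(uint32_t ip, uint32_t mask) { return ip | ~mask; }
```

With ip 192.168.3.4 and mask 255.255.255.0, the range runs from 192.168.3.0 to 192.168.3.255.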
E.g. if you have a number of status flags, you may want to store each flag as a single bit to save space.
So x, if declared as a byte, could hold 8 flags.
I think you mean bitwise combination (in your case a bitwise AND operation). This is a very common operation in those cases where the byte, word or dword value is handled as a collection of bits, eg status information, eg in SCADA or control programs.
Your example tests whether x has at most 1 bit set: f returns 0 if x is a power of 2 (or 0) and non-zero otherwise.
Your particular example clears the lowest set bit of x: subtracting 1 turns that bit off and every 0 below it on, so the AND keeps only the higher set bits.
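Because x & (x - 1) clears exactly the lowest set bit, it supports two classic tricks: a power-of-two test and Brian Kernighan's bit-counting loop, which clears one set bit per pass. The names is_power_of_two and count_set_bits are illustrative, and the sketch uses unsigned to avoid the overflow that x - 1 would cause for INT_MIN.

```c
#include <stdbool.h>

/* Exactly one bit set: non-zero, and clearing the lowest set bit leaves nothing. */
bool is_power_of_two(unsigned x)
{
    return x != 0 && (x & (x - 1)) == 0;
}

/* Kernighan's method: each pass removes the lowest set bit, so the loop
   runs once per set bit rather than once per bit position. */
unsigned count_set_bits(unsigned x)
{
    unsigned count = 0;
    while (x != 0) {
        x &= x - 1;
        ++count;
    }
    return count;
}
```

For the question's f(10): 10 is 1010, so clearing its lowest set bit leaves 1000 (8), which is non-zero, confirming 10 is not a power of two.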

Explanation of an algorithm to set, clear and test a single bit

Hey, in the Programming Pearls book, there is source code for setting, clearing and testing a bit of the given index in an array of ints that is actually a set representation.
The code is the following:
#include<stdio.h>
#define BITSPERWORD 32
#define SHIFT 5
#define MASK 0x1F
#define N 10000000
int a[1+ N/BITSPERWORD];
void set(int i)
{
a[i>>SHIFT] |= (1<<(i & MASK));
}
void clr(int i)
{
a[i>>SHIFT] &= ~(1<<(i & MASK));
}
int test(int i)
{
return a[i>>SHIFT] & (1<<(i & MASK));
}
Could somebody explain to me the reason for the SHIFT and MASK defines? And what are their purposes in the code?
I've already read the previous related question.
VonC posted a good answer about bitmasks in general. Here's some information that's more specific to the code you posted.
Given an integer representing a bit, we work out which member of the array holds that bit. That is: Bits 0 to 31 live in a[0], bits 32 to 63 live in a[1], etc. All that i>>SHIFT does is i / 32. This works out which member of a the bit lives in. With an optimising compiler, these are probably equivalent.
Obviously, now we've found out which member of a that bitflag lives in, we need to ensure that we set the correct bit in that integer. This is what 1 << i does. However, we need to ensure that we don't try to access the 33rd bit in a 32-bit integer, so the shift operation is constrained by using 1 << (i & 0x1F). The magic here is that 0x1F is 31, so we'll never left-shift the bit represented by i more than 31 places (otherwise it should have gone in the next member of a).
From Here (General answer to get this thread started)
A bit mask is a value (which may be stored in a variable) that enables you to isolate a specific set of bits within an integer type.
Normally the mask will have the bits you are interested in set to 1 and all the other bits set to 0. The mask then allows you to isolate the value of the bits, clear all the bits or set all the bits or set a new value to the bits.
Masks (particularly multi-bit ones) often have an associated shift value, which is the number of places the bits need shifting right so that the least significant masked bit ends up in the least significant bit of the type.
For example, using a 16-bit short data type, suppose you wanted to be able to mask bits 3, 4 and 5 (LSB is number 0). Your mask and shift would look something like
#define MASK 0x0038
#define SHIFT 3
Masks are often assigned in hexadecimal because it is easier to work with bits in the data type in that base as opposed to decimal. Historically octal has also been used for bit masks.
If I have a variable, var, that contains data that the mask is relevant to then I can isolate the bits like this
var & MASK
I can isolate all the other bits like this
var & ~MASK
I can clear the bits like this
var &= ~MASK;
I can clear all the other bits like this
var &= MASK;
I can set all the bits like this
var |= MASK;
I can set all the other bits like this
var |= ~MASK;
I can extract the decimal value of the bits like this
(var & MASK) >> SHIFT
I can assign a new value to the bits like this
var &= ~MASK;
var |= (newValue << SHIFT) & MASK;
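The read and write steps above can be packaged as two functions for the same 3-bit field in bits 3 to 5. FIELD_MASK, FIELD_SHIFT, get_field and set_field are hypothetical names, chosen so they don't clash with the MASK and SHIFT used elsewhere in this thread.

```c
/* Hypothetical 3-bit field occupying bits 3..5 (mask 0x0038, shift 3). */
#define FIELD_MASK  0x0038u
#define FIELD_SHIFT 3u

/* Isolate the field, then shift it down to bit 0. */
unsigned get_field(unsigned var)
{
    return (var & FIELD_MASK) >> FIELD_SHIFT;
}

/* Clear the old field, then insert the new value; the final & FIELD_MASK
   drops any bits of newValue that don't fit in 3 bits. */
unsigned set_field(unsigned var, unsigned newValue)
{
    var &= ~FIELD_MASK;
    var |= (newValue << FIELD_SHIFT) & FIELD_MASK;
    return var;
}
```

Writing 5 into an all-zero word gives 0x28, and writing 9 (binary 1001) keeps only its low three bits, giving 0x08.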
When you want to set a bit inside the array, you have to
seek to the right array index and
set the appropriate bit inside this array item.
There are BITSPERWORD (=32) bits in one array item, which means that the index i has to be split into two parts:
the rightmost 5 bits serve as an index within the array item and
the rest of the bits (the leftmost 27 of a 32-bit int) serve as an index into the array.
You get:
the leftmost 27 bits by discarding the rightmost five, which is exactly what i>>SHIFT does, and
the rightmost five bits by masking out anything but the rightmost five bits, which is what i & MASK does.
I guess you understand the rest.
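The split can be checked directly: for a non-negative index, i >> 5 equals i / 32 and i & 0x1F equals i % 32, and recombining the two parts gives i back. word_index and bit_offset are hypothetical names; the sketch uses unsigned indices, whereas the book's code uses int.

```c
/* Split a bit number into (array index, bit position) for 32-bit words,
   matching SHIFT = 5 and MASK = 0x1F in the quoted code. */
unsigned word_index(unsigned i) { return i >> 5; }    /* same as i / 32 */
unsigned bit_offset(unsigned i) { return i & 0x1Fu; } /* same as i % 32 */
```

For example, bit 70 lives in a[2] (70 / 32 = 2) at position 6 (70 - 64 = 6), since 2 * 32 + 6 = 70.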
Bitwise operation and the leading paragraphs of Mask are a concise explanation, and contain some pointers for further study.
Think of an 8-bit byte as a set of elements from an 8-member universe. A member is IN the set when the corresponding bit is set. Setting a bit more than once doesn't modify set membership (a bit can have only 2 states). The bitwise operators in C provide access to bits by masking and shifting.
The code is trying to store N bits by an array, where each element of the array contains BITSPERWORD (32) bits.
Thus if you're trying to access bit i, you need to calculate the index of the array element that stores it (i/32), which is what i>>SHIFT does.
And then you need to access that bit in the array element we just got.
(i & MASK) gives the bit position at the array element (word).
(1<<(i & MASK)) produces a word with only the bit at that position set.
Now you can set/clear/test that bit in a[i>>SHIFT] using (1<<(i & MASK)).
You may also think of i as a 32-bit number in which bits 5~31 are the index of the array element that stores it, and bits 0~4 are the bit position within that word.
