Can I allocate a specific number of bits in C?

I am trying to store a large amount of boolean information that is determined at run-time. I was wondering what the best method might be.
I have currently been trying to allocate the memory using:
pStatus = malloc((<number of data points>/8) + 1);
thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:
pStatus[element]
Unfortunately this does not seem to be working very well. First, I am having difficulty initializing the memory to the integer value 0: can this be done using memset()? Still, I don't think that is the reason it crashes when I try to access pStatus[element].
I am also not entirely convinced that this approach is the best one to be using. What I really want is essentially a giant bitmask that reflects the status of the boolean values. Have I missed something?

pStatus = malloc((<number of data points>/8) + 1);
This does allocate enough bytes for your bits. However,
pStatus[element]
This accesses the element'th byte, not bit. So once element exceeds one-eighth of the total number of bits, you're accessing past the end of the allocated array.
I would define a few helper functions:
int get_bit(int element)
{
    unsigned int byte_index = element / 8;
    unsigned int bit_index  = element % 8;
    unsigned int bit_mask   = (1 << bit_index);
    return ((pStatus[byte_index] & bit_mask) != 0);
}

void set_bit(int element)
{
    unsigned int byte_index = element / 8;
    unsigned int bit_index  = element % 8;
    unsigned int bit_mask   = (1 << bit_index);
    pStatus[byte_index] |= bit_mask;
}

void clear_bit(int element)
{
    unsigned int byte_index = element / 8;
    unsigned int bit_index  = element % 8;
    unsigned int bit_mask   = (1 << bit_index);
    pStatus[byte_index] &= ~bit_mask;
}
(Error checking on the range of element is left out for clarity. You could make these macros, too.)

...thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:
pStatus[element]
element is addressing bytes, not bits. You want something like:
pStatus[element/8] & (1 << (element % 8))

Small point: to get enough memory to store N bits, (N/8) + 1 bytes can be one too many: for N = 8 it allocates two bytes where one is enough.
(N+7)/8 always gives the exact minimum.

Well, the simplest answer would be to use calloc instead of malloc.
It is defined to initialize the memory it allocates to zero, and can often do it by using page mapping tricks.
That will take care of your memory initialization problem. The other dozen posts here seem to adequately address the indexing problem and the fact that you occasionally allocate an extra byte (oh the horror!), so I won't repeat their content here.
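For instance (a minimal sketch; alloc_bitmap and num_points are illustrative names, not from the question):
#include <stdlib.h>

/* Returns a zero-initialized bitmap with room for num_points bits,
   or NULL on failure. calloc() guarantees the memory is all-zero,
   so no separate memset() pass is needed. */
unsigned char *alloc_bitmap(size_t num_points)
{
    return calloc((num_points + 7) / 8, 1);
}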

pStatus[element] will give you an entire byte at that address.
To set a particular element you would do something like:
pStatus[element >> 3] |= 1 << (element & 7);
To reset an element:
pStatus[element >> 3] &= ~(1 << (element & 7));
and to test an element:
if ((pStatus[element >> 3] & (1 << (element & 7))) != 0)
(note the parentheses: ~ must apply to the whole shifted mask, and != binds tighter than &). The initial allocation should be
pStatus = malloc((<number of data points> + 7) / 8);
What you had will work, but it occasionally wastes a byte.

I can't help but notice that all replies in C here seem to assume that a byte is 8 bits. This is not necessarily true in C (although it will of course be true on most mainstream hardware), so making this assumption in code is rather bad form.
The proper way to write architecture-neutral code is to
#include <limits.h>
and then use the CHAR_BIT macro wherever you need "the number of bits in a char".
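For instance, the set_bit() helper from above could be made architecture-neutral like this (a sketch; pStatus is assumed to be the bitmap from the question):
#include <limits.h>   /* CHAR_BIT */

extern unsigned char *pStatus;   /* the bitmap from the question */

/* Same as set_bit() above, but with no hard-coded 8. */
void set_bit_portable(unsigned int element)
{
    unsigned int byte_index = element / CHAR_BIT;
    unsigned int bit_index  = element % CHAR_BIT;
    pStatus[byte_index] |= (unsigned char)(1u << bit_index);
}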

Make yourself happier and define a type and functions to operate on that type. That way, if you discover that bit accesses are too slow, you can change the unit of memory per boolean to a byte/word/long, or adopt sparse/dynamic data structures if memory is really an issue (i.e., if your sets are mostly zeros, you could just keep a list of the coordinates of the 1's).
You can write your code to be completely immune to changes to the implementation of your bit vector.
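A minimal sketch of such an abstraction (all names are illustrative):
#include <stdlib.h>

/* Callers never touch the representation, so it can later be swapped
   for one byte per boolean, machine words, or a sparse structure. */
typedef struct {
    unsigned char *bytes;
    size_t         nbits;
} bitvec;

/* Returns 0 on success, -1 on allocation failure. */
int bitvec_init(bitvec *v, size_t nbits)
{
    v->nbits = nbits;
    v->bytes = calloc((nbits + 7) / 8, 1);   /* zero-initialized */
    return v->bytes ? 0 : -1;
}

int bitvec_get(const bitvec *v, size_t i)
{
    return (v->bytes[i / 8] >> (i % 8)) & 1;
}

void bitvec_put(bitvec *v, size_t i, int value)
{
    if (value)
        v->bytes[i / 8] |= (unsigned char)(1u << (i % 8));
    else
        v->bytes[i / 8] &= (unsigned char)~(1u << (i % 8));
}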

pStatus[element] does not address the bit. The exact byte it gets is dependent on the type of pStatus -- I assume char* or equivalent -- so pStatus[element] gets you the element'th byte.
You could memset to set to 0, yes.
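For example (a fragment; num_points stands in for your data-point count):
size_t nbytes = (num_points + 7) / 8;
pStatus = malloc(nbytes);
if (pStatus != NULL)
    memset(pStatus, 0, nbytes);   /* every status bit is now 0 */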

pStatus = malloc((<number of data points>/8) + 1);
That part's fine.
pStatus[element]
here's where you have trouble: you are addressing bytes when you want to address bits.
pStatus[element / 8 ]
will get you the right byte in the array.

You need to allocate c = malloc((N+7)/8) bytes, and you can set the nth bit with
c[n/8] |= (0x80 >> (n%8));
clear with
c[n/8] &= ~(0x80 >> (n%8));
and test with
if(c[n/8] & (0x80 >> (n%8))) blah();

If you don't mind having to write wrappers, you could also use std::bitset or std::vector<bool> from the C++ standard library; the latter in particular has exactly what you need, already coded, tested and packaged (and plenty of bells and whistles).
It's a real shame there is no straightforward way to use C++ code from C applications (no, writing a wrapper isn't straightforward to me, nor fun, and it means more work in the long term).

What would be wrong with std::vector<bool>?

It amazes me that only one answer here mentions CHAR_BIT. A byte is often 8 bits, but not always.

Your allocation code is correct; see the set_bit() and get_bit() functions given in this answer to access the booleans.

If you are limited to just a few bits, then instead of eaanon01's solution you can also use C's built-in bit-field facility (there are very few occasions where bit-fields are the right tool, but this would be one), as sketched below.
For this bit-banging stuff I can recommend:
Henry Warren's "Hacker's Delight"
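For illustration, a minimal bit-field sketch (the flag names are made up; note that the compiler controls the layout, so bit-fields are not portable for external data formats):
#include <stdio.h>

/* A handful of named one-bit flags packed by the compiler. */
struct flags {
    unsigned int ready   : 1;
    unsigned int dirty   : 1;
    unsigned int visible : 1;
};

int main(void)
{
    struct flags f = {0};
    f.dirty = 1;
    printf("ready=%d dirty=%d\n", (int)f.ready, (int)f.dirty);
    return 0;
}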

A boolean is "never" a separate value in C, so a struct might be in order to get you going.
It is true that the memory area is not initialized, so you need to do that yourself.
Here is a simple example of how you could do it with unions, structs and enums:
#include <stdio.h>
#include <stdlib.h>

typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long int DWORD;
typedef unsigned long long int DDWORD;

enum STATUS
{
    status0 = 0x01,
    status1 = 0x02,
    status2 = 0x04,
    status3 = 0x08,
    status4 = 0x10,
    status5 = 0x20,
    status6 = 0x40,
    status7 = 0x80,
    status_group = status0 + status1 + status4
};

#define GET_STATUS( S ) ( ((status.DDBuf & (DDWORD)S) == (DDWORD)S) ? 1 : 0 )
#define SET_STATUS( S ) ( (status.DDBuf |= (DDWORD)S) )
#define CLR_STATUS( S ) ( (status.DDBuf &= ~(DDWORD)S) )

static union {
    BYTE BBuf[8];
    WORD WWBuf[4];
    DWORD DWBuf[2];
    DDWORD DDBuf;
} status;

int main(void)
{
    // Reset all status bits
    status.DDBuf = 0;

    printf( "%d \n", GET_STATUS( status0 ) );
    SET_STATUS( status0 );
    printf( "%d \n", GET_STATUS( status0 ) );
    CLR_STATUS( status0 );
    printf( "%d \n", GET_STATUS( status0 ) );
    SET_STATUS( status_group );
    printf( "%d \n", GET_STATUS( status0 ) );
    system( "pause" );
    return 0;
}
Hope this helps. This example can handle up to 64 status booleans and can easily be extended.
This example assumes char = 8 bits, short = 16 bits, long = 32 bits and long long = 64 bits.
I have now also added support for status groups.

Related

Bitwise operation in C language (0x80, 0xFF, << )

I have a problem understanding this code. What I know is that we passed source code to an assembler that converted it into "byte code". Now I have a virtual machine that is supposed to read this byte code. This function is supposed to read the first byte-code instruction. I don't understand what is happening in this code. I guess we are trying to read the byte code, but I don't understand how it is done.
static int32_t bytecode_to_int32(const uint8_t *bytecode, size_t size)
{
    int32_t result;
    t_bool  sign;
    int     i;

    result = 0;
    sign = (t_bool)(bytecode[0] & 0x80);
    i = 0;
    while (size)
    {
        if (sign)
            result += ((bytecode[size - 1] ^ 0xFF) << (i++ * 8));
        else
            result += bytecode[size - 1] << (i++ * 8);
        size--;
    }
    if (sign)
        result = ~(result);
    return (result);
}
This code is somewhat badly written, with lots of operations on single lines, and it contains various potential bugs. It looks brittle.
bytecode[0] & 0x80 Simply reads the MSB sign bit, assuming it's 2's complement or similar, then converts it to a boolean.
The loop iterates backwards from most significant byte to least significant.
If the sign was negative, the code will perform an XOR of the data byte with 0xFF. Basically inverting all bits in the data. The result of the XOR is an int.
The data byte (or the result of the above XOR) is then bit shifted i * 8 bits to the left. The data is always implicitly promoted to int, so in case i * 8 happens to give a result larger than INT_MAX, there's a fat undefined behavior bug here. It would be much safer practice to cast to uint32_t before the shift, carry out the shift, then convert to a signed type afterwards.
The resulting int is converted to int32_t - these could be the same type or different types depending on system.
i is incremented by 1, size is decremented by 1.
If sign was negative, the int32_t is inverted to some 2's complement negative number that's sign extended and all the data bits are inverted once more. Except all zeros that got shifted in with the left shift are also replaced by ones. If this is intentional or not, I cannot tell. So for example if you started with something like 0x0081 you now have something like 0xFFFF01FF. How that format makes sense, I have no idea.
My take is that the bytecode[size - 1] ^ 0xFF (which is equivalent to ~) was made to toggle the data bits, so that they would later toggle back to their original values when ~ is called later. A programmer has to document such tricks with comments, if they are anything close to competent.
Anyway, don't use this code. If the intention was merely to swap the byte order (endianess) of a 4 byte integer, then this code must be rewritten from scratch.
That's properly done as:
static int32_t big32_to_little32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] << 8  |
                      (uint32_t)bytes[3] << 0  ;
    return (int32_t)result;
}
Anything more complicated than the above is highly questionable code. We need not worry about signs being a special case, the above code preserves the original signedness format.
So A ^ 0xFF toggles the bits set in A: if you have 10101100 XORed with 11111111, it becomes 01010011. I am not sure why they didn't use ~ here. The ^ is the XOR operator, so you are XORing with 0xFF.
The << is a bitshift "up" or left. In other words, A<<1 is equivalent to multiplying A by 2.
the >> moves down so is equivalent to bitshifting right, or dividing by 2.
The ~ inverts the bits in a byte.
Note that it's better to initialise variables at declaration; it costs no additional processing whatsoever to do it that way.
sign = (t_bool)(bytecode[0] & 0x80); the sign of the number is stored in the 8th bit (position 7 counting from 0), which is where the 0x80 comes from. So it's literally checking whether the sign bit is set in the first byte of bytecode, and if so, storing that in the sign variable.
Essentially, if the value is unsigned, it copies the bytes from bytecode into result one byte at a time.
If the data is signed then it flips the bits then copies the bytes, then when it's done copying, it flips the bits back.
Personally, with this kind of thing I prefer to get the data, convert it with htonl() (network byte order) and memcpy() it to an allocated array, storing it in an endian-agnostic way; when I retrieve the data I use ntohl() to convert it back to the format used by the computer. htonl()/ntohl() (and the 16-bit htons()/ntohs()) are standard POSIX functions used in networking and in platform-agnostic data formatting/storage/communication all the time, as sketched below.
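A sketch of that round trip for a 32-bit value (htonl()/ntohl() are the 32-bit counterparts of htons()/ntohs(); both pairs are POSIX, declared in arpa/inet.h):
#include <arpa/inet.h>   /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Store value into buf in network (big-endian) byte order. */
void store_u32(uint8_t buf[4], uint32_t value)
{
    uint32_t be = htonl(value);
    memcpy(buf, &be, 4);
}

/* Read a network-order value back in host byte order. */
uint32_t load_u32(const uint8_t buf[4])
{
    uint32_t be;
    memcpy(&be, buf, 4);
    return ntohl(be);
}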
This function is a very naive version of a function that converts from big endian to little endian.
The parameter size is not needed, as the function only works with 4 bytes of data.
The same thing can be achieved much more easily with union punning (which also lets compilers optimize it, in this case down to a single instruction):
#include <stdio.h>
#include <stdint.h>

#define SWAP(a,b,t) do { t c = (a); (a) = (b); (b) = c; } while(0)

int32_t my_bytecode_to_int32(const uint8_t *bytecode)
{
    union
    {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.b8[3] = *bytecode++;
    i32.b8[2] = *bytecode++;
    i32.b8[1] = *bytecode++;
    i32.b8[0] = *bytecode++;
    return i32.i32;
}

int main()
{
    union {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.i32 = -4567;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", bytecode_to_int32(i32.b8, 4));   /* the OP's function */

    i32.i32 = -34;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", my_bytecode_to_int32(i32.b8));
}
https://godbolt.org/z/rb6Na5
If the purpose of the code is to sign-extend a 1-, 2-, 3-, or 4-byte sequence in network/big-endian byte order to a signed 32-bit int value, it's doing things the hard way and reimplementing the wheel along the way.
This can be broken down into a three-step process: convert the proper number of bytes to a 32-bit integer value, sign-extend bytes out to 32 bits, then convert that 32-bit value from big-endian to the host's byte order.
The "wheel" being reimplemented in this case is the the POSIX-standard ntohl() function that converts a 32-bit unsigned integer value in big-endian/network byte order to the local host's native byte order.
The first step I'd do is to convert 1, 2, 3, or 4 bytes into a uint32_t:
#include <stdint.h>
#include <limits.h>
#include <arpa/inet.h>
#include <errno.h>
// convert the `size` number of bytes starting at the `bytecode` address
// to a uint32_t value
static uint32_t bytecode_to_uint32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = 0;

    switch ( size )
    {
    case 4:
        result  = ( uint32_t ) bytecode[ size - 4 ] << 24;  /* fall through */
    case 3:
        result += ( uint32_t ) bytecode[ size - 3 ] << 16;  /* fall through */
    case 2:
        result += ( uint32_t ) bytecode[ size - 2 ] << 8;   /* fall through */
    case 1:
        result += bytecode[ size - 1 ];
        break;
    default:
        // error handling here
        break;
    }
    return( result );
}
Then, sign-extend it (borrowing from this answer):
static uint32_t sign_extend_uint32( uint32_t in, size_t size )
{
    if ( size == 4 )
    {
        return( in );
    }

    // being pedantic here - the existence of `[u]int32_t` pretty
    // much ensures 8 bits/byte
    size_t bits = size * CHAR_BIT;

    uint32_t m = 1U << ( bits - 1 );
    uint32_t result = ( in ^ m ) - m;
    return ( result );
}
Put it all together:
static int32_t bytecode_to_int32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = bytecode_to_uint32( bytecode, size );
    result = sign_extend_uint32( result, size );

    // set endianness from network/big-endian to
    // whatever this host's endianness is
    result = ntohl( result );

    // converting uint32_t here to signed int32_t
    // can be subject to implementation-defined
    // behavior
    return( result );
}
Note that the conversion from uint32_t to int32_t implicitly performed by the return statement in the above code can result in implementation-defined behavior, as there can be uint32_t values that cannot be mapped to int32_t values. See this answer.
Any decent compiler should optimize that well into inline functions.
I personally think this also needs much better error handling/input validation.

Writing specific bits to a binary header

Suppose I want to create binary of a specific length, using the first 4 bits to define the type (Which should allow for 16 different types) and the last 60 bits to define content.
How would one go ahead and construct this in C? I'm having a hard time finding any examples (That properly explains it) of doing this in C (I haven't worked with C this low-level before and I'm trying to get my feet wet...)
Would I just create a char[8] and manually set each bit with something like
/** Set bit in any sized bit block.
 *
 * @return none
 *
 * @param bit - Bit number.
 * @param bitmap - Pointer to bitmap.
 *
 * @note Please note that this function does not know the size of the
 * bitmap and it cannot range check the specified bit number.
 */
void SetBit(int bit, unsigned char *bitmap)
{
    int n, x;

    x = bit / 8;            // Index to byte.
    n = bit % 8;            // Specific bit in byte.
    bitmap[x] |= (1 << n);  // Set bit.
}
The code above is from storing a bit in a bit of character array in C linux.
I would create a function specific to the task and just use a mask.
void setType(uint8_t type, uint8_t* header)
{
    header[0] = (header[0] & 0x0f) | (type << 4);
}

// To use:
uint8_t header[8];
setType(3, header);
I would create a similar function to set each field of the header.
The above assumes that by "first four bits" you mean the most significant bits of the first byte of the header rather than the least significant bits of the first byte of the header.
You could have something like the following function to set a specific nibble (a nibble is 4 bits of data; 1 byte (8 bits) is 2 nibbles, which is probably what confused you). You just pass in your char[x] byte and specify whether to change its left-hand or right-hand half:
int set_nibble(unsigned char* dest, unsigned char src, nibble_side side)
{
    if (side == left_hand)
    {
        *dest = ((*dest & 0x0f) | (src << 4));
        return 0;
    }
    if (side == right_hand)
    {
        *dest = ((*dest & 0xf0) | (src));
        return 0;
    }
    return -1;
}
where nibble_side param is something like
typedef enum nibble_side_t
{
    right_hand, left_hand
} nibble_side;
Here and here are two decent guides to binary AND operations. You should feel comfortable using them to filter out the data you need before you do operations like this.

Copy 6 byte array to long long integer variable

I have read from memory a 6 byte unsigned char array.
The endianess is Big Endian here.
Now I want to assign the value that is stored in the array to an integer variable. I assume this has to be long long since it must contain up to 6 bytes.
At the moment I am assigning it this way:
unsigned char aFoo[6];
long long nBar;
// read values to aFoo[]...
// aFoo[0]: 0x00
// aFoo[1]: 0x00
// aFoo[2]: 0x00
// aFoo[3]: 0x00
// aFoo[4]: 0x26
// aFoo[5]: 0x8e
nBar = (aFoo[0] << 64) + (aFoo[1] << 32) +(aFoo[2] << 24) + (aFoo[3] << 16) + (aFoo[4] << 8) + (aFoo[5]);
A memcpy approach would be neat, but when I do this
memcpy(&nBar, &aFoo, 6);
the 6 bytes are being copied to the long long from the start and thus have padding zeros at the end.
Is there a better way than my assignment with the shifting?
What you want to accomplish is called de-serialisation or de-marshalling.
For values that wide, using a loop is a good idea, unless you really need the max. speed and your compiler does not vectorise loops:
uint8_t array[6];
...
uint64_t value = 0;
uint8_t *p = array;

for ( int i = (sizeof(array) - 1) * 8 ; i >= 0 ; i -= 8 )
    value |= (uint64_t)*p++ << i;

// left-align
value <<= 64 - (sizeof(array) * 8);
Note the use of stdint.h types: sizeof(uint8_t) cannot differ from 1, and only these types are guaranteed to have the expected bit-widths. Also use unsigned integers when shifting values: right-shifting certain values is implementation-defined, while left-shifting negative values invokes undefined behaviour.
Iff you need a signed value, just
int64_t final_value = (int64_t)value;
after the shifting. This is still implementation defined, but all modern implementations (and likely the older) just copy the value without modifications. A modern compiler likely will optimize this, so there is no penalty.
The declarations can be moved, of course. I just put them before where they are used for completeness.
You might try
nBar = 0;
memcpy((unsigned char*)&nBar + 2, aFoo, 6);
No & is needed before an array name, since it already decays to an address. (Note that this offset-of-2 trick assumes a big-endian host and an 8-byte long long.)
The correct way to do what you need is to use a union:
#include <stdio.h>

typedef union {
    struct {
        char padding[2];
        char aFoo[6];
    } chars;
    long long nBar;
} Combined;

int main ()
{
    Combined x;

    // reset the content of "x"
    x.nBar = 0; // or memset(&x, 0, sizeof(x));

    // put values directly in x.chars.aFoo[]...
    x.chars.aFoo[0] = 0x00;
    x.chars.aFoo[1] = 0x00;
    x.chars.aFoo[2] = 0x00;
    x.chars.aFoo[3] = 0x00;
    x.chars.aFoo[4] = 0x26;
    x.chars.aFoo[5] = 0x8e;

    printf("nBar: %llx\n", x.nBar);
    return 0;
}
The advantage: the code is clearer, and there is no need to juggle bits, shifts, masks, etc.
However, you have to be aware that, for speed optimization and hardware reasons, the compiler might squeeze padding bytes into the struct, leading to aFoo not sharing the desired bytes of nBar. This minor disadvantage can be solved by telling the compiler to align the members of the union at byte boundaries (as opposed to the default, which is alignment at word boundaries, the word being 32-bit or 64-bit depending on the hardware architecture).
This used to be achieved using a #pragma directive and its exact syntax depends on the compiler you use.
Since C11/C++11, the alignas() specifier became the standard way to specify the alignment of struct/union members (given your compiler already supports it).
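For example, with the #pragma approach it might look like this (a sketch; #pragma pack is a widely supported extension rather than standard C, and note the all-char struct above has no internal padding to begin with):
#pragma pack(push, 1)   /* no padding between members */
typedef union {
    struct {
        char padding[2];
        char aFoo[6];
    } chars;
    long long nBar;
} CombinedPacked;
#pragma pack(pop)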

How to define and work with an array of bits in C?

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).
Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.
I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.
This is basically what the code looks like in 1D, simplified:
int main()
{
    /* Define lattice */
    array = (char*)malloc(N * sizeof(char));

    total_c = 0;

    /* Carry out RSA multiple times */
    for (i = 0; i < 1000; i++)
        rand_seq_ads();

    /* Calculate average coverage efficiency at jamming */
    printf("coverage efficiency = %lf", total_c/1000);

    return 0;
}

void rand_seq_ads()
{
    /* Initialise array, initial conditions */
    memset(array, 0, N * sizeof(char));
    available_sites = N;
    count = 0;

    /* While the lattice still has enough room... */
    while (available_sites != 0)
    {
        /* Generate random site location */
        x = rand();

        /* Deposit dimer (if site is available) */
        if (array[x] == 0)
        {
            array[x] = 1;
            array[x+1] = 1;
            count += 1;
            available_sites += -2;
        }

        /* Mark site left of dimer as unavailable (if it's empty) */
        if (array[x-1] == 0)
        {
            array[x-1] = 1;
            available_sites += -1;
        }
    }

    /* Calculate coverage %, and add to total */
    c = count/N;
    total_c += c;
}
For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).
I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?
Thanks for your answers
If I am not too late, this page gives awesome explanation with examples.
An array of int can be used to deal with an array of bits. Assuming the size of int is 4 bytes, when we talk about an int we are dealing with 32 bits. Say we have int A[10]: that means we are working with 10*4*8 = 320 bits (picture each array element as 4 byte-sized blocks, each made of 8 one-bit blocks).
So, to set the kth bit in array A:
// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void SetBit( int A[], int k )
{
    int i = k/32;          // gives the corresponding index in the array A
    int pos = k%32;        // gives the corresponding bit position in A[i]

    unsigned int flag = 1; // flag = 0000.....00001

    flag = flag << pos;    // flag = 0000...010...000 (shifted k positions)

    A[i] = A[i] | flag;    // Set the bit at the k-th position in A[i]
}

or in the shortened version:
void SetBit( int A[], int k )
{
    A[k/32] |= 1 << (k%32);  // Set the bit at the k-th position in A[i]
}

similarly, to clear the kth bit:
void ClearBit( int A[], int k )
{
    A[k/32] &= ~(1 << (k%32));
}

and to test the kth bit:
int TestBit( int A[], int k )
{
    return ( (A[k/32] & (1 << (k%32))) != 0 );
}
As said above, these manipulations can be written as macros too:
// Due to the order of operations, wrap 'k' in parentheses in case it
// is passed as an expression, e.g. i + 1; otherwise the first
// part would evaluate to "A[i + (1/32)]", not "A[(i + 1)/32]"
#define SetBit(A,k) ( A[(k)/32] |= (1 << ((k)%32)) )
#define ClearBit(A,k) ( A[(k)/32] &= ~(1 << ((k)%32)) )
#define TestBit(A,k) ( A[(k)/32] & (1 << ((k)%32)) )
typedef unsigned long bfield_t[ (size_needed + 8*sizeof(long) - 1) / (8*sizeof(long)) ];
// long because that's probably what your cpu is best at
// the + 8*sizeof(long) - 1 rounds up, so the array holds at least size_needed bits
Now, each long in a bfield_t can hold sizeof(long)*8 bits.
You can calculate the index of the long you need by:
bindex = index / (8 * sizeof(long) );
and your bit number by
b = index % (8 * sizeof(long) );
You can then look up the long you need and then mask out the bit you need from it.
result = my_field[bindex] & (1<<b);
or
result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0
The first one may be faster on some CPUs, or may save you shifting back up if you need to perform operations between the same bit in multiple bit arrays. It also mirrors the setting and clearing of a bit in the field more closely than the second implementation.
set:
my_field[bindex] |= 1<<b;
clear:
my_field[bindex] &= ~(1<<b);
You should remember that you can use bitwise operations on the longs that hold the fields, and that's the same as the operations on the individual bits.
You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. ffs should always be available in strings.h. It's there just for this purpose -- a string of bits.
Anyway, it is find first set and essentially:
int ffs(int x) {
    int c = 0;
    while (!(x & 1)) {
        c++;
        x >>= 1;
    }
    return c; // except that the real ffs() handles x = 0 differently and returns a 1-based position
}
This is a common operation for processors to have an instruction for and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has an instruction for this, by the way. Oh, and ffsl and ffsll are the same function except take long and long long, respectively.
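For example, ffs() can drive a loop that visits only the set bits of a word (a sketch; ffs() returns a 1-based position, or 0 when no bits are set):
#include <strings.h>   /* ffs() -- POSIX */
#include <stdio.h>

int main(void)
{
    unsigned int word = 0x294;        /* bits 2, 4, 7 and 9 set */

    while (word != 0) {
        int pos = ffs((int)word) - 1; /* convert to a 0-based index */
        printf("bit %d is set\n", pos);
        word &= word - 1;             /* clear the lowest set bit */
    }
    return 0;
}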
You can use & (bitwise and) and << (left shift).
For example, (1 << 3) results in "00001000" in binary. So your code could look like:
char eightBits = 0;

//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);

//eightBits now looks like "00110000".
Then just scale it up with an array of chars and figure out the appropriate byte to modify first.
For more efficiency, you could define a list of bitfields in advance and put them in an array:
#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80
char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};
Then you avoid the overhead of the bit shifting and you can index your bits, turning the previous code into:
eightBits |= (bits[2] | bits[3]);  // same bits as (1 << 4) | (1 << 5) above
Alternatively, if you can use C++, you could just use an std::vector<bool> which is internally defined as a vector of bits, complete with direct indexing.
bitarray.h:
#include <inttypes.h> // defines uint32_t
//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;
#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
((bit)&1 ? ((array)[DW_INDEX(index)] |= 1<<BIT_INDEX(index)) \
: ((array)[DW_INDEX(index)] &= ~(1<<BIT_INDEX(index))) \
, 0 \
)
Use:
bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2; // the least significant bit is 0
putbit(arr,6,x); // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x); // sets bit 6 to 1 because !!2 is 1
EDIT the docs:
"dword" = "double word" = 32-bit value (unsigned, but that's not really important)
RESERVE_BITS: number_of_bits --> number_of_dwords
RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
DW_INDEX(i) is the index of dword where the i-th bit is stored.
Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
If i is the number of some bit in the array, BIT_INDEX(i) is the number
of that bit in the dword where the bit is stored.
And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0
getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.
putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).
It's a trade-off:
(1) use 1 byte for each 2 bit value - simple, fast, but uses 4x memory
(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory
If you have enough memory available then go for (1), otherwise consider (2).

What is the fastest way(s) to loop through a large data chunk on a per-bit basis

I am running through a memory block of binary data byte-wise.
Currently I am doing something like this:
for (i = 0; i < data->Count; i++)
{
    byte = &data->Data[i];
    ((*byte & Masks[0]) == Masks[0]) ? Stats.FreqOf1++ : Stats.FreqOf0++; // syntax not pretty, but you get the point.
    ((*byte & Masks[1]) == Masks[1]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[2]) == Masks[2]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[3]) == Masks[3]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[4]) == Masks[4]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[5]) == Masks[5]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[6]) == Masks[6]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
    ((*byte & Masks[7]) == Masks[7]) ? Stats.FreqOf1++ : Stats.FreqOf0++;
}
Where Masks is:
for (i = 0; i < 8; i++)
{
    Masks[i] = 1 << i;
}
(I somehow did not manage to do it as fast in a loop or in an inlined function, so I wrote it out.)
Does anyone have any suggestions on how to to improve this first loop? I am rather inexperienced with getting down to bits.
This may seem like a stupid thing to do. But I am in the process of implementing a compression algorithm. I just want to have the bit accessing part down right.
Thanks!
PS: This is in on the Visual Studio 2008 compiler. So it would be nice if the suggestions applied to that compiler.
PPS: I just realized, that I don't need to increment two counts. One would be enough. Then compute the difference to the total bits at the end.
But that would be specific to just counting. What I really want done fast is the bit extraction.
EDIT:
The lookup table idea that was brought forward is nice.
I realize though that I posed the question wrong in the title.
Because in the end what I want to do is not count the bits, but access each bit as fast as possible.
ANOTHER EDIT:
Is it possible to advance a pointer by just one bit in the data?
ANOTHER EDIT:
Thank you for all your answers so far.
What I want to implement in the next steps is a nonsophisticated binary arithmetic coder that does not analyze the context. So I am only interested in single bits for now. Eventually it will become a Context-adaptive BAC but I will leave that for later.
Processing 4 bytes instead of 1 byte could be an option. But a loop over 32 bits is costly as well, isn't it?
The fastest way is probably to build a lookup table of byte values versus the number of bits set in that byte. At least that was the answer when I interviewed at Google.
Use a table that maps each byte value (256) to the number of 1's in it. (The # of 0's is just (8 - # of 1's)). Then iterate over the bytes and perform a single lookup for each byte, instead of multiple lookups and comparisons. For example:
int onesCount = 0;
for (i = 0; i < data->Count; i++)
{
    byte = data->Data[i];        // the byte's value, not its address
    onesCount += NumOnes[byte];
}
Stats.FreqOf1 += onesCount;
Stats.FreqOf0 += (data->Count * 8) - onesCount;
I did not really understand what you're trying to do. But if you just want to get access to the bits of a bitmap, you can use these (untested!!!) functions:
#include <stddef.h>

_Bool isbitset(unsigned char * bitmap, size_t idx)
{
    return bitmap[idx / 8] & (1 << (idx % 8)) ? 1 : 0;
}

void setbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] |= (1 << (idx % 8));
}

void unsetbit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] &= ~(1 << (idx % 8));
}

void togglebit(unsigned char * bitmap, size_t idx)
{
    bitmap[idx / 8] ^= (1 << (idx % 8));
}
Edit: Ok, I think I understand what you want to do: Fast iteration over a sequence of bits. Therefore, we don't want to use the random access functions from above, but read a whole word of data at once.
You might use any unsigned integer type you like, but you should choose one which is likely to correspond to the word size of your architecture. I'll go with uint_fast32_t from stdint.h:
uint_fast32_t * data = __data_source__;
for (; __condition__; ++data)
{
    uint_fast32_t mask = 1;
    uint_fast32_t current = *data;
    for (; mask; mask <<= 1)
    {
        if (current & mask)
        {
            // bit is set
        }
        else
        {
            // bit is not set
        }
    }
}
From the inner loop, you can set the bit with
*data |= mask;
unset the bit with
*data &= ~mask;
and toggle the bit with
*data ^= mask;
Warning: The code might behave unexpectedly on big-endian architectures!
You could use a precomputed lookup table, i.e:
static int bitcount_lookup[256] = { ..... }; /* or make it a global and compute the values in code */
...
for ( ... )
{
    byte = ...
    Stats.FreqOf1 += bitcount_lookup[byte];
}
Here is a method how to count the 1 bits of a 32bit integer (based on Java's Integer.bitCount(i) method):
unsigned bitCount(unsigned i) {
    i = i - ((i >> 1) & 0x55555555);
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333);
    i = (i + (i >> 4)) & 0x0f0f0f0f;
    i = i + (i >> 8);
    i = i + (i >> 16);
    return i & 0x3f;
}
So you can cast your data to int and move forward in 4 byte steps.
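A usage sketch of that idea (count_ones is a made-up driver; it assumes a 32-bit unsigned and uses memcpy rather than a pointer cast to avoid misaligned access):
#include <stddef.h>
#include <string.h>

unsigned bitCount(unsigned i);   /* the function above */

/* Count the set bits in buf[0..len-1] four bytes at a time,
   folding any leftover tail bytes in one by one. */
unsigned long count_ones(const unsigned char *buf, size_t len)
{
    unsigned long total = 0;
    size_t i = 0;

    for (; i + 4 <= len; i += 4) {
        unsigned word;
        memcpy(&word, buf + i, 4);
        total += bitCount(word);
    }
    for (; i < len; i++)   /* remaining 0-3 bytes */
        total += bitCount(buf[i]);
    return total;
}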
Here is a simple one I whipped up on just a single 32 bit value, but you can see it wouldn't be hard to adapt it to any number of bits....
int ones = 0;
int x = 0xdeadbeef;
int original = x;   // the loop below destroys x, so keep a copy for printing

for (int y = 0; y < 32; y++)
{
    if ((x & 0x1) == 0x1) ones++;
    x = (x >> 1);
}

printf("%x contains %d ones and %d zeros.\n", original, ones, 32 - ones);
Notice, however, that it modifies the value in the process, which is why a copy is taken first. If you are doing this on data you need to keep, you always need to make a copy of it first.
Doing this in __asm would probably be a better, maybe faster way, but it's hard to say with how well the compiler can optimize...
With each solution you consider, each one will have drawbacks. A lookup table or a bit shifter (like mine), both have drawbacks.
Larry
ttobiass - Keep in mind that inline functions are important in applications like the one you are talking about. You CAN get the performance out of inline code; just remember a couple of things.
inline in debug mode does not exist. (Unless you force it)
the compiler will inline functions as it sees fit. Often, if you tell it to inline a function, it may not do it at all. Even if you use __forceinline. Check MSDN for more info on inlining.
Only certain functions can even be inlined. For example, you cannot inline a recursive function.
You'll get your best performance out of your project settings for the C/C++ language, and how you construct your code. At this point, it's important to understand Heap vs. Stack operations, calling conventions, memory alignment, etc.
I know this does not answer your question exactly, but you mention performance, and how to get the best performance, and these things are key.
To join the link wagon:
counting bits
If this is not a case of premature optimization and you truly need to squeeze out every last femtosecond, then you're probably better off with a 256-element static array that you populate once with the bit-count of each byte value, then
Stats.FreqOf1 += bitCountTable[byte]
and when the loop is done:
Stats.FreqOf0 = ((data->Count * 8) - Stats.FreqOf1)
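Populating that table once at startup could look like this (a sketch; names are illustrative):
static unsigned char bitCountTable[256];

/* Fill in the popcount of every possible byte value; call once
   before entering the main loop. */
static void init_bit_count_table(void)
{
    for (int v = 0; v < 256; v++) {
        unsigned char n = 0;
        for (int b = v; b != 0; b >>= 1)
            n += (unsigned char)(b & 1);
        bitCountTable[v] = n;
    }
}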
There's a whole chapter on the different techniques for this in the book Beautiful Code. You can read (most of) it on Google books starting here.
A faster way to extract bits is to use:
bitmask = data->Data[i];
while (bitmask)
{
    bit_set_as_power_of_two = bitmask & -bitmask;  // isolates the lowest set bit
    /* ...process the extracted bit here... */
    bitmask &= bitmask - 1;                        // clears the lowest set bit
}
If you just want to count the bits set, a per-byte lookup table held in cache would be fast, but you can also do it in constant time with the interleaved bit-counting method in the link in this answer.
