Writing specific bits to a binary header - C

Suppose I want to create a binary header of a specific length, using the first 4 bits to define the type (which allows for 16 different types) and the remaining 60 bits to define the content.
How would one go about constructing this in C? I'm having a hard time finding any examples that properly explain doing this in C (I haven't worked with C at this low a level before and I'm trying to get my feet wet...).
Would I just create a char[8] and manually set each bit with something like
/** Set bit in any sized bit block.
 *
 * @return none
 *
 * @param bit - Bit number.
 * @param bitmap - Pointer to bitmap.
 *
 * @note Please note that this function does not know the size of the
 * bitmap and it cannot range check the specified bit number.
 */
void SetBit(int bit, unsigned char *bitmap)
{
    int x = bit / 8; // Index to byte.
    int n = bit % 8; // Specific bit in byte.

    bitmap[x] |= (1 << n); // Set bit.
}
The above code is from the question "storing a bit in a bit of character array in C linux".

I would create a function specific to the task and just use a mask.
#include <stdint.h>

void setType(uint8_t type, uint8_t *header)
{
    /* Keep the low nibble, write the type into the high nibble. */
    header[0] = (header[0] & 0x0f) | (type << 4);
}

// To use:
uint8_t header[8] = {0}; // zero-initialize so the untouched bits are defined
setType(3, header);
I would create a similar function to set each field of the header.
The above assumes that by "first four bits" you mean the most significant bits of the first byte of the header rather than the least significant bits of the first byte of the header.
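Along the same lines, here is a sketch of a matching getter plus a setter for the 60-bit content field. This assumes the layout suggested above: the type in the high nibble of byte 0, and the content in the remaining 60 bits, most significant bits first. The function names are mine, not from the question:
#include <stdint.h>

uint8_t getType(const uint8_t *header)
{
    return header[0] >> 4; // high nibble of byte 0
}

void setContent(uint64_t content, uint8_t *header)
{
    content &= 0x0FFFFFFFFFFFFFFFULL;      // keep only the low 60 bits

    /* The low nibble of byte 0 holds the top 4 content bits. */
    header[0] = (header[0] & 0xF0) | (uint8_t)(content >> 56);

    /* Bytes 1..7 hold the remaining 56 bits, most significant first. */
    for (int i = 1; i < 8; i++)
        header[i] = (uint8_t)(content >> (8 * (7 - i)));
}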

You could have something like the following function to set a specific nibble (a nibble is 4 bits of data; 1 byte (8 bits) is 2 nibbles, which is probably what confused you). You just pass it your char[x] byte and specify whether you want to change the left-hand or right-hand part of the byte:
int set_nibble(unsigned char *dest, unsigned char src, nibble_side side)
{
    if (side == left_hand)
    {
        *dest = ((*dest & 0x0f) | ((src & 0x0f) << 4));
        return 0;
    }
    if (side == right_hand)
    {
        *dest = ((*dest & 0xf0) | (src & 0x0f));
        return 0;
    }
    return -1; // invalid side
}
where the nibble_side parameter is something like
typedef enum nibble_side_t
{
    right_hand, left_hand
} nibble_side;
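A quick usage example (assuming the definitions above):
unsigned char header[8] = {0};

/* Set the type (high nibble of the first byte) to 3. */
set_nibble(&header[0], 3, left_hand);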
Here and here are two decent guides for binary AND operations. You should be comfortable using them to mask out the data you need before you do operations like this.


What are bit vectors and how do I use them to convert chars to ints?

Here's the explanation for our task when implementing a set data structure in C: "The set is constructed as a Bit vector, which in turn is implemented as an array of the data type char."
My confusion arises from the fact that almost all the functions we're given take in a set and an int, as shown in the function below, yet our array is made up of chars. How would I call functions if they can only take in ints when I have an array of chars? Here's my attempt at calling the function in my main function, as well as the structs and an example of a function used.
int main() {
    set *setA = set_empty();
    set_insert("green", setA);
}

struct set {
    int capacity;
    int size;
    char *array;
};

void set_insert(const int value, set *s)
{
    if (!set_member_of(value, s)) {
        int bit_in_array = value; // To make the code easier to read

        // Increase the capacity if necessary
        if (bit_in_array >= s->capacity) {
            int no_of_bytes = bit_in_array / 8 + 1;
            s->array = realloc(s->array, no_of_bytes);
            for (int i = s->capacity / 8; i < no_of_bytes; i++) {
                s->array[i] = 0;
            }
            s->capacity = no_of_bytes * 8;
        }

        // Set the bit
        int byte_no = bit_in_array / 8;
        int bit = 7 - bit_in_array % 8;
        s->array[byte_no] = s->array[byte_no] | 1 << bit;
        s->size++;
    }
}
TL;DR: The types of the index (value in your case) and the indexed element of an array (array in your case) are independent from each other. There is no conversion.
Most digital systems these days store their value in bits, each of them can hold only 0 or 1.
An integer value can therefore be viewed as a binary number, a value to the base of 2. It is a sequence of bits, each of which is assigned a power of two. See the Wikipedia page on two's complement for details. But this aspect is not relevant for your issue.
Relevant is the view that an integer value is a sequence of bits. The simplest integer type of C is the char. It holds commonly 8 bits. We can assign indexes to these bits, and therefore think of them as a "vector", mathematically. Some people start to count "from the left", others start to count "from the right". Other common terms in this area are "MSB" and "LSB", see this Wikipedia page for more.
To access an element of a vector, you use its index. For a char this is commonly a value between 0 and 7, inclusive. Remember, in CS we start counting from zero. The type of this index can be any integer wide enough to hold the value, for example an int. This is why you use an int in your case. This data type is independent from the type of the elements in the vector.
How to solve the problem, if you need more than 8 bits? Well, then you can use more chars. This is the reason why your structure holds (a pointer to) an array of chars. All n chars of the array represent a vector of n * 8 bits, and you call this amount the "capacity".
Another option is to use a wider type, like a long or even a long long. And you can build an array of elements of these types, too. However, the widths of such types are commonly not equal in all systems.
BTW, the mathematical "vector" is the same thing as an "array" in CS. Different science areas, different terms.
Now, what is a "set"? I hope your script explains that a bit better than I can... It is a collection that contains an element only once or not at all. All elements are distinct. In your case the elements are represented by (small) integers.
Given a vector of bits of arbitrary capacity, we can "map" an element of a set on a bit of this vector by its index. This is done by storing a 1 in the mapped bit, if the element is present in the set, or 0, if it is not.
To access the correct bit, we need the index of the single char in the array, and the index of the bit in this char. You calculate these values in the lines:
int byte_no = bit_in_array / 8;
int bit = 7 - bit_in_array % 8;
All variables are of type int, most probably because this is the common type. It can be any other integer type, like a size_t for example, as long as it can hold the necessary values, even different types for the different variables.
With these two values at hand, you can "insert" the element into the set. For this action, you set the respective bit to 1:
s->array[byte_no] = s->array[byte_no] | 1 << bit;
Please note that the shift operator << has a higher precedence than the bit-wise OR operator |. Some coding style rules request to use parentheses to make this clear, but you can also use this even clearer assignment:
s->array[byte_no] |= 1 << bit;
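To make the relationship concrete, here is a sketch of what a matching set_member_of could look like. This is an assumption (your course code may differ), but the indexing must mirror set_insert:
#include <stdbool.h>

bool set_member_of(const int value, const set *s)
{
    if (value >= s->capacity)
        return false;                      // bit was never allocated
    int byte_no = value / 8;               // which char holds the bit
    int bit = 7 - value % 8;               // which bit inside that char
    return (s->array[byte_no] >> bit) & 1; // 1 if present, 0 if not
}
Note how the int index value selects a bit, while the chars in array are just the storage it lands in.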

Bitwise operation in C language (0x80, 0xFF, << )

I have a problem understanding this code. What I know is that we have passed code through an assembler that has converted it into "byte code". Now I have a virtual machine that is supposed to read this code. This function is supposed to read the first byte code instruction. I don't understand what is happening in this code. I guess we are trying to read this byte code but I don't understand how it is done.
static int32_t bytecode_to_int32(const uint8_t *bytecode, size_t size)
{
    int32_t result;
    t_bool  sign;
    int     i;

    result = 0;
    sign = (t_bool)(bytecode[0] & 0x80);
    i = 0;
    while (size)
    {
        if (sign)
            result += ((bytecode[size - 1] ^ 0xFF) << (i++ * 8));
        else
            result += bytecode[size - 1] << (i++ * 8);
        size--;
    }
    if (sign)
        result = ~(result);
    return (result);
}
This code is somewhat badly written, with lots of operations on a single line, and it contains various potential bugs. It looks brittle.
bytecode[0] & 0x80 simply reads the MSB sign bit, assuming it's 2's complement or similar, then converts it to a boolean.
The loop iterates backwards from most significant byte to least significant.
If the sign was negative, the code will perform an XOR of the data byte with 0xFF. Basically inverting all bits in the data. The result of the XOR is an int.
The data byte (or the result of the above XOR) is then bit shifted i * 8 bits to the left. The data is always implicitly promoted to int, so in case i * 8 happens to give a result larger than INT_MAX, there's a fat undefined behavior bug here. It would be much safer practice to cast to uint32_t before the shift, carry out the shift, then convert to a signed type afterwards.
The resulting int is converted to int32_t - these could be the same type or different types depending on system.
i is incremented by 1, size is decremented by 1.
If sign was negative, the int32_t is inverted to some 2's complement negative number that's sign extended and all the data bits are inverted once more. Except all zeros that got shifted in with the left shift are also replaced by ones. If this is intentional or not, I cannot tell. So for example if you started with something like 0x0081 you now have something like 0xFFFF01FF. How that format makes sense, I have no idea.
My take is that the bytecode[size - 1] ^ 0xFF (which is equivalent to ~) was made to toggle the data bits, so that they would later toggle back to their original values when ~ is called later. A programmer has to document such tricks with comments, if they are anything close to competent.
Anyway, don't use this code. If the intention was merely to swap the byte order (endianness) of a 4-byte integer, then this code must be rewritten from scratch.
That's properly done as:
static int32_t big32_to_little32 (const uint8_t* bytes)
{
    uint32_t result = (uint32_t)bytes[0] << 24 |
                      (uint32_t)bytes[1] << 16 |
                      (uint32_t)bytes[2] <<  8 |
                      (uint32_t)bytes[3] <<  0 ;

    return (int32_t)result;
}
Anything more complicated than the above is highly questionable code. We need not worry about signs being a special case, the above code preserves the original signedness format.
So A^0xFF toggles the bits set in A: if you have 10101100 XORed with 11111111, it becomes 01010011. I am not sure why they didn't use ~ here. The ^ is the XOR operator, so you are XORing with 0xFF.
The << is a bitshift "up" or left. In other words, A<<1 is equivalent to multiplying A by 2.
The >> moves the bits the other way (a right shift), which is equivalent to dividing by 2.
The ~ inverts the bits in a byte.
Note that it's better to initialise variables at declaration; it costs no additional processing whatsoever to do it that way.
sign = (t_bool)(bytecode[0] & 0x80); the sign of the number is stored in the 8th bit (position 7 counting from 0), which is where the 0x80 comes from. So it's literally checking whether the sign bit is set in the first byte of bytecode, and if so it stores that in the sign variable.
Essentially, if the value is unsigned, the loop copies the bytes from bytecode into result one byte at a time.
If the data is signed then it flips the bits then copies the bytes, then when it's done copying, it flips the bits back.
Personally with this kind of thing I prefer to take the data, put it in htonl() format (network byte order), and then memcpy it to an allocated array, storing it in an endian-agnostic way; then when I retrieve the data I use ntohl() to convert it back to the format used by the computer. htonl() and ntohl() (and their 16-bit counterparts htons()/ntohs()) are standard functions used in networking and platform-agnostic data formatting / storage / communication all the time.
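A minimal sketch of that store/retrieve round trip, assuming a 32-bit value (htonl()/ntohl() are declared in <arpa/inet.h> on POSIX systems):
#include <arpa/inet.h> // htonl, ntohl
#include <stdint.h>
#include <string.h>    // memcpy

void store_u32(uint8_t out[4], uint32_t value)
{
    uint32_t be = htonl(value);  // host order -> network (big-endian) order
    memcpy(out, &be, sizeof be); // byte-wise copy is endian-agnostic
}

uint32_t load_u32(const uint8_t in[4])
{
    uint32_t be;
    memcpy(&be, in, sizeof be);
    return ntohl(be);            // network order -> host order
}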
This function is a very naive version of a function that converts from big endian to little endian.
The parameter size is not needed, as it works only with 4 bytes of data.
It can be achieved much more easily by union punning (which also allows compilers to optimize it, in this case down to a single instruction):
#include <stdio.h>
#include <stdint.h>

#define SWAP(a,b,t) do { t c = (a); (a) = (b); (b) = c; } while(0)

int32_t my_bytecode_to_int32(const uint8_t *bytecode)
{
    union
    {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.b8[3] = *bytecode++;
    i32.b8[2] = *bytecode++;
    i32.b8[1] = *bytecode++;
    i32.b8[0] = *bytecode++;
    return i32.i32;
}

int main()
{
    union {
        int32_t i32;
        uint8_t b8[4];
    } i32;

    i32.i32 = -4567;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", bytecode_to_int32(i32.b8, 4));

    i32.i32 = -34;
    SWAP(i32.b8[0], i32.b8[3], uint8_t);
    SWAP(i32.b8[1], i32.b8[2], uint8_t);
    printf("%d\n", my_bytecode_to_int32(i32.b8));
}
https://godbolt.org/z/rb6Na5
If the purpose of the code is to sign-extend a 1-, 2-, 3-, or 4-byte sequence in network/big-endian byte order to a signed 32-bit int value, it's doing things the hard way and reimplementing the wheel along the way.
This can be broken down into a two-step process: convert the proper number of big-endian bytes to a 32-bit integer value (assembling the bytes with shifts also converts them to the host's byte order as a side effect), then sign-extend that 32-bit value.
The "wheel" being reimplemented in this case is the byte-order conversion performed by the POSIX-standard ntohl() function, which converts a 32-bit unsigned integer value in big-endian/network byte order to the local host's native byte order.
The first step I'd do is to convert 1, 2, 3, or 4 bytes into a uint32_t:
#include <stdint.h>
#include <limits.h>
// convert the `size` number of big-endian bytes starting at the
// `bytecode` address to a uint32_t value
static uint32_t bytecode_to_uint32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = 0;

    switch ( size )
    {
        case 4:
            result |= ( uint32_t ) bytecode[ size - 4 ] << 24; // fall through
        case 3:
            result |= ( uint32_t ) bytecode[ size - 3 ] << 16; // fall through
        case 2:
            result |= ( uint32_t ) bytecode[ size - 2 ] <<  8; // fall through
        case 1:
            result |= ( uint32_t ) bytecode[ size - 1 ];
            break;
        default:
            // error handling here
            break;
    }

    return ( result );
}
Then, sign-extend it (borrowing from this answer):
static uint32_t sign_extend_uint32( uint32_t in, size_t size )
{
    if ( size == 4 )
    {
        return ( in );
    }

    // being pedantic here - the existence of `[u]int32_t` pretty
    // much ensures 8 bits/byte
    size_t bits = size * CHAR_BIT;

    uint32_t m = 1U << ( bits - 1 );
    uint32_t result = ( in ^ m ) - m;
    return ( result );
}
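For example, sign-extending the 2-byte value 0xFF85 uses m = 0x8000, giving (0xFF85 ^ 0x8000) - 0x8000 = 0x7F85 - 0x8000 = 0xFFFFFF85, which is -123 when interpreted as an int32_t.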
Put it all together:
static int32_t bytecode_to_int32( const uint8_t *bytecode, size_t size )
{
    uint32_t result = bytecode_to_uint32( bytecode, size );

    result = sign_extend_uint32( result, size );

    // no ntohl() call is needed here - the shifts in
    // bytecode_to_uint32() already assembled the value
    // in the host's byte order

    // converting uint32_t here to signed int32_t
    // can be subject to implementation-defined
    // behavior
    return ( result );
}
Note that the conversion from uint32_t to int32_t implicitly performed by the return statement in the above code can result in implementation-defined behavior, as there can be uint32_t values that cannot be mapped to int32_t values. See this answer.
Any decent compiler should inline and optimize those functions well.
I personally think this also needs much better error handling/input validation.

Pointer type of smaller size than variable

Working with embedded systems, in order to have more resolution in an incremental sequence, I have two variables, one always following the other.
Specifically, I set a goal value using an 8-bit variable, but to go from one point (the current value) to another I do it using 32-bit steps.
For example (this is a stupid example, but it's just to show how I want to use it; in my code there are some timings which require the 32-bit variables to allow a slow change):
/* The variables */
char goal8bits; // 8 bits
long int current32bits; // 32 bits
char current8bits; // 8 bits
long int step32bits; // 32 bits
/* The main function (in the real code that is done periodically with a specific period) */
current32bits = CONVERT_8BITS_TO_32BITS(current8bits); // E.g: 0xAB -> 0xABABABAB
if (goal8bits < current8bits) {
current32bits += step32bits;
}
current8bits = CONVERT_32BITS_TO_8BITS(current32bits); // E.g: 0x01234567 -> 0x01
/* Other parts of the code */
I use current8bits to know the current value in the middle of a transition.
My question is whether I can use a char pointer and make it point to the 32-bit variable, so I do not need to update it each time I change it.
The previous example will look like this:
/* The variables */
char goal8bits; // 8 bits
long int current32bits; // 32 bits
char *current8bits = (char *)&current32bits; // Pointer to 8 bits
long int step32bits; // 32 bits
/* The main function (in the real code that is done periodically with a specific period) */
if (goal8bits < *current8bits) {
current32bits += step32bits;
}
/* Other parts of the code */
I will use *current8bits to know the current value in the middle of a transition.
Do you see any problem in doing that? Can it lead to a problem with endianness?
Thank you!
If you know the endianness of your system, and it is fixed, you just have to select between
char *current8bits = (char *)&current32bits; /* big-endian: MSB stored first */
or
char *current8bits = ((char *)&current32bits) + 3; /* little-endian: MSB stored last */
If you have to test for it, and your system cannot give you that info, you can derive it at application startup:
char *current8bits;
uint32_t temp = 0x01020304;
uint8_t *temp2 = (uint8_t *)&temp;

if (*temp2 == 0x01) /* big-endian: most significant byte stored first */
{
    current8bits = (char *)&current32bits;
}
else                /* little-endian: most significant byte stored last */
{
    current8bits = ((char *)&current32bits) + 3;
}
Another good solution is the top-voted and checked-as-answered answer HERE.
Yes, it is endian-dependent code. To make it portable you can use a mask and a shift:
uint8_t goal8bits = 0x01;             // 8 bits
uint32_t current32bits = 0x01234567;  // 32 bits
uint32_t step32bits = 1;              // 32 bits

if (goal8bits < ((current32bits & 0xFF000000) >> 24)) {
    current32bits += step32bits;
}
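In the same portable spirit, the question's conversion macros could be written roughly like this (a sketch; the macro names come from the question, the implementations are assumptions matching its examples):
#include <stdint.h>

/* 0xAB -> 0xABABABAB: replicate the byte into all four byte positions */
#define CONVERT_8BITS_TO_32BITS(b)  ((uint32_t)(uint8_t)(b) * 0x01010101u)

/* 0x01234567 -> 0x01: take the most significant byte */
#define CONVERT_32BITS_TO_8BITS(w)  ((uint8_t)((uint32_t)(w) >> 24))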

How to define and work with an array of bits in C?

I want to create a very large array on which I write '0's and '1's. I'm trying to simulate a physical process called random sequential adsorption, where units of length 2, dimers, are deposited onto an n-dimensional lattice at a random location, without overlapping each other. The process stops when there is no more room left on the lattice for depositing more dimers (lattice is jammed).
Initially I start with a lattice of zeroes, and the dimers are represented by a pair of '1's. As each dimer is deposited, the site on the left of the dimer is blocked, due to the fact that the dimers cannot overlap. So I simulate this process by depositing a triple of '1's on the lattice. I need to repeat the entire simulation a large number of times and then work out the average coverage %.
I've already done this using an array of chars for 1D and 2D lattices. At the moment I'm trying to make the code as efficient as possible, before working on the 3D problem and more complicated generalisations.
This is basically what the code looks like in 1D, simplified:
int main()
{
    /* Define lattice */
    array = (char*)malloc(N * sizeof(char));

    total_c = 0;

    /* Carry out RSA multiple times */
    for (i = 0; i < 1000; i++)
        rand_seq_ads();

    /* Calculate average coverage efficiency at jamming */
    printf("coverage efficiency = %lf", total_c / 1000);

    return 0;
}
void rand_seq_ads()
{
    /* Initialise array, initial conditions */
    memset(array, 0, N * sizeof(char));
    available_sites = N;
    count = 0;

    /* While the lattice still has enough room... */
    while (available_sites != 0)
    {
        /* Generate random site location */
        x = rand();

        /* Deposit dimer (if site is available) */
        if (array[x] == 0)
        {
            array[x] = 1;
            array[x+1] = 1;
            count += 1;
            available_sites += -2;
        }

        /* Mark site left of dimer as unavailable (if it's empty) */
        if (array[x-1] == 0)
        {
            array[x-1] = 1;
            available_sites += -1;
        }
    }

    /* Calculate coverage %, and add to total */
    c = (double)count / N;
    total_c += c;
}
For the actual project I'm doing, it involves not just dimers but trimers, quadrimers, and all sorts of shapes and sizes (for 2D and 3D).
I was hoping that I would be able to work with individual bits instead of bytes, but I've been reading around and as far as I can tell you can only change 1 byte at a time, so either I need to do some complicated indexing or there is a simpler way to do it?
Thanks for your answers
If I am not too late, this page gives an awesome explanation with examples.
An array of int can be used to deal with an array of bits. Assuming the size of an int to be 4 bytes, when we talk about an int we are dealing with 32 bits. Say we have int A[10]; that means we are working with 10*4*8 = 320 bits (each element of the array holds 4 bytes, and each byte holds 8 bits).
So, to set the kth bit in array A:
// NOTE: if using "uint8_t A[]" instead of "int A[]" then divide by 8, not 32
void SetBit( int A[], int k )
{
    int i = k/32;          // gives the corresponding index in the array A
    int pos = k%32;        // gives the corresponding bit position in A[i]

    unsigned int flag = 1; // flag = 0000.....00001

    flag = flag << pos;    // flag = 0000...010...000 (shifted k positions)

    A[i] = A[i] | flag;    // Set the bit at the k-th position in A[i]
}
or in the shortened version
void SetBit( int A[], int k )
{
    A[k/32] |= 1 << (k%32); // Set the bit at the k-th position in A[i]
}
Similarly, to clear the kth bit:
void ClearBit( int A[], int k )
{
    A[k/32] &= ~(1 << (k%32));
}
and to test the kth bit:
int TestBit( int A[], int k )
{
    return ( (A[k/32] & (1 << (k%32))) != 0 );
}
As said above, these manipulations can be written as macros too:
// Due order of operation wrap 'k' in parentheses in case it
// is passed as an equation, e.g. i + 1, otherwise the first
// part evaluates to "A[i + (1/32)]" not "A[(i + 1)/32]"
#define SetBit(A,k) ( A[(k)/32] |= (1 << ((k)%32)) )
#define ClearBit(A,k) ( A[(k)/32] &= ~(1 << ((k)%32)) )
#define TestBit(A,k) ( A[(k)/32] & (1 << ((k)%32)) )
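For example, to exercise these on a zeroed array:
int A[10] = {0};     // 320 bits, all clear
SetBit(A, 100);
if (TestBit(A, 100)) // true: bit 100 was just set
    ClearBit(A, 100);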
typedef unsigned long bfield_t[ size_needed/(8*sizeof(long)) ];
// long because that's probably what your cpu is best at
// The size_needed should be evenly divisible by 8*sizeof(long), or
// you could use (8*sizeof(long)-1+size_needed)/(8*sizeof(long)) to force it to round up
Now, each long in a bfield_t can hold sizeof(long)*8 bits.
You can calculate the index of the long you need by:
bindex = index / (8 * sizeof(long) );
and your bit number by
b = index % (8 * sizeof(long) );
You can then look up the long you need and then mask out the bit you need from it.
result = my_field[bindex] & (1<<b);
or
result = 1 & (my_field[bindex]>>b); // if you prefer them to be in bit0
The first one may be faster on some CPUs, or may save you shifting back up if you need
to perform operations between the same bit in multiple bit arrays. It also mirrors
the setting and clearing of a bit in the field more closely than the second implementation.
set:
my_field[bindex] |= 1<<b;
clear:
my_field[bindex] &= ~(1<<b);
You should remember that you can use bitwise operations on the longs that hold the fields
and that's the same as the operations on the individual bits.
You'll probably also want to look into the ffs, fls, ffc, and flc functions if available. ffs should always be available in strings.h. It's there just for this purpose -- a string of bits.
Anyway, it is find first set and essentially:
int ffs(int x) {
    int c = 0;
    while (!(x&1)) {
        c++;
        x >>= 1;
    }
    return c; // except that it handles x = 0 differently
}
This is a common operation for processors to have an instruction for and your compiler will probably generate that instruction rather than calling a function like the one I wrote. x86 has an instruction for this, by the way. Oh, and ffsl and ffsll are the same function except take long and long long, respectively.
You can use | (bitwise or) and << (left shift).
For example, (1 << 3) results in "00001000" in binary. So your code could look like:
char eightBits = 0;

//Set the 5th and 6th bits from the right to 1
eightBits |= (1 << 4);
eightBits |= (1 << 5);

//eightBits now looks like "00110000".
Then just scale it up with an array of chars and figure out the appropriate byte to modify first.
For more efficiency, you could define a list of bitfields in advance and put them in an array:
#define BIT8 0x01
#define BIT7 0x02
#define BIT6 0x04
#define BIT5 0x08
#define BIT4 0x10
#define BIT3 0x20
#define BIT2 0x40
#define BIT1 0x80
char bits[8] = {BIT1, BIT2, BIT3, BIT4, BIT5, BIT6, BIT7, BIT8};
Then you avoid the overhead of the bit shifting and you can index your bits, turning the previous code into:
eightBits |= (bits[3] | bits[4]);
Alternatively, if you can use C++, you could just use an std::vector<bool> which is internally defined as a vector of bits, complete with direct indexing.
bitarray.h:
#include <inttypes.h> // defines uint32_t

//typedef unsigned int bitarray_t; // if you know that int is 32 bits
typedef uint32_t bitarray_t;

#define RESERVE_BITS(n) (((n)+0x1f)>>5)
#define DW_INDEX(x) ((x)>>5)
#define BIT_INDEX(x) ((x)&0x1f)
#define getbit(array,index) (((array)[DW_INDEX(index)]>>BIT_INDEX(index))&1)
#define putbit(array, index, bit) \
    ((bit)&1 ? ((array)[DW_INDEX(index)] |= 1<<BIT_INDEX(index)) \
             : ((array)[DW_INDEX(index)] &= ~(1<<BIT_INDEX(index))) \
     , 0 \
    )
Use:
bitarray_t arr[RESERVE_BITS(130)] = {0, 0x12345678,0xabcdef0,0xffff0000,0};
int i = getbit(arr,5);
putbit(arr,6,1);
int x=2; // the least significant bit is 0
putbit(arr,6,x); // sets bit 6 to 0 because 2&1 is 0
putbit(arr,6,!!x); // sets bit 6 to 1 because !!2 is 1
EDIT: the docs:
"dword" = "double word" = 32-bit value (unsigned, but that's not really important)
RESERVE_BITS: number_of_bits --> number_of_dwords
RESERVE_BITS(n) is the number of 32-bit integers enough to store n bits
DW_INDEX: bit_index_in_array --> dword_index_in_array
DW_INDEX(i) is the index of dword where the i-th bit is stored.
Both bit and dword indexes start from 0.
BIT_INDEX: bit_index_in_array --> bit_index_in_dword
If i is the number of some bit in the array, BIT_INDEX(i) is the number
of that bit in the dword where the bit is stored.
And the dword is known via DW_INDEX().
getbit: bit_array, bit_index_in_array --> bit_value
putbit: bit_array, bit_index_in_array, bit_value --> 0
getbit(array,i) fetches the dword containing the bit i and shifts the dword right, so that the bit i becomes the least significant bit. Then, a bitwise and with 1 clears all other bits.
putbit(array, i, v) first of all checks the least significant bit of v; if it is 0, we have to clear the bit, and if it is 1, we have to set it.
To set the bit, we do a bitwise or of the dword that contains the bit and the value of 1 shifted left by bit_index_in_dword: that bit is set, and other bits do not change.
To clear the bit, we do a bitwise and of the dword that contains the bit and the bitwise complement of 1 shifted left by bit_index_in_dword: that value has all bits set to one except the only zero bit in the position that we want to clear.
The macro ends with , 0 because otherwise it would return the value of dword where the bit i is stored, and that value is not meaningful. One could also use ((void)0).
It's a trade-off:
(1) use 1 byte for each 2-bit value - simple, fast, but uses 4x memory
(2) pack bits into bytes - more complex, some performance overhead, uses minimum memory
If you have enough memory available then go for (1), otherwise consider (2).
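For a sense of scale: a lattice of 10^8 sites takes about 100 MB stored one site per byte, but only about 12.5 MB packed at one bit per site.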

Can I allocate a specific number of bits in C?

I am trying to store a large amount of boolean information that is determined at run-time. I was wondering what the best method might be.
I have currently been trying to allocate the memory using:
pStatus = malloc((<number of data points>/8) + 1);
thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:
pStatus[element]
Unfortunately this does not seem to be working very well. First, I am having difficulty initializing the memory to the integer value 0. Can this be done using memset()? Still, I don't think that is impacting why I crash when trying to access pStatus[element].
I am also not entirely convinced that this approach is the best one to be using. What I really want is essentially a giant bitmask that reflects the status of the boolean values. Have I missed something?
pStatus = malloc((<number of data points>/8) + 1);
This does allocate enough bytes for your bits. However,
pStatus[element]
This accesses the element'th byte, not bit. So when element is more than one-eighth of the total number of bits, you're accessing off the end of the array allocated.
I would define a few helper functions:
int get_bit(int element)
{
    unsigned byte_index = element / 8;
    unsigned bit_index = element % 8;
    unsigned bit_mask = (1 << bit_index);
    return ((pStatus[byte_index] & bit_mask) != 0);
}

void set_bit(int element)
{
    unsigned byte_index = element / 8;
    unsigned bit_index = element % 8;
    unsigned bit_mask = (1 << bit_index);
    pStatus[byte_index] |= bit_mask;
}

void clear_bit(int element)
{
    unsigned byte_index = element / 8;
    unsigned bit_index = element % 8;
    unsigned bit_mask = (1 << bit_index);
    pStatus[byte_index] &= ~bit_mask;
}
(Error checking on the range of element is left out for clarity. You could make these macros, too.)
...thinking that this will give me enough bits to work with. I could then reference each boolean value using the pointer in array notation:
pStatus[element]
element is addressing bytes, not bits. You want something like:
pStatus[element/8] & (1 << (element % 8))
Small point: to get enough memory to store N bits, (N/8) + 1 bytes is imprecise (can be one too many).
(N+7)/8 is always the minimum number, though.
Well, the simplest answer would be to use calloc instead of malloc.
It is defined to initialize the memory it allocates to zero, and can often do it by using page mapping tricks.
That will take care of your memory initialization problem. The other dozen posts here seem to adequately address the indexing problem and the fact that you occasionally allocate an extra byte (oh the horror!), so I won't repeat their content here.
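A minimal sketch, assuming num_points holds the number of data points:
#include <stdlib.h>

/* calloc zero-initializes the allocation, so no memset is needed;
   (num_points + 7) / 8 rounds up to whole bytes without overshooting. */
unsigned char *pStatus = calloc((num_points + 7) / 8, 1);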
pStatus[element] will give you an entire byte at that address.
To set a particular element you would do something like:
pStatus[element >> 3] |= 1 << (element & 7);
To reset an element:
pStatus[element >> 3] &= ~(1 << (element & 7));
and to test an element:
if ((pStatus[element >> 3] & (1 << (element & 7))) != 0)
the initial allocation should be
pStatus = malloc((<number of data points> + 7) / 8);
what you had will work but occasionally wastes a byte
I can't help but notice that all replies in C here seem to assume that a byte is 8 bits. This is not necessarily true in C (although it will of course be true on most mainstream hardware), so making this assumption in code is rather bad form.
The proper way to write architecture-neutral code is to
#include <limits.h>
and then use the CHAR_BIT macro wherever you need "the number of bits in a char".
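For example, the earlier byte/bit indexing done portably (a sketch, with element being the bit number as above):
#include <limits.h>

unsigned byte_index = element / CHAR_BIT;
unsigned bit_mask   = 1u << (element % CHAR_BIT);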
Make yourself happier and define a type and functions to operate on that type. That way, if you discover that bit accesses are too slow, you can change the unit of memory per boolean to a byte/word/long, or adopt sparse/dynamic data structures if memory is really an issue (i.e., if your sets are mostly zeros, you could just keep a list with the coordinates of the 1's).
You can write your code to be completely immune to changes in the implementation of your bit vector.
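A sketch of that kind of interface (the names are hypothetical; the point is that the representation stays hidden in one .c file so it can change later):
#include <stdbool.h>
#include <stddef.h>

typedef struct bitvec bitvec; // opaque - callers never see the layout

bitvec *bitvec_create(size_t nbits);
void    bitvec_set(bitvec *v, size_t i, bool value);
bool    bitvec_get(const bitvec *v, size_t i);
void    bitvec_destroy(bitvec *v);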
pStatus[element] does not address the bit. The exact byte it gets depends on the type of pStatus -- I assume char* or equivalent -- so pStatus[element] gets you the element'th byte.
You could memset to set to 0, yes.
pStatus = malloc((<number of data points>/8) + 1);
That part's fine.
pStatus[element]
here's where you have trouble. You are addressing bytes when you want to address bits.
pStatus[element / 8 ]
will get you the right byte in the array.
You need to allocate c = malloc((N+7)/8) bytes, and you can set the nth with
c[n/8]=((c[n/8] & ~(0x80 >> (n%8))) | (0x80>>(n%8)));
clear with
c[n/8] &= ~(0x80 >> (n%8));
and test with
if(c[n/8] & (0x80 >> (n%8))) blah();
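Note that this convention numbers bits from the most significant end of each byte (bit 0 is the byte's MSB), unlike the LSB-first convention (1 << (n%8)) used in most of the other answers here; either works as long as you are consistent.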
If you don't mind having to write wrappers, you could also use std::bitset or std::vector<bool> from C++'s standard library; it seems like they (especially the latter) have exactly what you need, already coded, tested and packaged (and plenty of bells and whistles).
It's a real shame we lack a straightforward way to use C++ code in C applications (no, creating a wrapper isn't straightforward to me, nor fun, and it means more work in the long term).
What would be wrong with std::vector<bool>?
It amazes me that only one answer here mentions CHAR_BIT. A byte is often 8 bits, but not always.
Your allocation code is correct; see the set_bit() and get_bit() functions given in this answer to access the booleans.
If you are limited to just a few bits then, instead of eaanon01's solution, you can also use the C built-in facility of bit-fields (there are very few occasions where you could use them, but this would be one).
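A minimal sketch of such a bit-field (note the layout is implementation-defined, so this suits in-memory flags rather than wire formats):
struct flags {
    unsigned a : 1; // each member occupies a single bit
    unsigned b : 1;
    unsigned c : 1;
};

void example(void)
{
    struct flags f = {0};
    f.b = 1;        // set
    f.b = 0;        // clear
    if (f.a) { }    // test
}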
For this bit-banging stuff I can recommend:
Henry Warren's "Hacker's Delight"
The boolean is "never" a separate value in C, so a struct might be in order to get you going.
It is true that you do not initialize the memory area, so you need to do that yourself.
Here is a simple example of how you could do it with unions, structs and enums:
#include <stdio.h>
#include <stdlib.h>

typedef unsigned char BYTE;
typedef unsigned short WORD;
typedef unsigned long int DWORD;
typedef unsigned long long int DDWORD;

enum STATUS
{
    status0 = 0x01,
    status1 = 0x02,
    status2 = 0x04,
    status3 = 0x08,
    status4 = 0x10,
    status5 = 0x20,
    status6 = 0x40,
    status7 = 0x80,
    status_group = status0 + status1 + status4
};

#define GET_STATUS( S ) ( ((status.DDBuf&(DDWORD)S)==(DDWORD)S) ? 1 : 0 )
#define SET_STATUS( S ) ( (status.DDBuf|= (DDWORD)S) )
#define CLR_STATUS( S ) ( (status.DDBuf&= ~(DDWORD)S) )

static union {
    BYTE BBuf[8];
    WORD WWBuf[4];
    DWORD DWBuf[2];
    DDWORD DDBuf;
} status;

int main(void)
{
    // Reset status bits
    status.BBuf[0] = 0;

    printf( "%d \n", GET_STATUS( status0 ) );
    SET_STATUS( status0 );
    printf( "%d \n", GET_STATUS( status0 ) );
    CLR_STATUS( status0 );
    printf( "%d \n", GET_STATUS( status0 ) );

    SET_STATUS( status_group );
    printf( "%d \n", GET_STATUS( status0 ) );

    system( "pause" );
    return 0;
}
Hope this helps. This example can handle up to 64 status booleans and can easily be extended.
This example is based on char = 8 bits, int = 16 bits, long int = 32 bits and long long int = 64 bits.
I have now also added support for status groups.
