Creating bitflag variables with large amounts of flags or how to create large bit-width numbers - c

Let's say I have an enum with more bitflag options than there are bits in a standard data type:
enum flag_t {
FLAG_1 = 0x1,
FLAG_2 = 0x2,
...
FLAG_130 = 0x400000000000000000000000000000000,
};
This is impossible for several reasons. Enums have a maximum size of 128 bits (in C with gcc on my system, from experimentation), single variables also have a maximum size of 128 bits, and so on.
In C you can't perform bitwise operations on arrays, though in C++ I suppose you could overload bitwise operators to do the job with a loop.
Is there any way in C, other than manually remembering which flags go where, to make this work for large numbers of flags?

This is exactly what bit-fields are for.
In C, it's possible to define the following data layout :
struct flag_t
{
unsigned int flag1 : 1;
unsigned int flag2 : 1;
unsigned int flag3 : 1;
(...)
unsigned int flag130 : 1;
(...)
unsigned int flag1204 : 1; // for fun
};
In this example, each flag occupies just one bit. An obvious advantage is that the number of flags is effectively unlimited. Another great advantage is that you are no longer limited to single-bit flags: you could have some multi-value fields merged in the middle.
But most importantly, testing and assignment would be a bit different, and probably simpler, as far as individual operations are concerned: you no longer need to do any masking, you just access the flag directly by naming it. And by the way, use the opportunity to give these flags more descriptive names :)
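As a rough sketch of what this looks like in use, the following declares a small bit-field struct and tests flags by name. The struct name, member names and widths are hypothetical, chosen only for illustration:

```c
#include <assert.h>

/* Hypothetical flag set: the member names and widths are illustrative. */
struct item_flags {
    unsigned int visible  : 1;  /* single-bit flag */
    unsigned int locked   : 1;  /* single-bit flag */
    unsigned int priority : 3;  /* multi-value field, holds 0..7 */
    unsigned int dirty    : 1;  /* single-bit flag */
};

/* Returns 1 when the item is visible and not locked, 0 otherwise. */
int item_is_editable(const struct item_flags *f)
{
    return f->visible && !f->locked;
}

/* Usage: set and test flags by name, with no masks or shifts. */
void item_flags_demo(void)
{
    struct item_flags f = { 0 };    /* every field starts at zero */

    f.visible = 1;
    f.priority = 5;
    assert(item_is_editable(&f));
    assert(f.priority == 5);

    f.locked = 1;
    assert(!item_is_editable(&f));
}
```

Note that bit-fields cannot be indexed or looped over, so whole-set operations (AND, OR between two sets) still have to be written member by member or done on the underlying bytes.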

Instead of trying to assign absurdly large numbers to an enum so you can have a hundreds-of-bits-wide bitfield, let the compiler assign a normal zero-based sequence of numbers to your flag names, and simulate a wide bitfield using an array of unsigned char. You can have a 1024-bit bitfield using unsigned char bits[128], and write get_flag() and set_flag() accessor functions to mask the minor amount of extra work involved.
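A minimal sketch of that idea follows. The enum member names, the flag_set wrapper type and the clear_flag() helper are assumptions added for illustration; only get_flag(), set_flag() and the unsigned char array come from the description above:

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */

/* Hypothetical flag names; the compiler numbers them 0, 1, 2, ... */
enum flag_id {
    FLAG_FIRST,
    FLAG_SECOND,
    FLAG_130 = 129,    /* flags can be numbered well past 64 or 128 */
    FLAG_COUNT = 1024  /* total number of bits available */
};

/* 1024 bits stored in 128 bytes when CHAR_BIT is 8. */
typedef struct {
    unsigned char bits[(FLAG_COUNT + CHAR_BIT - 1) / CHAR_BIT];
} flag_set;

/* Sets flag number id; id must be less than FLAG_COUNT. */
void set_flag(flag_set *s, unsigned id)
{
    s->bits[id / CHAR_BIT] |= (unsigned char)(1u << (id % CHAR_BIT));
}

/* Clears flag number id; id must be less than FLAG_COUNT. */
void clear_flag(flag_set *s, unsigned id)
{
    s->bits[id / CHAR_BIT] &= (unsigned char)~(1u << (id % CHAR_BIT));
}

/* Returns 1 if flag number id is set, 0 otherwise. */
int get_flag(const flag_set *s, unsigned id)
{
    return (s->bits[id / CHAR_BIT] >> (id % CHAR_BIT)) & 1u;
}

/* Usage: flags are addressed by their enum value, not by a huge constant. */
void flag_set_demo(void)
{
    flag_set s = { { 0 } };

    set_flag(&s, FLAG_130);
    assert(get_flag(&s, FLAG_130) == 1);
    assert(get_flag(&s, FLAG_SECOND) == 0);
    clear_flag(&s, FLAG_130);
    assert(get_flag(&s, FLAG_130) == 0);
}
```

The accessors do no bounds checking; a real version would probably reject ids that are not below FLAG_COUNT.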
However, a far better piece of advice would be to look at your design again, and ask yourself "Why do I need over a hundred different flags?". It seems to me that what you really need is a redesign.

In an answer to a related question about bitflags, Bit Manipulation and Flags, I provided an example that uses an unsigned char array as an approach for very large sets of bitflags; I am moving it to this posting.
This source example provides the following:
a set of Preprocessor defines for the bitflag values
a set of Preprocessor macros to manipulate bits
a couple of functions to implement bitwise operations on the arrays
The general approach for this is as follows:
create a set of defines for the flags which specify an array offset and a bit pattern
create a typedef for an unsigned char array of the proper size
create a set of functions that implement the bitwise logical operations
The Specifics from the Answer with a Few Improvements and More Exposition
Use a set of C Preprocessor defines to create a set of bitflags to be used with the array. These bitflag defines specify an offset within the unsigned char array along with the bit to manipulate.
The defines in this example are 16-bit values in which the upper byte contains the array offset and the lower byte contains the bit flag(s) for the byte of the unsigned char array at that offset. Using this technique you can have arrays of up to 256 elements, giving 256 * 8 = 2,048 bitflags, or by going from a 16-bit define to a 32-bit long you could have many more. (In the comments below, bit 0 means the least significant bit of a byte and bit 7 means the most significant bit of a byte.)
#define ITEM_FLG_01 0x0001 // array offset 0, bit 0
#define ITEM_FLG_02 0x0002 // array offset 0, bit 1
#define ITEM_FLG_03 0x0101 // array offset 1, bit 0
#define ITEM_FLG_04 0x0102 // array offset 1, bit 1
#define ITEM_FLG_05 0x0201 // array offset 2, bit 0
#define ITEM_FLG_06 0x0202 // array offset 2, bit 1
#define ITEM_FLG_07 0x0301 // array offset 3, bit 0
#define ITEM_FLG_08 0x0302 // array offset 3, bit 1
#define ITEM_FLG_10 0x0908 // array offset 9, bit 3
Next you have a set of macros to set and unset the bits along with a typedef to make it a bit easier to use. Unfortunately using a typedef with C does not provide you better type checking from the compiler but it does make it easier to use. These macros do no checking of their arguments so you might feel safer using regular functions instead.
#define SET_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) |= (b) & 0xff)
#define TOG_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) ^= (b) & 0xff)
#define CLR_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) &= ~ ((b) & 0xff))
#define TST_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) & ((b) & 0xff))
typedef unsigned char BitSet[10];
An example of using this basic framework is as follows.
BitSet uchR = { 0 };
int bValue;
SET_BIT(uchR, ITEM_FLG_01);
bValue = TST_BIT(uchR, ITEM_FLG_01);
SET_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_04);
CLR_BIT(uchR, ITEM_FLG_05);
CLR_BIT(uchR, ITEM_FLG_01);
Next you can introduce a set of utility functions to do some of the bitwise operations we want to support. These bitwise operations would be analogous to the built in C operators such as bitwise Or (|) or bitwise And (&). These functions use the built in C operators to perform the designated operator on all array elements.
These particular examples of the utility functions modify one of the sets of bitflags provided. However if that is a problem, you can modify the functions to accept three arguments, one being for the result of the operation and the other two for the two sets of bitflags to use in the operation.
void AndBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ &= *s2++;
}
}
void OrBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ |= *s2++;
}
}
void XorBits(BitSet s1, const BitSet s2)
{
size_t nLen = sizeof(BitSet);
for (; nLen > 0; nLen--) {
*s1++ ^= *s2++;
}
}
If you need more than one size of bitflags type using this approach, then the most flexible approach is to eliminate the typedef and just use plain unsigned char arrays of various sizes. This change would entail modifying the interface of the utility functions, replacing BitSet with an unsigned char pointer, and using unsigned char arrays where bitflag variables are defined. Along with the unsigned char pointers, you would also need to specify a length for the arrays.
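One possible shape for such an interface is sketched below. It combines the two variations mentioned above: a separate destination argument and an explicit length. The function names and parameter order are assumptions, not part of the original answer:

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = a[i] & b[i] for len bytes; dst may be the same array as a or b. */
void and_bits(unsigned char *dst, const unsigned char *a,
              const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(a[i] & b[i]);
}

/* dst[i] = a[i] | b[i] for len bytes; dst may be the same array as a or b. */
void or_bits(unsigned char *dst, const unsigned char *a,
             const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(a[i] | b[i]);
}

/* dst[i] = a[i] ^ b[i] for len bytes; dst may be the same array as a or b. */
void xor_bits(unsigned char *dst, const unsigned char *a,
              const unsigned char *b, size_t len)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)(a[i] ^ b[i]);
}

/* Usage: the caller passes sizeof of its own array as the length. */
void bits_demo(void)
{
    unsigned char a[3] = { 0x0F, 0xF0, 0xAA };
    unsigned char b[3] = { 0x3C, 0x3C, 0x55 };
    unsigned char r[3];

    and_bits(r, a, b, sizeof r);
    assert(r[0] == 0x0C && r[1] == 0x30 && r[2] == 0x00);

    or_bits(a, a, b, sizeof a);     /* in-place form: a |= b */
    assert(a[0] == 0x3F && a[1] == 0xFC && a[2] == 0xFF);
}
```

Passing the same array as dst and a reproduces the two-argument, modify-in-place behaviour of the BitSet functions.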
You may also consider an approach similar to what is being done for text strings in Is concatenating arbitrary number of strings with nested function calls in C undefined behavior?.

Related

Is there a difference between a bit mask and a bit array?

I have heard the two terms used interchangeably. Is there a difference?
For example,
unsigned char chessboard : 64; /* Bit mask */
unsigned char chessboard_2 [64]; /* Bit array */
Bit Mask
A bit mask is a binary value that's used to refer to specific bits in an integer value when using bitwise operators. For instance, you might have:
unsigned int low3 = 0x7;
This is a bit mask with the low order 3 bits set. You can then use it to extract a part of a value:
unsigned int value = 030071;
unsigned int value_low3 = value & low3; // result is 01
or to update part of value:
unsigned int newvalue = (value & ~low3) | 5; // result is 030075
Bit Array
A bit array is an unsigned integer, or an array of unsigned integers, that's used to hold a sequence of boolean flags, where each value is stored in a separate bit of the integer(s). If you have lots of boolean values to store, this saves a lot of memory compared to having each of them in a separate array element.
However, there's a tradeoff: in order to access a specific flag, you need to use masking and shifting.
If your bit array is small enough to fit in a single integer, you might declare:
uint32_t bitarray;
Then to access a specific element of it, you use:
bitvalue = (bitarray >> bitnum) & 0x1;
and to set an element:
bitarray |= (1u << bitnum);
and to clear an element:
bitarray &= ~(1u << bitnum);
If the bit array needs multiple words, you declare an array. You get the array index by dividing the bit number by the number of bits in each array element, then use the remainder to determine the bit number within that word and use the above expressions.
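A minimal sketch of the multi-word case, assuming uint32_t elements. The function names and the BITS_PER_WORD macro are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BITS_PER_WORD 32u   /* each uint32_t element holds 32 flags */

/* Returns the value (0 or 1) of bit number bitnum. */
int bitarray_get(const uint32_t *words, size_t bitnum)
{
    return (int)((words[bitnum / BITS_PER_WORD] >> (bitnum % BITS_PER_WORD)) & 1u);
}

/* Sets bit number bitnum to 1. */
void bitarray_set(uint32_t *words, size_t bitnum)
{
    words[bitnum / BITS_PER_WORD] |= UINT32_C(1) << (bitnum % BITS_PER_WORD);
}

/* Clears bit number bitnum to 0. */
void bitarray_clear(uint32_t *words, size_t bitnum)
{
    words[bitnum / BITS_PER_WORD] &= ~(UINT32_C(1) << (bitnum % BITS_PER_WORD));
}

/* Usage: bit 40 lives in word 40 / 32 = 1, at position 40 % 32 = 8. */
void bitarray_demo(void)
{
    uint32_t words[4] = { 0 };      /* room for 128 flags */

    bitarray_set(words, 40);
    assert(words[1] == (UINT32_C(1) << 8));
    assert(bitarray_get(words, 40) == 1);
    bitarray_clear(words, 40);
    assert(bitarray_get(words, 40) == 0);
}
```

The caller is responsible for keeping bitnum within the array; these helpers do no range checking.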
Neither of them is a bitmask. The first is a bit-field definition, which is only valid as a struct member, and the second is an array of 64 unsigned chars.

Copy 6 byte array to long long integer variable

I have read from memory a 6 byte unsigned char array.
The endianness is Big Endian here.
Now I want to assign the value that is stored in the array to an integer variable. I assume this has to be long long since it must contain up to 6 bytes.
At the moment I am assigning it this way:
unsigned char aFoo[6];
long long nBar;
// read values to aFoo[]...
// aFoo[0]: 0x00
// aFoo[1]: 0x00
// aFoo[2]: 0x00
// aFoo[3]: 0x00
// aFoo[4]: 0x26
// aFoo[5]: 0x8e
nBar = ((long long)aFoo[0] << 40) + ((long long)aFoo[1] << 32) + ((long long)aFoo[2] << 24) + ((long long)aFoo[3] << 16) + ((long long)aFoo[4] << 8) + aFoo[5];
A memcpy approach would be neat, but when I do this
memcpy(&nBar, &aFoo, 6);
the 6 bytes are being copied to the long long from the start and thus have padding zeros at the end.
Is there a better way than my assignment with the shifting?
What you want to accomplish is called de-serialisation or de-marshalling.
For values that wide, using a loop is a good idea, unless you really need the max. speed and your compiler does not vectorise loops:
uint8_t array[6];
...
uint64_t value = 0;
uint8_t *p = array;
for ( int i = (sizeof(array) - 1) * 8 ; i >= 0 ; i -= 8 )
value |= (uint64_t)*p++ << i;
// left-align
value <<= 64 - (sizeof(array) * 8);
Note the use of stdint.h types, and that sizeof(uint8_t) cannot differ from 1. Only these types are guaranteed to have the expected bit-widths. Also use unsigned integers when shifting values: right-shifting a negative value is implementation-defined, while left-shifting one invokes undefined behaviour.
Iff you need a signed value, just
int64_t final_value = (int64_t)value;
after the shifting. This is still implementation defined, but all modern implementations (and likely the older) just copy the value without modifications. A modern compiler likely will optimize this, so there is no penalty.
The declarations can be moved, of course. I just put them before where they are used for completeness.
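For reuse, the same idea can be wrapped in a function. The sketch below is a variant of the loop above rather than a copy of it: it shifts the accumulator instead of each byte, and it leaves the result right-aligned (no left-align step). The function name is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interprets len bytes (len <= 8) as a big-endian unsigned integer.
 * The result does not depend on the byte order of the host. */
uint64_t be_bytes_to_u64(const uint8_t *bytes, size_t len)
{
    uint64_t value = 0;

    for (size_t i = 0; i < len; i++)
        value = (value << 8) | bytes[i];   /* earlier bytes end up more significant */
    return value;
}

/* Usage with the six bytes from the question. */
void be_bytes_demo(void)
{
    const uint8_t aFoo[6] = { 0x00, 0x00, 0x00, 0x00, 0x26, 0x8e };

    assert(be_bytes_to_u64(aFoo, sizeof aFoo) == UINT64_C(0x268e));
}
```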
You might try
nBar = 0;
memcpy((unsigned char*)&nBar + 2, aFoo, 6);
No & is needed before an array name because it already evaluates to an address.
The correct way to do what you need is to use a union:
#include <stdio.h>
typedef union {
struct {
char padding[2];
char aFoo[6];
} chars;
long long nBar;
} Combined;
int main ()
{
Combined x;
// reset the content of "x"
x.nBar = 0; // or memset(&x, 0, sizeof(x));
// put values directly in x.chars.aFoo[]...
x.chars.aFoo[0] = 0x00;
x.chars.aFoo[1] = 0x00;
x.chars.aFoo[2] = 0x00;
x.chars.aFoo[3] = 0x00;
x.chars.aFoo[4] = 0x26;
x.chars.aFoo[5] = 0x8e;
printf("nBar: %llx\n", x.nBar);
return 0;
}
The advantage: the code is more clear, there is no need to juggle with bits, shifts, masks etc.
However, you have to be aware that, for speed optimization and hardware reasons, the compiler might insert padding bytes into the struct, leading to aFoo not sharing the desired bytes of nBar. The byte order of the host also matters: this layout yields the expected value only on a big-endian machine. The padding issue can be solved by telling the compiler to align the members of the union at byte boundaries (as opposed to the default, which is alignment at word boundaries, the word being 32-bit or 64-bit depending on the hardware architecture).
This used to be achieved using a #pragma directive, and its exact syntax depends on the compiler you use.
Since C11/C++11, the alignas() specifier is the standard way to specify the alignment of struct/union members (provided your compiler supports it), although it can only make alignment stricter, so packing still relies on compiler-specific directives.

how can split integers into bytes without using arithmetic in c?

I am implementing the four basic arithmetic functions (add, sub, division, multiplication) in C.
The basic structure I imagined for these functions is that the program gets two operands from the user using scanf, splits these values into bytes, and computes the result.
I've completed addition and subtraction, but I forgot that I shouldn't use arithmetic operators, so when splitting an integer into single bytes, I wrote code like
while(quotient!=0){
bin[i]=quotient%2;
quotient=quotient/2;
i++;
}
But since this uses arithmetic operators that I shouldn't use, I have to rewrite the splitting part, and I really have no idea how I can split an integer into single bytes without using % or /.
To access the bytes of a variable, type punning can be used.
According to Standard C (C99 and C11), only unsigned char guarantees that this operation can be performed safely.
This could be done in the following way:
typedef unsigned int myint_t;
myint_t x = 1234;
union {
myint_t val;
unsigned char byte[sizeof(myint_t)];
} u;
Now you can access the bytes of x in this way:
u.val = x;
for (size_t j = 0; j < sizeof(myint_t); j++)
printf("%d ",u.byte[j]);
However, as WhozCrag has pointed out, there are issues with endianness.
It cannot be assumed that the bytes are in any particular order.
So, before doing any computation with the bytes, your program needs to check the byte order.
#include <limits.h> /* To use UCHAR_MAX */
unsigned long int ByteFactor = 1u + UCHAR_MAX; /* 256 almost everywhere */
u.val = 0;
for (int j = sizeof(myint_t) - 1; j >= 0 ; j--)
u.val = u.val * ByteFactor + j;
Now, when you print the values of u.byte[], you will see the order in which the bytes are arranged for the type myint_t.
The least significant byte will have the value 0.
Assuming 32-bit integers (if that is not the case, just change the sizes), there are more approaches:
BYTE pointer
#include <stdio.h>
typedef unsigned char BYTE;
int main(void)
{
int x = 0x11223344; // your integer or whatever other data type
BYTE *p = (BYTE *)&x; // view the same storage as bytes
printf("%x\n", p[0]);
printf("%x\n", p[1]);
printf("%x\n", p[2]);
printf("%x\n", p[3]);
return 0;
}
just get the address of your data as a BYTE pointer
and access the bytes directly as a 1D array
union
#include <stdio.h>
typedef unsigned char BYTE;
union
{
int x; // your integer or whatever other data type
BYTE p[4];
} a;
int main(void)
{
a.x = 0x11223344;
printf("%x\n", a.p[0]);
printf("%x\n", a.p[1]);
printf("%x\n", a.p[2]);
printf("%x\n", a.p[3]);
return 0;
}
and access the bytes directly as a 1D array
[notes]
if you do not have BYTE defined then change it to unsigned char
with the ALU you can use not only %,/ but also >>,&, which is way faster but still uses arithmetic
now depending on the platform endianness the output can be 11,22,33,44 or 44,33,22,11, so you need to keep that in mind (especially for code used on multiple platforms)
you need to handle the sign of the number; for unsigned integers there is no problem
but for signed values C typically uses 2's complement, so it is better to separate the sign before splitting, like:
int s;
if (x<0) { s=-1; x=-x; } else s=+1;
// now split ...
[edit2] logical/bit operations
x<<n,x>>n - is bit shift left and right of x by n bits
x&y - is bitwise logical and (perform logical AND on each bit separately)
so when you have, for example, a 32-bit unsigned int (called DWORD), you can split it into BYTEs like this:
DWORD x; // input 32 bit unsigned int
BYTE a0,a1,a2,a3; // output BYTES a0 is the least significant a3 is the most significant
x=0x11223344;
a0=(BYTE)((x )&255); // should be 0x44
a1=(BYTE)((x>> 8)&255); // should be 0x33
a2=(BYTE)((x>>16)&255); // should be 0x22
a3=(BYTE)((x>>24)&255); // should be 0x11
this approach is not affected by endianness
but it uses ALU
the point is to shift the bits you want to bit positions 0..7 and mask out the rest
the &255 and the (BYTE) cast are not needed on all compilers, but some do weird stuff without them, especially on signed variables like char or int
x>>n is the same as x/(pow(2,n))=x/(1<<n)
x&((1<<n)-1) is the same as x%(pow(2,n))=x%(1<<n)
so (x>>8)=x/256 and (x&255)=x%256
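Applying these equivalences to the loop from the question gives something like the sketch below, which replaces % 2 with & 1 and / 2 with >> 1. The function name, the capacity parameter and the return value are assumptions added so the example is self-contained:

```c
#include <assert.h>

/* Writes the binary digits of value into bin[], least significant first,
 * and returns how many digits were written (0 for value == 0).
 * At most capacity digits are stored. */
int split_bits(unsigned int value, unsigned char bin[], int capacity)
{
    int i = 0;

    while (value != 0 && i < capacity) {
        bin[i] = (unsigned char)(value & 1u);  /* same result as value % 2 */
        value >>= 1;                           /* same result as value / 2 */
        i++;
    }
    return i;
}

/* Usage: 6 is 110 in binary, stored as 0, 1, 1 (lowest bit first). */
void split_bits_demo(void)
{
    unsigned char bin[32];
    int n = split_bits(6u, bin, 32);

    assert(n == 3);
    assert(bin[0] == 0 && bin[1] == 1 && bin[2] == 1);
}
```

The loop counter still uses ++; whether that counts as forbidden arithmetic depends on the rules of the exercise.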

How to initialize the bits in a register using C in a readable manner

I have a 24 bit register that comprises a number of fields. For example, the 3 upper bits are "mode", the bottom 10 bits are "data rate divisor", etc. Now, I can just work out what has to go into this 24 bits and code it as a single hex number 0xNNNNNN. However, that is fairly unreadable to anyone trying to maintain it.
The question is, if I define each subfield separately what's the best way of coding it all together?
The classic way is to use the << left shift operator on constant values and combine all values with either + or |. For example:
*register_address = (SYNC_MODE << 21) | ... | DEFAULT_RATE;
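Spelled out a little further, the shift-and-OR approach might look like the sketch below. The field positions follow the question (mode in the top 3 of 24 bits, divisor in the bottom 10); the macro names, the make_config() helper and the numeric values given to SYNC_MODE and DEFAULT_RATE are assumptions for illustration only:

```c
#include <assert.h>
#include <stdint.h>

/* Field layout taken from the question. */
#define REG_MODE_SHIFT    21                 /* mode occupies bits 23..21 */
#define REG_MODE_MASK     UINT32_C(0x7)      /* 3 bits wide */
#define REG_DIVISOR_MASK  UINT32_C(0x3FF)    /* divisor occupies bits 9..0 */

/* Assumed example values; real ones would come from the device datasheet. */
#define SYNC_MODE         UINT32_C(0x5)
#define DEFAULT_RATE      UINT32_C(0x040)

/* Combines the named field values into one 24-bit register value. */
uint32_t make_config(uint32_t mode, uint32_t divisor)
{
    return ((mode & REG_MODE_MASK) << REG_MODE_SHIFT)
         | (divisor & REG_DIVISOR_MASK);
}

/* Usage: 5 << 21 is 0xA00000, OR-ed with 0x040 gives 0xA00040. */
void make_config_demo(void)
{
    assert(make_config(SYNC_MODE, DEFAULT_RATE) == UINT32_C(0xA00040));
}
```

Naming the shift and mask once keeps the magic numbers out of the line that writes the register.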
Solution 1
The "standard" approach for this problem is to use a struct with bitfield members. Something like this:
typedef struct {
int divisor: 10;
unsigned int field1: 9;
char field2: 2;
unsigned char mode: 3;
} fields;
The numbers after each field name specify the number of bits used by that member. In the example above, field divisor uses 10 bits and can store values between -512 and 511 (signed integer) while mode can store unsigned values on 3 bits: between 0 and 7.
The range of values for each field follows the usual rules regarding signed/unsigned types, but the field width is limited to the specified number of bits rather than the full width of the declared type (char/int/long). Of course, a char can still hold at most 8 bits, a short at most 16, and so on. The conversion rules are the usual rules for the types of the fields, taking into account their size (i.e. storing -5 in mode will convert it to unsigned, and the actual value will probably be 3).
There are several issues you need to pay attention to (some of them are also mentioned in the Notes section of the documentation page about bit fields):
the total amount of bits declared in the structure must be 24 (the size of your register);
because your structure uses 3 bytes, elements in arrays of such structures may behave strangely because they span allocation unit boundaries (the allocation unit is usually 4 or 8 bytes, depending on the hardware);
the order of the bit fields in the allocation unit is not guaranteed by the standard; depending on the architecture, it's possible that in the final 3-byte pack, the field mode contains either the most significant 3 bits or the least significant 3 bits; you can sort this out easily, though.
You will probably need to handle the values you store in a fields structure all at once. For that you can embed the structure in a union:
typedef union {
fields f;
unsigned int a;
} reg;
reg x;
/* Access individual fields */
x.f.mode = 2;
x.f.divisor = 42;
/* Get the entire register */
printf("%06X\n", x.a);
Solution 2
An alternative way to do (kind of) the same thing is to use macros to extract the fields and to compose the entire register:
#define MAKE_REG(mode, field2, field1, divisor) \
((((mode) & 0x07) << 21) | \
(((field2) & 0x03) << 19) | \
(((field1) & 0x01FF) << 10 )| \
((divisor) & 0x03FF))
#define GET_MODE(reg) (((reg) & 0xE00000) >> 21)
#define GET_FIELD2(reg) (((reg) & 0x180000) >> 19)
#define GET_FIELD1(reg) (((reg) & 0x07FC00) >> 10)
#define GET_DIVISOR(reg) ((reg) & 0x0003FF)
The first macro assembles the mode, field2, field1, and divisor values into a 3-byte integer. The other macros extract the values of the individual fields. All of them assume the processed numbers are unsigned.
Pros and cons
The struct (embedded in a union) solution:
[+] it allows the compiler to do some checks of the values you want to put into the fields (and issue warnings); also, it does the correct conversions between signed and unsigned;
The macro solution:
[+] it is not sensitive to memory alignment issues; you put the bits exactly where you want;
(-) it doesn't check the range of the values you put in fields;
(-) the handling of signed values is a little bit trickier using macros; the macros suggested here work only for unsigned values; more shifting is required in order to use signed values.
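To illustrate the extra work for signed values, here is one possible way to read and write the 10-bit divisor field as a signed quantity in the range -512..511. This is a sketch of a common sign-extension idiom, not part of the macros above, and both function names are hypothetical:

```c
#include <assert.h>
#include <stdint.h>

/* Extracts the low 10 bits of reg and sign-extends them to -512..511. */
int32_t get_divisor_signed(uint32_t reg)
{
    uint32_t raw = reg & UINT32_C(0x3FF);

    /* Flip the field's sign bit (0x200), then subtract its weight:
     * 0x000..0x1FF map to 0..511 and 0x200..0x3FF map to -512..-1. */
    return (int32_t)(raw ^ UINT32_C(0x200)) - 0x200;
}

/* Converts a signed value in -512..511 to its 10-bit field encoding. */
uint32_t divisor_field_from_signed(int32_t value)
{
    /* Conversion to unsigned is well defined; the mask keeps 10 bits. */
    return (uint32_t)value & UINT32_C(0x3FF);
}

/* Usage: negative values survive a round trip through the field. */
void signed_field_demo(void)
{
    assert(divisor_field_from_signed(-5) == UINT32_C(0x3FB));
    assert(get_divisor_signed(UINT32_C(0x3FB)) == -5);
    assert(get_divisor_signed(UINT32_C(0xE0002A)) == 42);  /* upper fields ignored */
}
```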

C: how to build up a binary integer

I have some logic that I would like to store as an integer. I have 30 "positions" that can be either yes or no and I would like to represent this as an integer. As I am looping through these positions what would be the easiest way to store this information as an integer?
You can use a 32 bit uint:
uint32_t flags = 0;
flags |= UINT32_C(1) << x; // set x'th bit from right
flags &= ~(UINT32_C(1) << x); // unset x'th bit from right
if (flags & (UINT32_C(1) << x)) { /* x'th bit from right is set */ }
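Put into the loop the question describes, this could look like the following sketch. The function name and the representation of the input as an int array (non-zero meaning yes) are assumptions:

```c
#include <assert.h>
#include <stdint.h>

/* Packs up to 32 yes/no answers into one integer.
 * answers[i] != 0 means "yes" and sets bit i (counted from the right). */
uint32_t pack_positions(const int answers[], unsigned count)
{
    uint32_t flags = 0;

    for (unsigned i = 0; i < count && i < 32; i++) {
        if (answers[i])
            flags |= UINT32_C(1) << i;
    }
    return flags;
}

/* Usage: yes, no, yes, yes becomes binary 1101, which is 13. */
void pack_positions_demo(void)
{
    const int answers[4] = { 1, 0, 1, 1 };

    assert(pack_positions(answers, 4) == 13u);
}
```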
struct{
unsigned int flag0:1;
unsigned int flag1:1;
...
unsigned int flag31:1;
} myFlags;
Using :x in the definition of an integer struct member declares a bit-field x bits wide.
You can access each struct member as usual, but the values are limited by the size in bits (in my example, either 1 or 0 because only 1 bit is available), and the compiler will enforce it. The struct will (probably, depending on the compiler settings) be packed to the total number of integers needed to hold all the bits.
Another option would be to use an int and the bitwise operators & and | to access specific bits. In this case you have to make sure yourself that setting one bit won't affect another, that there are no overflows, etc.
#define POSITION_A 1
#define POSITION_B 2
unsigned int position = 0;
// set a position
position |= POSITION_A;
// clear a position
position &= ~(POSITION_A);
Yes, as WTP's comment says, you could save all your data in one unsigned int (uint32_t) and access it with AND (&), OR (|), and NOT (~).
If saving storage is not a primary concern, however, I recommend not using this compact technique.
You may need to expand your code to support more than two kinds of answers (yes/no), such as yes/no/maybe.
You may have more than 30 questions, which would not fit into one unsigned int.
If I were you, I'd use an array or list of small ints (short or char) to store the values. It wastes some storage, but it is much easier to read and much easier to extend with more features.
