Bit Shifting, Masking or a Bit Field Struct? - c

I'm new to working with bits. I'm trying to work with an existing protocol, which can send three different types of messages.
Type 1 is a 16-bit structure:
struct digital
{
unsigned int type:2;
unsigned int highlow:1;
unsigned int sig1:5;
unsigned int :1;
unsigned int sig2:7;
};
The first two bits (type, in my struct above) are always 1 0. The third bit, highlow, determines whether the signal is on or off, and sig1 + sig2 together define the 12-bit index of the signal. This index is split across the two bytes by a 0, which is always in bit 7.
Type 2 is a 32-bit structure. It has a 2-bit type, a 10-bit index and a 16-bit value, interspersed with 0's at positions 27, 23, 15 & 7. A bit-field struct representation would look something like this:
struct analog
{
unsigned int type:2;
unsigned int val1:2;
unsigned int :1;
unsigned int sig1:3;
unsigned int :1;
unsigned int sig2:7;
unsigned int :1;
unsigned int val2:7;
unsigned int :1;
unsigned int val3:7;
};
sig1 & sig2 together form the 10-bit index. val1 + val2 + val3 together form the 16-bit value of the signal at the 10-bit index.
If I understand how to work with the first two structs, I think I can figure out the third.
My question is, is there a way to assign a single value and have the program work out the bits that need to go into val1, val2 and val3?
I've read about bit shifting, bit-field structs and padding with 0's. The struct seems like the way to go, but I'm not sure how to implement it. None of the examples of bit-packing that I've seen have values that are split the way these are. Ultimately, I'd like to be able to create an analog struct, assign an index (i = 252) and a value (v = 32768) and be done with it.
If someone could suggest the appropriate method or provide a link to a similar sample, I'd greatly appreciate it. If it matters, this code will be incorporated into a larger Objective-C app.
Thanks.
Brad

You can do it with a series of shifts, ands, and ors. I have done the 10-bit index part for Type 2:
unsigned int i = 252;
unsigned int a = ((i << 16) & 0x7f0000) | ((i << 17) & 0x7000000);
Essentially, this code shifts a copy of i left by 16, so that its low 7 bits land in bits 16 - 22, and ANDs it with the bitmask 0x7f0000, which keeps bits 16 - 22 and sets every other bit to zero. It also shifts another copy left by 17, so that bits 7 - 9 of i land in bits 24 - 26, and ANDs it with the bitmask 0x7000000, which keeps bits 24 - 26 and sets every other bit to zero. Then it ORs the two values together to create your desired zero-separated value, with bit 23 left at 0. Note that the result is a plain unsigned int: C does not let you cast an integer to a struct type, so keep the packed message as an integer.
I'm not absolutely sure that I counted the bitmasks correctly, but I hope you've got the idea: just shift, AND-mask, and OR-merge.
Edit: Method 2:
struct analog a;
a.sig1 = (i >> 7) & 0x07; // top 3 bits of the 10-bit index (bits 7-9)
a.sig2 = i & 0x7f; // low 7 bits of the index (bits 0-6)
On second thought, method 2 is more readable, so you should probably use it.

You don't have to do all of this by hand; this is where the union keyword comes in. A union lets you refer to the same bits under different names, so you can set the individual fields through a bit-field struct and then read or write all of the bits at once through an integer member.

You shouldn't use C structure bitfields for this, because the physical layout of bitfields is implementation-defined. While you could figure out what your compiler is doing and get your layout to match the underlying data, the code may not work if you switch to a different compiler or even update your compiler.
I know it's a pain, but do the bit manipulation yourself.
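As a sketch of doing the bit manipulation yourself for both message types, the functions below pack and unpack the words with shifts and masks only, so the result does not depend on how the compiler lays out bit-fields. They assume the MSB-first layouts implied by the question: for Type 1, type in bits 15-14, highlow in bit 13, sig1 (index bits 11-7) in bits 12-8, a 0 in bit 7 and sig2 (index bits 6-0) in bits 6-0; for Type 2, type in bits 31-30, val1 (value bits 15-14) in 29-28, sig1 (index bits 9-7) in 26-24, sig2 (index bits 6-0) in 22-16, val2 (value bits 13-7) in 14-8 and val3 (value bits 6-0) in 6-0. Reading "1 0" as the type value 2 (binary 10) is an assumption, the Type 2 type code is not given in the question so it is a parameter, and the function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helpers; bit positions follow the layouts assumed above. */

/* Type 1: 16-bit digital message, type assumed to be binary 10. */
uint16_t pack_digital(int on, uint32_t index)
{
    uint32_t w = 0x2u << 14;                 /* type bits 15-14 = 1 0 */
    w |= (on ? 1u : 0u) << 13;               /* highlow */
    w |= ((index >> 7) & 0x1Fu) << 8;        /* sig1: index bits 11-7 */
    w |= index & 0x7Fu;                      /* sig2: index bits 6-0; bit 7 stays 0 */
    return (uint16_t)w;
}

/* Type 2: 32-bit analog message; the type code is supplied by the caller. */
uint32_t pack_analog(uint32_t type, uint32_t index, uint32_t value)
{
    uint32_t w = 0;
    w |= (type & 0x3u) << 30;                /* 2-bit type */
    w |= ((value >> 14) & 0x3u) << 28;       /* val1: value bits 15-14 */
    w |= ((index >> 7) & 0x7u) << 24;        /* sig1: index bits 9-7 */
    w |= (index & 0x7Fu) << 16;              /* sig2: index bits 6-0 */
    w |= ((value >> 7) & 0x7Fu) << 8;        /* val2: value bits 13-7 */
    w |= value & 0x7Fu;                      /* val3: value bits 6-0 */
    return w;                                /* bits 27, 23, 15 and 7 remain 0 */
}

/* Reverse of pack_analog: gather the split pieces back together. */
void unpack_analog(uint32_t w, uint32_t *index, uint32_t *value)
{
    *index = (((w >> 24) & 0x7u) << 7) | ((w >> 16) & 0x7Fu);
    *value = (((w >> 28) & 0x3u) << 14)
           | (((w >> 8) & 0x7Fu) << 7)
           | (w & 0x7Fu);
}
```

With the question's example, pack_analog(type, 252, 32768) puts 1 in sig1, 124 in sig2, 2 in val1 and 0 in val2 and val3; you only ever hand it the whole index and value.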

Related

Calculate bitmask from a given index in a 16 bit architecture

I have a function that accepts an index variable of type unsigned long (this type cannot be changed).
void func(unsigned long index);
I need to convert it to a bitmask such that for index 0 the bitmask will be 1, for index 1 bitmask will be 2, for 2 it will be 4 and so on.
I have done the following:
mask = 1 << index;
The problem is that I'm working with a 16-bit architecture, where unsigned long variables are 32 bits wide, which messes up this variable.
(the lowest 16 bits give me the correct value for mask but the highest 16 bits add extra information which messes this up).
i.e. Instead of getting: mask = 0000000000000001 (16 bit)
I'm getting: xxxxxxxxxxxxxxxx0000000000000001 (32 bits)
Is there another way to calculate this bitmask?
Would appreciate help.
Thank you.
You have the correct approach. However, the problem with your implementation is that the constant 1 in the expression 1 << index has type int, whose width is implementation-defined; on your 16-bit platform it is 16 bits, so shifting it by 16 or more is undefined. Since you are looking for an unsigned long result, use ((unsigned long)1) instead:
unsigned long mask = ((unsigned long)1) << index;
If your platform supports stdint.h and you need a mask of some specific width, use uint32_t instead:
uint32_t mask = UINT32_C(1) << index;
Your basic code is correct, although I notice you didn't specify the type of mask.
If the caller passes a value greater than 15 into index, what are you going to do? It sounds like you have to make the most of a bad situation. Depending on the context you could simply return from func, you could assert, or you could proceed with a mask of zero.
This brings us back to the question of the type of mask. I would define it as unsigned short, uint16_t or similar, depending on your environment. But other than that, your first attempt was basically correct. It's just a question of error handling.
uint16_t shift = index & 15;
uint16_t mask = (uint16_t)(1u << shift); /* 1u avoids shifting into the sign bit of a 16-bit int */
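A minimal sketch of the "proceed with a mask of zero" option described above, assuming uint16_t is available from <stdint.h>; index_to_mask16 is a hypothetical name standing in for the body of func. Using 1u rather than 1 keeps the shift unsigned, so 1u << 15 is well defined even where int is 16 bits.

```c
#include <stdint.h>

/* Hypothetical helper: map index 0..15 to the mask 1 << index; anything larger gives 0. */
uint16_t index_to_mask16(unsigned long index)
{
    if (index > 15)
        return 0;                    /* no such bit in a 16-bit mask */
    return (uint16_t)(1u << index);  /* unsigned shift, always within 16 bits */
}
```

Returning 0 is only one of the choices mentioned; an assert or an early return from func may suit the calling code better.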

Structure for an array of bits in C

It has come to my attention that there is no builtin structure for a single bit in C. There is (unsigned) char, which is 8 bits (one byte), and wider types such as int, long and uint64_t; even bool takes at least one byte.
I came across this while coding up a Huffman tree, and the encodings for certain characters were not necessarily exactly 8 bits long (like 00101), so there was no efficient way to store the encodings. I had to find makeshift solutions such as strings or boolean arrays, but these take far more memory.
But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct? I scoured the web for one but the smallest structure seems to be 8 bits (one byte). I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do. I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.
If you're mainly interested in accessing a single bit at a time, you can take an array of unsigned char and treat it as a bit array. For example:
unsigned char array[125];
Assuming 8 bits per byte, this can be treated as an array of 1000 bits. The first 16 logically look like this:
----------------------------------------------------------------------------
byte |               0               |                  1                  |
----------------------------------------------------------------------------
bit  | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 |
----------------------------------------------------------------------------
Let's say you want to work with bit b. You can then do the following:
Read bit b:
value = (array[b/8] & (1 << (b%8))) != 0;
Set bit b:
array[b/8] |= (1 << (b%8));
Clear bit b:
array[b/8] &= ~(1 << (b%8));
Dividing the bit number by 8 gets you the relevant byte. Similarly, taking the bit number modulo 8 gives you the relevant bit inside that byte. You then left-shift the value 1 by that bit number to get the necessary bit mask.
While there is integer division and modulus at work here, the divisor is a power of 2, so any decent compiler should replace them with bit shifting/masking (at least when b is unsigned).
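The three operations above can be wrapped in small functions. This sketch uses CHAR_BIT from <limits.h> in place of the literal 8 and an unsigned shift; bit_get, bit_set and bit_clear are hypothetical names.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical helpers treating an unsigned char array as a bit array. */

int bit_get(const unsigned char *array, size_t b)
{
    /* shift the wanted bit down to position 0, then keep only it */
    return (array[b / CHAR_BIT] >> (b % CHAR_BIT)) & 1u;
}

void bit_set(unsigned char *array, size_t b)
{
    array[b / CHAR_BIT] |= (unsigned char)(1u << (b % CHAR_BIT));
}

void bit_clear(unsigned char *array, size_t b)
{
    /* the complemented mask has every bit set except bit b */
    array[b / CHAR_BIT] &= (unsigned char)~(1u << (b % CHAR_BIT));
}
```

With unsigned char array[125] zeroed, bit_set(array, 9) sets bit 1 of array[1], matching the byte/bit table above.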
It has come to my attention that there is no builtin structure for a single bit in C.
That is true, and it makes sense because substantially no machines have bit-addressable memory.
But anyways, my question is more general: is there a good way to store an array of bits, or some sort of user-defined struct?
One generally uses an unsigned char or another unsigned integer type, or an array of such. Along with that you need some masking and shifting to set or read the values of individual bits.
I scoured the web for one but the smallest structure seems to be 8 bits (one byte).
Technically, the smallest addressable storage unit ([[un]signed] char) could be larger than 8 bits, though you're unlikely ever to see that.
I tried things such as int a : 1 but it didn't work. I read about bit fields but they do not simply achieve exactly what I want to do.
Bit fields can appear only as structure (or union) members. A structure object containing such a bitfield will still have a size that is a multiple of the size of a char, so that doesn't map very well onto a bit array or any part of one.
I know questions have already been asked about this in C++ and if there is a struct for a single bit, but mostly I want to know specifically what would be the most memory-efficient way to store an encoding such as 00101 in C.
If you need a bit pattern and a separate bit count -- such as if some of the bits available in the bit-storage object are not actually significant -- then you need a separate datum for the significant-bit count. If you want a data structure for a small but variable number of bits, then you might go with something along these lines:
struct bit_array_small {
unsigned char bits;
unsigned char num_bits;
};
Of course, you can make that larger by choosing a different data type for the bits member and, maybe, the num_bits member. I'm sure you can see how you might extend the concept to handling arbitrary-length bit arrays if you should happen to need that.
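As an illustration of why the separate count matters, here is a sketch that appends bits to a bit_array_small one at a time, so a code such as 00101 keeps its two leading zeros (bits ends up as 5 and num_bits as 5). bas_append is a hypothetical helper, and capping the length at CHAR_BIT bits is an assumption tied to the unsigned char member.

```c
#include <limits.h>
#include <stdbool.h>

struct bit_array_small {
    unsigned char bits;      /* the code, most recently appended bit lowest */
    unsigned char num_bits;  /* how many of those bits are significant */
};

/* Hypothetical helper: append one bit; returns false if the code is full. */
bool bas_append(struct bit_array_small *a, unsigned bit)
{
    if (a->num_bits >= CHAR_BIT)
        return false;
    a->bits = (unsigned char)((a->bits << 1) | (bit & 1u));
    a->num_bits++;
    return true;
}
```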
If you really want the most memory efficiency, you can encode the Huffman tree itself as a stream of bits. See, for example:
https://www.siggraph.org/education/materials/HyperGraph/video/mpeg/mpegfaq/huffman_tutorial.html
Then just encode those bits as an array of bytes, with a possible waste of 7 bits.
But that would be a horrible idea. For the structure in memory to be useful, it must be easy to access. You can still do that very efficiently. Let's say you want to encode up to 12-bit codes. Use a 16-bit integer and bitfields:
struct huffcode {
uint16_t length: 4,
value: 12;
};
Most compilers will store this as a single 16-bit value and let you access the length and value fields separately (the standard does not guarantee the size, and support for uint16_t as a bit-field type is implementation-defined). The complete Huffman node would also contain the input code value, and tree pointers (which, if you want further compactness, can be integer indices into an array).
You can make your own bit array in no time (these macros assume 8-bit bytes):
#include <string.h>
#define ba_set(ptr, bit) ((ptr)[(bit) >> 3] |= (unsigned char)(1u << ((bit) & 7)))
#define ba_clear(ptr, bit) ((ptr)[(bit) >> 3] &= (unsigned char)~(1u << ((bit) & 7)))
#define ba_get(ptr, bit) (((ptr)[(bit) >> 3] >> ((bit) & 7)) & 1)
#define ba_setbit(ptr, bit, value) ((value) ? ba_set((ptr), (bit)) : ba_clear((ptr), (bit)))
#define BITARRAY_BITS (120)
int main(void)
{
unsigned char mybits[(BITARRAY_BITS + 7) / 8]; /* round up to whole bytes */
memset(mybits, 0, sizeof(mybits));
ba_setbit(mybits, 33, 1);
if (!ba_get(mybits, 33)) /* read bit 33 back */
return 1;
return 0;
}

How to initialize the bits in a register using C in a readable manner

I have a 24 bit register that comprises a number of fields. For example, the 3 upper bits are "mode", the bottom 10 bits are "data rate divisor", etc. Now, I can just work out what has to go into this 24 bits and code it as a single hex number 0xNNNNNN. However, that is fairly unreadable to anyone trying to maintain it.
The question is, if I define each subfield separately what's the best way of coding it all together?
The classic way is to use the << left shift operator on constant values and combine all values with either + or |. For example:
*register_address = (SYNC_MODE << 21) | ... | DEFAULT_RATE;
Solution 1
The "standard" approach for this problem is to use a struct with bitfield members. Something like this:
typedef struct {
int divisor: 10;
unsigned int field1: 9;
char field2: 2;
unsigned char mode: 3;
} fields;
The numbers after each field name specify the number of bits used by that member. In the example above, divisor uses 10 bits and can store values between -512 and 511 (as a signed integer; whether a plain int bit-field is signed is implementation-defined, so write signed int if you need that), while mode stores unsigned 3-bit values between 0 and 7.
The range of values for each field follows the usual signed/unsigned rules, but limited to the specified number of bits, and the width cannot exceed that of the declared type: a char bit-field can hold at most 8 bits, a short at most 16, and so on. Conversions follow the usual rules for the field's type, taking its width into account; for example, storing -5 in the unsigned 3-bit mode field reduces it modulo 8, so the stored value is 3.
There are several issues you need to pay attention to (some of them are also mentioned in the Notes section of the documentation page about bit fields):
the total amount of bits declared in the structure must be 24 (the size of your register);
because your structure needs only 3 bytes, the compiler will usually pad it to a whole allocation unit (usually 4 or 8 bytes, depending on the hardware), so sizeof(fields) may be larger than 3 and arrays of such structures will not line up with packed 3-byte data;
the order of the bit fields within the allocation unit is not guaranteed by the standard; depending on the architecture, the mode field may end up in either the most significant 3 bits or the least significant 3 bits of the final 3-byte pack. You can sort this out easily, though.
You probably need to handle the values you store in a fields structure all at once. For that you can embed the structure in a union:
typedef union {
fields f;
unsigned int a;
} reg;
reg x;
x.a = 0; /* start from a known all-zero register value */
/* Access individual fields */
x.f.mode = 2;
x.f.divisor = 42;
/* Get the entire register */
printf("%06X\n", x.a);
Solution 2
An alternative way to do (kind of) the same thing is to use macros to extract the fields and to compose the entire register:
#define MAKE_REG(mode, field2, field1, divisor) \
((((mode) & 0x07) << 21) | \
(((field2) & 0x03) << 19) | \
(((field1) & 0x01FF) << 10 )| \
((divisor) & 0x03FF))
#define GET_MODE(reg) (((reg) & 0xE00000) >> 21)
#define GET_FIELD2(reg) (((reg) & 0x180000) >> 19)
#define GET_FIELD1(reg) (((reg) & 0x07FC00) >> 10)
#define GET_DIVISOR(reg) ((reg) & 0x0003FF)
The first macro assembles the mode, field2, field1 and divisor values into a 24-bit integer. The other macros extract the values of individual fields. All of them assume the processed numbers are unsigned.
Pros and cons
The struct (embedded in an union) solution:
[+] it allows the compiler to do some checks of the values you want to put into the fields (and issue warnings); also, it does the correct conversions between signed and unsigned;
The macro solution:
[+] it is not sensitive to memory alignment issues; you put the bits exactly where you want;
[-] it doesn't check the range of the values you put in fields;
[-] the handling of signed values is a little trickier with macros; the macros suggested here work only for unsigned values, and more shifting is required in order to use signed values.
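For the signed case in the last point, one option instead of extra shifting is the XOR-and-subtract sign-extension trick. A minimal sketch, assuming the divisor field sits in bits 9-0 as in MAKE_REG and GET_DIVISOR above; put_divisor_signed and get_divisor_signed are hypothetical names.

```c
#include <stdint.h>

/* Hypothetical helpers for a signed 10-bit divisor in bits 9-0. */

/* Store: converting to uint32_t wraps modulo 2^32, so masking keeps the
   10-bit two's-complement pattern (e.g. -1 becomes 0x3FF). */
uint32_t put_divisor_signed(int32_t divisor)
{
    return (uint32_t)divisor & 0x3FFu;
}

/* Load: flip the field's sign bit (bit 9), then subtract its weight,
   which maps the raw values 0..1023 onto -512..511. */
int32_t get_divisor_signed(uint32_t reg)
{
    uint32_t raw = reg & 0x3FFu;             /* isolate bits 9-0 */
    return (int32_t)(raw ^ 0x200u) - 0x200;
}
```

The result of put_divisor_signed can be passed as the divisor argument of MAKE_REG, and get_divisor_signed ignores the mode, field2 and field1 bits.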

C: how to build up a binary integer

I have some logic that I would like to store as an integer. I have 30 "positions" that can be either yes or no and I would like to represent this as an integer. As I am looping through these positions what would be the easiest way to store this information as an integer?
You can use a 32 bit uint:
uint32_t flags = 0;
flags |= UINT32_C(1) << x; // set x'th bit from right
flags &= ~(UINT32_C(1) << x); // unset x'th bit from right
if (flags & UINT32_C(1) << x) // test x'th bit from right
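Applied to the loop in the question, a sketch might look like this; the answers array of bool and the build_flags name are hypothetical stand-ins for however the 30 yes/no positions are produced.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: position x of answers becomes bit x of the result. */
uint32_t build_flags(const bool answers[], unsigned count)
{
    uint32_t flags = 0;
    for (unsigned x = 0; x < count && x < 32; x++) {
        if (answers[x])
            flags |= UINT32_C(1) << x;   /* set x'th bit from right */
    }
    return flags;
}
```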
struct {
unsigned int flag0:1;
unsigned int flag1:1;
...
unsigned int flag31:1;
} myFlags;
Using :x in the declaration of an integer struct member makes it a bit-field with x bits assigned.
You can access each struct member as usual, but the values are limited by the field width: in my example, either 1 or 0, because only 1 bit is available, and storing a larger value keeps only the low-order bit. (The fields are unsigned because a plain int 1-bit field may hold 0 and -1 instead.) The struct will (probably, depending on the compiler settings) be packed into as many integers as are needed to hold all the bits.
Another option would be using an int and the bitwise operators & and | to access specific bits. In this case you have to make sure yourself that setting one bit won't affect another, and that there are no overflows, etc.
#define POSITION_A 1
#define POSITION_B 2
unsigned int position = 0;
// set a position
position |= POSITION_A;
// clear a position
position &= ~POSITION_A;
Yes, as WTP's comment says, you could save all your data in one unsigned int (uint32_t) and access it with AND (&), OR (|) and NOT (~).
If saving storage is not a primary concern, however, I recommend not using this compact technique:
You may need to expand your code to support more than 2 kinds of answers (yes/no), such as yes/no/maybe.
You may have more questions than fit into one unsigned int.
If I were you, I'd use an array/list of small ints (short or char) to store the values. It wastes some storage, but it's much easier to read and much easier to extend with more features.

Storing a 4-bit value in the middle of an 8-bit register

I need to count from 0 to 10 and store those values in binary format in ADCON0(5:2). How do I point at bit 5 of this register? Bit 5 is named ADCON0bits.CHS3. If I store a 4 bit variable to ADCON0bits.CHS3, will bits 1 - 3 be written to bits 4 - 2 of the register?
Also, are there any 4 bit data types that I could use?
This is all on a PIC microcontroller.
Edit: I need to store 4 bits in the register like so:
unsigned char count = 10; //max value
[X][X][1][0][1][0][X][X]
This is in line with what was assumed below, but I figured I would clear up my question a bit.
When you say you are writing bits 1-3 of your count into positions 4-2 of your register, do you explicitly mean you are reversing the order of the bits? In this answer I will presume that that was not what you meant.
You can express a bit field explicitly as a struct.
Presuming that you are dealing with an 8 bit register, your struct could look something like this:
struct adcon {
unsigned char someflag : 2;
unsigned char count : 4;
unsigned char other_bits : 2;
};
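With each struct member, you specify the number of bits. Then you can operate on the appropriate bits in the register by casting the register's address to a pointer to the struct type and operating on the members of the struct. This assumes your compiler allocates bit-fields starting from the least significant bit, which the C standard leaves implementation-defined.
((volatile struct adcon *)&ADCON0)->count = count;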
Edit: fixed up the code based on feedback, thanks.
Writing to a bit variable stores the truth value of that variable to the bit. For example, writing:
ADCON0bits.CHS3 = 3;
will set that bit to 1.
If bit5 refers to the bit masked by 0x20 (00100000) and you need to store the 4 bit number in bits masked 0x3c (00111100) then you can use bit shifts and bitwise operations:
// First clear bits 2-5:
ADCON0 &= ~0x3c;
// Now set the bits to the correct value:
ADCON0 |= ((count & 0x0F) << 2); // <-- remember to shift 2 bits to the left
Update: as Ian mentioned in the comments, this sets ADCON0 to an intermediate value before the final update. In this case it is OK, since it only selects the A/D channel and doesn't actually start a conversion. But in general it's better to do:
unsigned char temp_adcon;
temp_adcon = ADCON0 & ~0x3c;
ADCON0 = temp_adcon | ((count & 0x0F) << 2);
See the answers for this SO question.
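The temp-variable version above generalizes to any field. As a sketch, insert_field (a hypothetical helper) computes the new register value from the old one without touching the hardware; on the PIC you would apply it to ADCON0 and write the result back in one assignment, for example ADCON0 = insert_field(ADCON0, count, 2, 4);. A width of 4 and a shift of 2 select the 0x3C field from the question.

```c
#include <stdint.h>

/* Hypothetical helper: replace the width-bit field starting at bit 'shift'
   of reg with value, leaving all other bits unchanged. */
uint8_t insert_field(uint8_t reg, uint8_t value, unsigned shift, unsigned width)
{
    uint8_t mask = (uint8_t)(((1u << width) - 1u) << shift);  /* width 4, shift 2 -> 0x3C */
    return (uint8_t)((reg & ~mask) | ((value << shift) & mask));
}
```

Masking the shifted value as well as the old register keeps an out-of-range count from spilling into the neighbouring bits.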
Note that you are doing a read-modify-write operation. You have to be careful of race conditions when doing this. Race conditions may be caused by:
The hardware itself changing bits in the register (e.g. A/D converter operation completes and sets flags). The design of the hardware should provide a means for you to avoid this problem—there are several possible solutions—read the manual for the micro/peripheral to find out.
Your own interrupt routine(s) also writing to the register. If so, when your main (non-interrupt) code writes to the register, it should be done within an "interrupts disabled" context.
I'm not sure about the exact register ADCON0, but often you can read the register, mask off the 4 bits, insert your count, and then write that value back to the register.
Just in case: masking is performed with an AND operation, and inserting is an OR operation with the count shifted left by 2 bits in your case.
