There are several cases in C in which a type is guaranteed to be at LEAST a certain size, but not necessarily exactly that size (sizeof(int) can be 2 or 4, for example). However, I need to be absolutely certain of some sizes and memory locations. If I have a union such as below:
typedef union {
    struct {
        unsigned int a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;
    };
    unsigned int val:8;
} foo;
Is it absolutely guaranteed that the value of val is 8 bits long? Moreover, is it guaranteed that a is the most significant bit of val, and b is the second-most significant bit? I wish to do something like this:
foo performLogicalNOT(foo x){
    foo product;
    product.val = ~x.val;
    return product;
}
And thus with an input of specific flags, return a union with exactly the opposite flags (11001100 -> 00110011). The actual functions are more complex, and require that the size of val be exactly 8. I also want to perform AND and OR in the same manner, so it is crucial that each a and b value be where I expect them to be and the size I expect them to be.
How the bits are packed is not specified by the standard; it is implementation-defined. Have a look at this answer.
Instead of relying on a union, it is better to use bitmasks to derive the values. For the above example, an unsigned char foo can be used. All operations (like ~) would be done on foo directly. To get or set specific bits, apply the appropriate bitmask.
#define BITMASK_A 0x80
#define BITMASK_B 0x40
and so on..
To get the value of 'a' bit, use:
foo & BITMASK_A
To set the bit to 1, use:
foo | BITMASK_A
To reset the bit to 0, use:
foo & (~BITMASK_A)
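A minimal, self-contained sketch of this approach (the logical_not, set_flag, clear_flag and test_flag helpers are illustrative names, not from the question):

```c
#include <assert.h>

#define BITMASK_A 0x80u  /* bit 7, the "a" flag */
#define BITMASK_B 0x40u  /* bit 6, the "b" flag */

typedef unsigned char foo;  /* 8 bits on any platform where CHAR_BIT == 8 */

/* Invert every flag at once, as in the question's performLogicalNOT. */
static foo logical_not(foo x) { return (foo)~x; }

static foo set_flag(foo x, unsigned mask)   { return (foo)(x | mask); }
static foo clear_flag(foo x, unsigned mask) { return (foo)(x & ~mask); }
static int test_flag(foo x, unsigned mask)  { return (x & mask) != 0; }
```

Because every operation happens on a plain unsigned char, no assumption about bit-field layout is needed.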
I recently saw this post about endianness macros in C and I can't really wrap my head around the first answer.
Code supporting arbitrary byte orders, ready to be put into a file
called order32.h:
#ifndef ORDER32_H
#define ORDER32_H
#include <limits.h>
#include <stdint.h>
#if CHAR_BIT != 8
#error "unsupported char size"
#endif
enum
{
O32_LITTLE_ENDIAN = 0x03020100ul,
O32_BIG_ENDIAN = 0x00010203ul,
O32_PDP_ENDIAN = 0x01000302ul
};
static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
{ { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)
#endif
You would check for little endian systems via
O32_HOST_ORDER == O32_LITTLE_ENDIAN
I do understand endianness in general. This is how I understand the code:
Create example of little, middle and big endianness.
Compare test case to examples of little, middle and big endianness and decide what type the host machine is of.
What I don't understand are the following aspects:
Why is an union needed to store the test-case? Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed? And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?
Why the check for CHAR_BIT? One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide? Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?
Why is a union needed to store the test case?
The entire point of the test is to alias the array with the 32-bit value that the array's bytes form.
Isn't uint32_t guaranteed to be able to hold 32 bits/4 bytes as needed?
More or less. uint32_t is guaranteed to be exactly 32 bits with no padding, but it is only required to exist where the platform provides such a type. It would be missing only on some really fringe architecture you will never encounter.
And what does the assignment { { 0, 1, 2, 3 } } mean? It assigns the value to the union, but why the strange markup with two braces?
The inner brace is for the array.
Why the check for CHAR_BIT?
Because that's the actual guarantee. If that doesn't blow up, everything will work.
One comment mentions that it would be more useful to check UINT8_MAX? Why is char even used here, when it's not guaranteed to be 8 bits wide?
Because in fact it always is, these days.
Why not just use uint8_t? I found this link to Google-Devs github. They don't rely on this check... Could someone please elaborate?
Lots of other choices would work also.
The initialization has two sets of braces because the inner braces initialize the bytes array. So bytes[0] is 0, bytes[1] is 1, etc.
The union allows a uint32_t to lie on the same bytes as the char array and be interpreted in whatever the machine's endianness is. So if the machine is little endian, 0 is in the low order byte and 3 is in the high order byte of value. Conversely, if the machine is big endian, 0 is in the high order byte and 3 is in the low order byte of value.
{{0, 1, 2, 3}} is the initializer for the union, which will result in bytes component being filled with [0, 1, 2, 3].
Now, since the bytes array and the uint32_t occupy the same space, you can read the same value as a native 32-bit integer. The value of that integer shows you how the array was laid out in memory, which tells you which endianness your system uses.
There are only 3 popular possibilities here - O32_LITTLE_ENDIAN, O32_BIG_ENDIAN, and O32_PDP_ENDIAN.
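The whole detection can be exercised end to end like this (the header is reproduced so the snippet is self-contained; the host_order_name helper is an illustrative addition):

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN    = 0x00010203ul,
    O32_PDP_ENDIAN    = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };
#define O32_HOST_ORDER (o32_host_order.value)

/* Map the detected magic value to a human-readable name. */
static const char *host_order_name(void)
{
    switch (O32_HOST_ORDER) {
    case O32_LITTLE_ENDIAN: return "little-endian";
    case O32_BIG_ENDIAN:    return "big-endian";
    case O32_PDP_ENDIAN:    return "pdp-endian";
    default:                return "unknown";
    }
}
```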
As for the char / uint8_t choice - I don't know. I think it makes more sense to just use uint8_t with no checks.
I have a 24 bit register that comprises a number of fields. For example, the 3 upper bits are "mode", the bottom 10 bits are "data rate divisor", etc. Now, I can just work out what has to go into this 24 bits and code it as a single hex number 0xNNNNNN. However, that is fairly unreadable to anyone trying to maintain it.
The question is, if I define each subfield separately what's the best way of coding it all together?
The classic way is to use the << left shift operator on constant values and combine all values with either + or |. For example:
*register_address = (SYNC_MODE << 21) | ... | DEFAULT_RATE;
Solution 1
The "standard" approach for this problem is to use a struct with bitfield members. Something like this:
typedef struct {
    int divisor: 10;
    unsigned int field1: 9;
    char field2: 2;
    unsigned char mode: 3;
} fields;
The numbers after each field name specify the number of bits used by that member. In the example above, field divisor uses 10 bits and (as a signed integer) can store values between -512 and 511, while mode stores unsigned 3-bit values, between 0 and 7.
The range of values for each field follows the usual signed/unsigned rules, but the field length is limited to the specified number of bits rather than the full width of the underlying type (char/int/long). Of course, a char can still hold up to 8 bits, a short up to 16, and so on. The coercion rules are the usual rules for the types of the fields, taking their size into account (i.e. storing -5 in mode will convert it to unsigned, and the actual value will probably be 3).
There are several issues you need to pay attention to (some of them are also mentioned in the Notes section of the documentation page about bit fields):
the total amount of bits declared in the structure must be 24 (the size of your register);
because your structure uses 3 bytes, some elements of an array of such structures may behave strangely, because they span the allocation-unit boundary (the allocation unit is usually 4 or 8 bytes, depending on the hardware);
the order of the bit fields in the allocation unit is not guaranteed by the standard; depending on the architecture, the field mode may end up in either the most significant 3 bits or the least significant 3 bits of the final 3-byte pack; you can sort this out easily, though.
You probably need to handle the values you store in a fields structure all at once. For that you can embed the structure in a union:
typedef union {
    fields f;
    unsigned int a;
} reg;
reg x;
/* Access individual fields */
x.f.mode = 2;
x.f.divisor = 42;
/* Get the entire register */
printf("%06X\n", x.a);
Solution 2
An alternative way to do (kind of) the same thing is to use macros to extract the fields and to compose the entire register:
#define MAKE_REG(mode, field2, field1, divisor) \
    ((((mode) & 0x07) << 21) | \
     (((field2) & 0x03) << 19) | \
     (((field1) & 0x01FF) << 10) | \
     ((divisor) & 0x03FF))
#define GET_MODE(reg) (((reg) & 0xE00000) >> 21)
#define GET_FIELD2(reg) (((reg) & 0x180000) >> 19)
#define GET_FIELD1(reg) (((reg) & 0x07FC00) >> 10)
#define GET_DIVISOR(reg) ((reg) & 0x0003FF)
The first macro assembles the mode, field2, field1 and divisor values into a 3-byte integer. The other macros extract the values of individual fields. All of them assume the processed numbers are unsigned.
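A quick round-trip check of these macros (reproduced here so the snippet is self-contained):

```c
#include <assert.h>

#define MAKE_REG(mode, field2, field1, divisor) \
    ((((mode) & 0x07u) << 21) | \
     (((field2) & 0x03u) << 19) | \
     (((field1) & 0x01FFu) << 10) | \
     ((divisor) & 0x03FFu))

#define GET_MODE(reg)    (((reg) & 0xE00000u) >> 21)
#define GET_FIELD2(reg)  (((reg) & 0x180000u) >> 19)
#define GET_FIELD1(reg)  (((reg) & 0x07FC00u) >> 10)
#define GET_DIVISOR(reg) ((reg) & 0x0003FFu)
```

Assembling a register from sample values and extracting each field again should give back exactly what was put in.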
Pros and cons
The struct (embedded in a union) solution:
(+) it allows the compiler to check the values you want to put into the fields (and issue warnings); it also does the correct conversions between signed and unsigned;
The macro solution:
(+) it is not sensitive to memory-alignment issues; you put the bits exactly where you want them;
(-) it doesn't check the range of the values you put in the fields;
(-) handling signed values is a little trickier with macros; the macros suggested here work only for unsigned values; more shifting is required in order to use signed values.
Let's say I have an enum with bitflag options larger than the number of bits in a standard data type:
enum flag_t {
    FLAG_1 = 0x1,
    FLAG_2 = 0x2,
    ...
    FLAG_130 = 0x400000000000000000000000000000000,
};
This is impossible for several reasons. Enums have a maximum size of 128 bits (in C with gcc on my system, from experimentation), single variables also have a maximum size of 128 bits, etc.
In C you can't perform bitwise operations on arrays, though in C++ I suppose you could overload bitwise operators to do the job with a loop.
Is there any way in C other than manually remembering which flags go where to have this work for large numbers?
This is exactly what bit-fields are for.
In C, it's possible to define the following data layout:
struct flag_t
{
    unsigned int flag1 : 1;
    unsigned int flag2 : 1;
    unsigned int flag3 : 1;
    (...)
    unsigned int flag130 : 1;
    (...)
    unsigned int flag1204 : 1; // for fun
};
In this example, all flags occupy just one bit. An obvious advantage is the unlimited number of flags. Another great advantage is that you are no longer limited to single-bit flags, you could have some multi-value flags merged in the middle.
But most importantly, testing and assignment would be a bit different, and probably simpler, as far as single-flag operations are concerned: you no longer need to do any masking; just access the flag directly by naming it. And by the way, use the opportunity to give these flags more descriptive names :)
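The direct-naming style can be sketched like this (the member names and flags_demo helper are illustrative):

```c
#include <assert.h>

struct flag_t {
    unsigned int flag1   : 1;
    unsigned int flag2   : 1;
    unsigned int mode    : 3;  /* a multi-value field merged in the middle */
    unsigned int flag130 : 1;
};

/* No masks, no shifts: flags are read and written by name. */
static int flags_demo(void)
{
    struct flag_t f = {0};
    f.flag2 = 1;
    f.mode  = 5;
    return f.flag1 == 0 && f.flag2 == 1 && f.mode == 5;
}
```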
Instead of trying to assign absurdly large numbers to an enum so you can have a hundreds-of-bits-wide bitfield, let the compiler assign a normal zero-based sequence of numbers to your flag names, and simulate a wide bitfield using an array of unsigned char. You can have a 1024-bit bitfield using unsigned char bits[128], and write get_flag() and set_flag() accessor functions to hide the small amount of extra work involved.
However, a far better piece of advice would be to look at your design again, and ask yourself "Why do I need over a hundred different flags?". It seems to me that what you really need is a redesign.
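For reference, the get_flag()/set_flag() accessors mentioned above might look like this sketch (NUM_FLAGS and the roundtrip helper are illustrative):

```c
#include <assert.h>
#include <limits.h>

#define NUM_FLAGS 1024
static unsigned char bits[(NUM_FLAGS + CHAR_BIT - 1) / CHAR_BIT];

static void set_flag(unsigned n, int value)
{
    if (value)
        bits[n / CHAR_BIT] |= (unsigned char)(1u << (n % CHAR_BIT));
    else
        bits[n / CHAR_BIT] &= (unsigned char)~(1u << (n % CHAR_BIT));
}

static int get_flag(unsigned n)
{
    return (bits[n / CHAR_BIT] >> (n % CHAR_BIT)) & 1;
}

/* Set, verify, clear, verify again: returns 1 if the round trip works. */
static int roundtrip(unsigned n)
{
    set_flag(n, 1);
    int was_set = get_flag(n);
    set_flag(n, 0);
    return was_set && !get_flag(n);
}
```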
In this answer to a question related to bitflags, Bit Manipulation and Flags, I provided an example of using an unsigned char array (an approach suited to very large sets of bitflags), which I am moving to this posting.
This source example provides the following:
a set of Preprocessor defines for the bitflag values
a set of Preprocessor macros to manipulate bits
a couple of functions to implement bitwise operations on the arrays
The general approach for this is as follows:
create a set of defines for the flags which specify an array offset and a bit pattern
create a typedef for an unsigned char array of the proper size
create a set of functions that implement the bitwise logical operations
The Specifics from the Answer with a Few Improvements and More Exposition
Use a set of C Preprocessor defines to create a set of bitflags to be used with the array. These bitflag defines specify an offset within the unsigned char array along with the bit to manipulate.
The defines in this example are 16-bit values in which the upper byte contains the array offset and the lower byte contains the bit flag(s) for the byte of the unsigned char array at that offset. Using this technique you can have arrays of up to 256 elements, 256 * 8 or 2,048 bitflags, or by going from a 16-bit define to a 32-bit long you could have many more. (In the comments below, bit 0 means the least significant bit of a byte and bit 7 means the most significant bit.)
#define ITEM_FLG_01 0x0001 // array offset 0, bit 0
#define ITEM_FLG_02 0x0002 // array offset 0, bit 1
#define ITEM_FLG_03 0x0101 // array offset 1, bit 0
#define ITEM_FLG_04 0x0102 // array offset 1, bit 1
#define ITEM_FLG_05 0x0201 // array offset 2, bit 0
#define ITEM_FLG_06 0x0202 // array offset 2, bit 1
#define ITEM_FLG_07 0x0301 // array offset 3, bit 0
#define ITEM_FLG_08 0x0302 // array offset 3, bit 1
#define ITEM_FLG_10 0x0908 // array offset 9, bit 3
Next you have a set of macros to set and unset the bits, along with a typedef to make the array a bit easier to use. Unfortunately, using a typedef in C does not provide better type checking from the compiler, but it does make the code easier to use. These macros do no checking of their arguments, so you might feel safer using regular functions instead.
#define SET_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) |= (b) & 0xff)
#define TOG_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) ^= (b) & 0xff)
#define CLR_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) &= ~((b) & 0xff))
#define TST_BIT(p,b) (*((p) + (((b) >> 8) & 0xff)) & ((b) & 0xff))
typedef unsigned char BitSet[10];
An example of using this basic framework is as follows.
BitSet uchR = { 0 };
int bValue;
SET_BIT(uchR, ITEM_FLG_01);
bValue = TST_BIT(uchR, ITEM_FLG_01);
SET_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_03);
TOG_BIT(uchR, ITEM_FLG_04);
CLR_BIT(uchR, ITEM_FLG_05);
CLR_BIT(uchR, ITEM_FLG_01);
Next you can introduce a set of utility functions to do some of the bitwise operations we want to support. These bitwise operations would be analogous to the built in C operators such as bitwise Or (|) or bitwise And (&). These functions use the built in C operators to perform the designated operator on all array elements.
These particular examples of the utility functions modify one of the sets of bitflags provided. However if that is a problem, you can modify the functions to accept three arguments, one being for the result of the operation and the other two for the two sets of bitflags to use in the operation.
void AndBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ &= *s2++;
    }
}

void OrBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ |= *s2++;
    }
}

void XorBits(BitSet s1, const BitSet s2)
{
    size_t nLen = sizeof(BitSet);
    for (; nLen > 0; nLen--) {
        *s1++ ^= *s2++;
    }
}
If you need more than one size of bitflag set with this approach, the most flexible option is to eliminate the typedef and just use plain unsigned char arrays of various sizes. This change entails modifying the interface of the utility functions: replace BitSet with an unsigned char pointer, define bitflag variables as unsigned char arrays, and pass the array length along with the pointers.
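With the typedef removed, each utility function takes an explicit length; for example (AndBitsN and and_demo are illustrative names):

```c
#include <assert.h>
#include <stddef.h>

/* Bitwise AND of two flag arrays of equal length, stored into s1. */
static void AndBitsN(unsigned char *s1, const unsigned char *s2, size_t nLen)
{
    for (; nLen > 0; nLen--) {
        *s1++ &= *s2++;
    }
}

/* Small self-check: {0xF0,0x0F} AND {0xFF,0x03} should give {0xF0,0x03}. */
static int and_demo(void)
{
    unsigned char a[2] = { 0xF0, 0x0F };
    const unsigned char b[2] = { 0xFF, 0x03 };
    AndBitsN(a, b, sizeof a);
    return a[0] == 0xF0 && a[1] == 0x03;
}
```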
You may also consider an approach similar to what is being done for text strings in Is concatenating arbitrary number of strings with nested function calls in C undefined behavior?.
I have some logic that I would like to store as an integer. I have 30 "positions" that can be either yes or no and I would like to represent this as an integer. As I am looping through these positions what would be the easiest way to store this information as an integer?
You can use a 32-bit unsigned integer:
uint32_t flags = 0;
flags |= UINT32_C(1) << x; // set x'th bit from right
flags &= ~(UINT32_C(1) << x); // unset x'th bit from right
if (flags & UINT32_C(1) << x) // test x'th bit from right
struct {
    unsigned int flag0:1;
    unsigned int flag1:1;
    ...
    unsigned int flag31:1;
} myFlags;
Using :x in definition of an integer struct member means bitfield with x bits assigned.
You can access each struct member as usual, but the values are limited to the declared size in bits (in my example - either 1 or 0, because only 1 bit is available), and the compiler will enforce it. The struct will (probably - it depends on the compiler settings) be packed into the smallest number of integers needed to represent the total bits.
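A short sketch of that behavior (the struct and the count_set helper are illustrative):

```c
#include <assert.h>

struct flags {
    unsigned int flag0 : 1;
    unsigned int flag1 : 1;
    /* ...extend to flag29 for all 30 positions... */
};

/* Each member holds only 0 or 1, so summing them counts the set flags. */
static int count_set(struct flags f)
{
    return (int)(f.flag0 + f.flag1);
}
```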
Another option would be using a int and bitwise operators & and | to access specific bits. In this case you have to make sure yourself that setting one bit won't affect another, and that there are no overflows etc.
#define POSITION_A 1
#define POSITION_B 2
unsigned int position = 0;
// set a position
position |= POSITION_A;
// clear a position
position &= ~POSITION_A;
Yes, as WTP's comment says, you could save all your data in one unsigned int (uint32_t), and access it with AND (&), OR (|), and NOT (~).
If saving storage is not a primary concern, however, I recommend not to use this compact technique.
You may need to expand your code to support more than two kinds of answers (yes/no), such as yes/no/maybe.
You may have more than 30 questions, which would not fit into one unsigned int.
If I were you, I'd use an array/list of small ints (short or char) to store the values. It wastes some storage, but it's much easier to read and much easier to extend.
I created a structure to represent a fixed-point positive number. I want the numbers in both sides of the decimal point to consist 2 bytes.
typedef struct Fixed_t {
    unsigned short floor;    //left side of the decimal point
    unsigned short fraction; //right side of the decimal point
} Fixed;
Now I want to add two fixed point numbers, Fixed x and Fixed y. To do so I treat them like integers and add.
(Fixed) ( (int)x + (int)y );
But as my visual studio 2010 compiler says, I cannot convert between Fixed and int.
What's the right way to do this?
EDIT: I'm not committed to the {short floor, short fraction} implementation of Fixed.
You could attempt a nasty hack, but there's a problem here with endian-ness. Whatever you do to convert, how is the compiler supposed to know that you want floor to be the most significant part of the result, and fraction the less significant part? Any solution that relies on re-interpreting memory is going to work for one endian-ness but not another.
You should either:
(1) define the conversion explicitly. Assuming short is 16 bits:
unsigned int val = ((unsigned int)x.floor << 16) + x.fraction;
(2) change Fixed so that it has an int member instead of two shorts, and then decompose when required, rather than composing when required.
If you want addition to be fast, then (2) is the thing to do. If you have a 64 bit type, then you can also do multiplication without decomposing: unsigned int result = (((uint64_t)x) * y) >> 16.
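A sketch of option (2), assuming the whole value is kept in one 32-bit word in 16.16 format (the FIX macro and the function names are illustrative, not a fixed API):

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point kept in a single 32-bit word. */
typedef uint32_t Fixed;

#define FIX(n) ((Fixed)(n) << 16)  /* whole number -> 16.16 */

/* Addition needs no decomposition at all. */
static Fixed fixed_add(Fixed a, Fixed b) { return a + b; }

/* Widen to 64 bits so the 32x32 product cannot overflow, then rescale. */
static Fixed fixed_mul(Fixed a, Fixed b)
{
    return (Fixed)(((uint64_t)a * b) >> 16);
}
```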
The nasty hack, by the way, would be this:
unsigned int val;
assert(sizeof(Fixed) == sizeof(unsigned int));              // could be a static test
assert(2 * sizeof(unsigned short) == sizeof(unsigned int)); // could be a static test
memcpy(&val, &x, sizeof(unsigned int));
That would work on a big-endian system, where Fixed has no padding (and the integer types have no padding bits). On a little-endian system you'd need the members of Fixed to be in the other order, which is why it's nasty. Sometimes casting through memcpy is the right thing to do (in which case it's a "trick" rather than a "nasty hack"). This just isn't one of those times.
If you have to you can use a union but beware of endian issues. You might find the arithmetic doesn't work and certainly is not portable.
typedef struct Fixed_t {
    union {
        struct { unsigned short floor; unsigned short fraction; };
        unsigned int whole;
    };
} Fixed;
which is more likely (I think) to work on big-endian systems (which Windows/Intel isn't).
Some magic:
typedef union Fixed {
uint16_t w[2];
uint32_t d;
} Fixed;
#define Floor w[((Fixed){1}).d==1]
#define Fraction w[((Fixed){1}).d!=1]
Key points:
I use fixed-size integer types so you're not depending on short being 16-bit and int being 32-bit.
The macros for Floor and Fraction (capitalized to avoid clashing with floor() function) access the two parts in an endian-independent way, as foo.Floor and foo.Fraction.
Edit: At OP's request, an explanation of the macros:
Unions are a way of declaring an object consisting of several different overlapping types. Here we have uint16_t w[2]; overlapping uint32_t d;, making it possible to access the value as 2 16-bit units or 1 32-bit unit.
(Fixed){1} is a compound literal, and could be written more verbosely as (Fixed){{1,0}}. Its first element (uint16_t w[2];) gets initialized with {1,0}. The expression ((Fixed){1}).d then evaluates to the 32-bit integer whose first 16-bit half is 1 and whose second 16-bit half is 0. On a little-endian system, this value is 1, so ((Fixed){1}).d==1 evaluates to 1 (true) and ((Fixed){1}).d!=1 evaluates to 0 (false). On a big-endian system, it'll be the other way around.
Thus, on a little-endian system, Floor is w[1] and Fraction is w[0]. On a big-endian system, Floor is w[0] and Fraction is w[1]. Either way, you end up storing/accessing the correct half of the 32-bit value for the endian-ness of your platform.
In theory, a hypothetical system could use a completely different representation for 16-bit and 32-bit values (for instance interleaving the bits of the two halves), breaking these macros. In practice, that's not going to happen. :-)
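A small self-test of the macros (the pack helper is illustrative): because Floor and Fraction adapt to the host's byte order, storing the halves and reading them back through the same macros gives a consistent result on any platform.

```c
#include <assert.h>
#include <stdint.h>

typedef union Fixed {
    uint16_t w[2];
    uint32_t d;
} Fixed;

#define Floor    w[((Fixed){1}).d==1]
#define Fraction w[((Fixed){1}).d!=1]

/* Store the two halves via the macros, then read them back as one value. */
static uint32_t pack(uint16_t fl, uint16_t fr)
{
    Fixed f;
    f.Floor = fl;
    f.Fraction = fr;
    return ((uint32_t)f.Floor << 16) | f.Fraction;
}
```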
This is not possible portably, as the compiler does not guarantee a Fixed will use the same amount of space as an int. The right way is to define a function Fixed add(Fixed a, Fixed b).
Just add the pieces separately. You need to know the value of the fraction that means "1" - here I'm calling that FRAC_MAX:
// c = a + b
void fixed_add( Fixed* a, Fixed* b, Fixed* c){
    unsigned short carry = 0;
    if((int)(a->fraction) + (int)(b->fraction) >= FRAC_MAX){
        carry = 1;
        c->fraction = a->fraction + b->fraction - FRAC_MAX;
    } else {
        c->fraction = a->fraction + b->fraction;
    }
    c->floor = a->floor + b->floor + carry;
}
Alternatively, if you're just setting the fixed point as being at the 2 byte boundary you can do something like:
void fixed_add( Fixed* a, Fixed* b, Fixed* c){
    unsigned int ia = ((unsigned int)a->floor << 16) + a->fraction;
    unsigned int ib = ((unsigned int)b->floor << 16) + b->fraction;
    unsigned int ic = ia + ib;
    c->floor = ic >> 16;
    c->fraction = ic & 0xFFFF;
}
Try this:
typedef union {
    struct Fixed_t {
        unsigned short floor;    //left side of the decimal point
        unsigned short fraction; //right side of the decimal point
    } Fixed;
    int Fixed_int;
} FixedUnion;
If your compiler puts the two shorts in 4 bytes, then you can use memcpy to copy your int into your struct, but as said in another answer, this is not portable... and quite ugly.
Do you really care adding separately each field in a separate method?
Do you want to keep the integer for performance reason?
// add two Fixed
Fixed operator+( Fixed a, Fixed b )
{
...
}
//add Fixed and int
Fixed operator+( Fixed a, int b )
{
...
}
You may cast any addressable type to another one by using:
*(newtype *)&var
Beware, though: this kind of type punning violates the strict-aliasing rule and is not portable.