Using unions to simplify casts - c

I realize that what I am trying to do isn't safe. But I am just doing some testing and image processing so my focus here is on speed.
Right now this code gives me the corresponding bytes for a 32-bit pixel value type.
struct Pixel {
unsigned char b,g,r,a;
};
I wanted to check if I have a pixel that is under a certain value (e.g. r, g, b <= 0x10). I figured I could just test the bitwise AND of the pixel's 32-bit value with 0x00E0E0E0 (I could have the wrong endianness here) to find the dark pixels.
Rather than using this ugly mess (*((uint32_t*)&pixel)) to get the 32-bit unsigned int value, I figured there should be a way for me to set it up so I can just use pixel.i, while keeping the ability to reference the green byte using pixel.g.
Can I do this? This won't work:
struct Pixel {
unsigned char b,g,r,a;
};
union Pixel_u {
Pixel p;
uint32_t bits;
};
I would need to edit my existing code to say pixel.p.g to get the green color byte. Same happens if I do this:
union Pixel {
unsigned char c[4];
uint32_t bits;
};
This would work too, but I would still need to change everything to index into c, which is a bit ugly, though I could make it work with a macro if I really needed to.

(Edited) Both gcc and MSVC allow 'anonymous' structs/unions, which might solve your problem. For example:
union Pixel {
    struct {unsigned char b,g,r,a;};
    uint32_t bits; // use 'unsigned' for MSVC
} foo;
foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);
gives (on Intel):
04030201
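With this layout, the dark-pixel test from the question can be written directly against the 32-bit view. A sketch, assuming the little-endian layout shown above, where the mask 0x00E0E0E0 covers the top three bits of r, g and b and leaves a alone:
if ((foo.bits & 0x00E0E0E0u) == 0) {
    /* r, g and b are all below 0x20: treat as a dark pixel (a is ignored) */
}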
This requires changing all your declarations of struct Pixel to union Pixel in your original code. But this defect can be fixed via:
struct Pixel {
    union {
        struct {unsigned char b,g,r,a;};
        uint32_t bits;
    };
} foo;
foo.b = 1;
foo.g = 2;
foo.r = 3;
foo.a = 4;
printf ("%08x\n", foo.bits);
This also works with VC9, with 'warning C4201: nonstandard extension used : nameless struct/union'. Microsoft uses this trick, for example, in:
typedef union {
    struct {
        DWORD LowPart;
        LONG HighPart;
    }; // <-- nameless member!
    struct {
        DWORD LowPart;
        LONG HighPart;
    } u;
    LONGLONG QuadPart;
} LARGE_INTEGER;
but they 'cheat' by suppressing the unwanted warning.
While the above examples are ok, if you use this technique too often, you'll quickly end up with unmaintainable code. Five suggestions to make things clearer:
(1) Change the name bits to something uglier like union_bits, to clearly indicate something out-of-the-ordinary.
(2) Go back to the ugly cast the OP rejected, but hide its ugliness in a macro or in an inline function, as in:
#define BITS(x) (*(uint32_t*)&(x))
But this would break the strict aliasing rules. (See, for example, AndreyT's answer: C99 strict aliasing rules in C++ (GCC).)
(3) Keep the original definition of Pixel, but do a better cast:
struct Pixel {unsigned char b,g,r,a;} foo;
// ...
printf("%08x\n", ((union {struct Pixel dummy; uint32_t bits;})foo).bits);
(4) But that is even uglier. You can fix this by a typedef:
struct Pixel {unsigned char b,g,r,a;} foo;
typedef union {struct Pixel dummy; uint32_t bits;} CastPixelToBits;
// ...
printf("%08x\n", ((CastPixelToBits)foo).bits); // not VC9
With VC9, or with gcc using -pedantic, you'll need (don't use this with gcc--see note at end):
printf("%08x\n", ((CastPixelToBits*)&foo)->bits); // VC9 (not gcc)
(5) A macro may perhaps be preferred. In gcc, you can define a union cast to any given type very neatly:
#define CAST(type, x) (((union {typeof(x) src; type dst;})(x)).dst) // gcc
// ...
printf("%08x\n", CAST(uint32_t, foo));
With VC9 and other compilers, there is no typeof, and pointers may be needed (don't use this with gcc--see note at end):
#define CAST(typeof_x, type, x) (((union {typeof_x src; type dst;}*)&(x))->dst)
Self-documenting, and safer. And not too ugly. All these suggestions are likely to compile to identical code, so efficiency is not an issue. See also my related answer: How to format a function pointer?.
Warning about gcc: The GCC Manual version 4.3.4 (but not version 4.3.0) states that this last example, with &(x), is undefined behaviour. See http://davmac.wordpress.com/2010/01/08/gcc-strict-aliasing-c99/ and http://gcc.gnu.org/ml/gcc/2010-01/msg00013.html.

The problem with a structure inside a union is that the compiler is allowed to add padding bytes between members of a structure (or class), except between bit fields.
Given:
struct Pixel
{
unsigned char red;
unsigned char green;
unsigned char blue;
unsigned char alpha;
};
This could be laid out as:
Offset Field
------ -----
0x00 red
0x04 green
0x08 blue
0x0C alpha
So the size of the structure would be 16 bytes. (In practice most compilers will not pad between adjacent chars, but the standard permits it.)
When put in a union, the compiler takes the larger of the two members to determine the union's size. Also, as you can see, a 32-bit integer would not line up with the four colour bytes.
I suggest creating functions to combine and extract pixels from a 32-bit quantity. You can declare them inline too:
void Int_To_Pixel(const unsigned int word, Pixel& p)
{
    p.red   = (word & 0xff000000) >> 24;
    p.blue  = (word & 0x00ff0000) >> 16;
    p.green = (word & 0x0000ff00) >> 8;
    p.alpha = (word & 0x000000ff);
    return;
}
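The reverse direction is the same idea; here is a sketch of a companion function (hypothetical name, pointer parameter, using the same byte positions as Int_To_Pixel above):
unsigned int Pixel_To_Int(const struct Pixel *p)
{
    return ((unsigned int)p->red   << 24) |
           ((unsigned int)p->blue  << 16) |
           ((unsigned int)p->green <<  8) |
           ((unsigned int)p->alpha);
}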
This is a lot more reliable than a struct inside a union, including one with bit fields:
struct Pixel_Bit_Fields
{
    unsigned int red:8;
    unsigned int green:8;
    unsigned int blue:8;
    unsigned int alpha:8;
};
There is still some mystery when reading this as to whether red or alpha is the MSB. By using bit manipulation, there is no question when reading the code.
Just my suggestions, YMMV.

Why not make the ugly mess into an inline routine? Something like:
inline uint32_t pixel32(const Pixel& p)
{
    return *reinterpret_cast<const uint32_t*>(&p);
}
You could also provide this routine as a member function for Pixel, called i(), which would allow you to access the value via pixel.i() if you preferred to do it that way. (I'd lean toward separating the functionality from the data structure when invariants need not be enforced.)

Related

sizeof anonymous nested struct

Suppose I have a structure I'm using to model various packet formats:
#define MaxPacket 20
typedef struct {
    u8 packetLength;
    union {
        u8 bytes[MaxPacket];
        struct {
            u16 field1;
            u16 field2;
            u16 field3;
        } format1;
        struct {
            double value1;
            double value2;
        } format2;
    };
} Packet;
I can expect that sizeof(Packet) will be 21. But is there any way to do something like:
sizeof(Packet.format2)
? I've tried that, but the compiler is not happy. Obviously, I could pull format1 out as a separate typedef and then I could sizeof(format1). But I'm curious whether I have to go through all of that. I like the hierarchical composition of the formats. This is with gcc on an 8-bit processor.
I'm equally interested if there's a way to use the nested type. If I have to do a lot of
aPacketPointer->format2.value1; // not so onerous, but if the nesting gets deeper...
Then sometimes it would be nice to do:
Packet.format2 *formatPtr = &aPacketPointer->format2;
formatPtr->value2; // etc
Again, refactoring into a bunch of preceding typedefs would solve this problem, but then I lose the nice namespacing effect of the nested dotted references.
For something that will work even in C90, you can use a macro modeled on your toolchain's offsetof() macro:
#define sizeof_field(s,m) (sizeof((((s*)0)->m)))
Adjust it accordingly if your toolchain's offsetof() macro isn't based on casting 0 to a pointer to the structure's type.
When I use it like so:
std::cout << sizeof_field(Packet,format1) << std::endl;
std::cout << sizeof_field(Packet,format2) << std::endl;
I get the output:
6
16
For your second question, if you're willing to rely on GCC's typeof extension you can create a similar macro for declaring pointers to your nested anonymous structs:
#define typeof_field(s,m) typeof(((s*)0)->m)
...
typeof_field(Packet,format2)* f2 = &foo.format2;
To be honest, I find that construct pretty ugly, but it might still be better than other options you have available.
GCC documents that the "operand of typeof is evaluated for its side effects if and only if it is an expression of variably modified type or the name of such a type", so the apparent null pointer dereference should not result in undefined behavior when a variable length array is not involved.
Using C11 or C99, create a dummy compound literal and seek its size.
printf("%zu\n", sizeof( ((Packet){ 0, { "" }}).format2 ));
Output
16
You can just give those nested structs a name, no need for a typedef. Like this:
typedef struct {
    u8 packetLength;
    union {
        u8 bytes[MaxPacket];
        struct myformat1 {
            u16 field1;
            u16 field2;
            u16 field3;
        } format1;
        struct myformat2 {
            double value1;
            double value2;
        } format2;
    };
} Packet;
Then you can write e.g. sizeof(struct myformat1), declare variables of that type, etc.
You could also add a typedef afterwards, e.g.
typedef struct myformat1 myformat1;
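For example, with the tags in place, the pointer idiom from the question works directly; a short sketch:
struct myformat2 *formatPtr = &aPacketPointer->format2;
formatPtr->value2 = 1.0;
/* sizeof(struct myformat2) also works now, giving the same value as sizeof_field(Packet, format2) */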

Struct data alignment. Size must be an integer multiple of the largest type present?

I am making a GUI that send/receives data over a serial port. The data consists of messages that are defined in a struct like this:
typedef struct
{
uint8_t a;
uint8_t b;
uint8_t c;
uint16_t d[3];
uint16_t e;
} MyMsg_t;
I am also using a union, because it makes it easier for me to set the data fields and still send the message byte for byte. The union looks like this:
typedef union
{
MyMsg_t msg;
uint8_t array[MyMsgLength];
} MyMsg;
I now try to add some data to a message like this:
MyMsg msg;
msg.msg.a = (uint8_t) 1;
msg.msg.b = (uint8_t) 2;
msg.msg.c = (uint8_t) 3;
msg.msg.d[0] = (uint16_t) 4;
msg.msg.d[1] = (uint16_t) 5;
msg.msg.d[2] = (uint16_t) 6;
msg.msg.e = (uint16_t) 7;
And I transmit it over a serial bus byte-wise, and what the receiving end sees is:
1 2 3 19 4 0 5 0 6 0 7
(the data in c through e is reversed because of the bus)
This looks like the struct actually was:
typedef struct
{
uint8_t a; //1
uint8_t b; //2
uint8_t c; //3
//uint8_t x //19
uint16_t d[3];
uint16_t e;
} MyMsg_t;
From this I assume that somewhere the C standard says the size of the struct must be a multiple of sizeof(uint16_t) in this case, since we cannot have, for example, 3.5 of the largest type in a struct; it has to be an integer multiple?
I guess this is what is called padding? Is there some way to force a struct to be n * sizeof(uint8_t) even when a larger type is present?
I know how I can avoid this, but it requires not using the union and more code. Is there some way to elegantly avoid this issue with minimal code intervention?
Don't forget about the endianness of the machine too, which is why doing this is not a recommended practice. If you don't care about endianness or are dealing with it some other way, then this approach is acceptable.
You can change the alignment to eliminate padding. How this is done depends on the compiler, however:
Microsoft Visual C++ - #pragma pack
gcc - __attribute__((packed)); it also supports #pragma pack
Other compilers may have other options or means of accomplishing the same thing.
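For example, a packed version of the struct from the question might look like this with gcc (a sketch; for MSVC you would instead wrap the definition in #pragma pack(push, 1) / #pragma pack(pop)):
#include <stdint.h>

typedef struct
{
    uint8_t  a;
    uint8_t  b;
    uint8_t  c;
    uint16_t d[3];   /* no padding byte inserted before d */
    uint16_t e;
} __attribute__((packed)) MyMsg_t;   /* sizeof(MyMsg_t) == 11 */
Be aware that on targets that cannot do unaligned 16-bit loads, accessing d and e in a packed struct can be slower or even fault, so per-field serialization is the more portable alternative.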

using memcpy for structs

I have a problem when using memcpy on a struct.
Consider the following struct
struct HEADER
{
unsigned int preamble;
unsigned char length;
unsigned char control;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
If I use memcpy to copy data from a receive buffer to this struct, the copy is OK, but if I redeclare the struct to the following:
struct HEADER
{
unsigned int preamble;
unsigned char length;
struct CONTROL control;
unsigned int destination;
unsigned int source;
unsigned int crc;
};
struct CONTROL
{
unsigned dir : 1;
unsigned prm : 1;
unsigned fcb : 1;
unsigned fcb : 1;
unsigned function_code : 4;
};
Now if I use the same memcpy code as before, the first two variables (preamble and length) are copied OK. The control is totally messed up, and the last three variables are shifted up by one, i.e. crc = 0, source = crc, destination = source...
Anyone got any good suggestions for me?
Do you know that the format in the receive buffer is correct, when you add the control in the middle?
Anyway, your problem is that bitfields are the wrong tool here: you can't depend on the layout in memory being anything in particular, least of all the exact same one you've chosen for the serialized form.
It's almost never a good idea to try to directly copy structures to/from external storage; you need proper serialization. The compiler can add padding and alignment between the fields of a structure, and using bitfields makes it even worse. Don't do this.
Implement proper serialization/deserialization functions:
unsigned char * header_serialize(unsigned char *put, const struct HEADER *h);
unsigned char * header_deserialize(unsigned char *get, struct HEADER *h);
That go through the structure and read/write as many bytes as you feel are needed (possibly for each field):
static unsigned char * uint32_serialize(unsigned char *put, uint32_t x)
{
    *put++ = (x >> 24) & 255;
    *put++ = (x >> 16) & 255;
    *put++ = (x >> 8) & 255;
    *put++ = x & 255;
    return put;
}

unsigned char * header_serialize(unsigned char *put, const struct HEADER *h)
{
    const uint8_t ctrl_serialized = (h->control.dir << 7) |
                                    (h->control.prm << 6) |
                                    (h->control.fcb << 5) |
                                    (h->control.function_code);
    put = uint32_serialize(put, h->preamble);
    *put++ = h->length;
    *put++ = ctrl_serialized;
    put = uint32_serialize(put, h->destination);
    put = uint32_serialize(put, h->source);
    put = uint32_serialize(put, h->crc);
    return put;
}
Note how this needs to be explicit about the endianness of the serialized data, which is something you always should care about (I used big-endian). It also explicitly builds a single uint8_t version of the control fields, assuming the struct version was used.
Also note that there's a typo in your CONTROL declaration; fcb occurs twice.
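With that typo fixed, the matching deserializer follows the same pattern. A sketch (assuming the same big-endian byte order and control-byte bit positions as above, and that unsigned int is 32 bits, which the serializer already assumes):
static unsigned char * uint32_deserialize(unsigned char *get, unsigned int *x)
{
    unsigned int v;
    v  = (unsigned int)*get++ << 24;
    v |= (unsigned int)*get++ << 16;
    v |= (unsigned int)*get++ << 8;
    v |= (unsigned int)*get++;
    *x = v;
    return get;
}

unsigned char * header_deserialize(unsigned char *get, struct HEADER *h)
{
    unsigned char ctrl;

    get = uint32_deserialize(get, &h->preamble);
    h->length = *get++;
    ctrl = *get++;
    h->control.dir           = (ctrl >> 7) & 1u;
    h->control.prm           = (ctrl >> 6) & 1u;
    h->control.fcb           = (ctrl >> 5) & 1u;
    h->control.function_code =  ctrl       & 0x0Fu;
    get = uint32_deserialize(get, &h->destination);
    get = uint32_deserialize(get, &h->source);
    get = uint32_deserialize(get, &h->crc);
    return get;
}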
Using struct CONTROL control; instead of unsigned char control; leads to a different alignment inside the struct and so filling it with memcpy() produces a different result.
memcpy copies the values of bytes from the location pointed to by source directly to the memory block pointed to by destination.
The underlying types of the objects pointed to by the source and destination pointers are irrelevant to this function; the result is a binary copy of the data.
So if there is any structure padding, you will get messed-up results.
Check sizeof(struct CONTROL) -- I think it would be 2 or 4 depending on the machine. Since you are using unsigned bitfields (and unsigned is shorthand for unsigned int), the whole structure (struct CONTROL) takes at least the size of an unsigned int -- i.e. 2 or 4 bytes.
Using unsigned char control takes 1 byte for this field, so there is definitely a mismatch starting with the control member.
Try rewriting struct CONTROL as below:
struct CONTROL
{
unsigned char dir : 1;
unsigned char prm : 1;
unsigned char fcb : 1;
unsigned char fcb : 1;
unsigned char function_code : 4;
};
The clean way would be to use a union, like in:
struct HEADER
{
    unsigned int preamble;
    unsigned char length;
    union {
        unsigned char all;
        struct CONTROL control;
    } uni;
    unsigned int destination;
    unsigned int source;
    unsigned int crc;
};
The user of the struct can then choose the way he wants to access the thing.
struct HEADER thing = {... };
if (thing.uni.control.dir) { ...}
or
#if ( !FULL_MOON ) /* Update: stacking of bits within a word appears to depend on the phase of the moon */
if (thing.uni.all & 1) { ... }
#else
if (thing.uni.all & 0x80) { ... }
#endif
Note: this construct does not solve endianness issues; those will need explicit conversions.
Note2: and you'll have to check the bit-endianness of your compiler, too.
Also note that bitfields are not very useful, especially if the data goes over the wire, and the code is expected to run on different platforms, with different alignment and / or endianness. Plain unsigned char or uint8_t plus some bitmasking yields much cleaner code. For example, check the IP stack in the BSD or linux kernels.
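As a concrete example, the control byte of the serialized header above can be read back with plain masks, without any bitfield struct at all (a sketch; buf is a hypothetical receive buffer holding the serialized header):
#define CTRL_DIR 0x80u
#define CTRL_PRM 0x40u
#define CTRL_FCB 0x20u
#define CTRL_FC  0x0Fu

unsigned char ctrl = buf[5];               /* byte 5: after 4-byte preamble and 1-byte length */
int dir           = (ctrl & CTRL_DIR) != 0;
int prm           = (ctrl & CTRL_PRM) != 0;
int function_code =  ctrl & CTRL_FC;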

What is a better way to declare flags on an arm based arch?

What is better for an ARM-based arch?
struct my_struct{
struct device *dev;
unsigned char a:1,
b:1,
c:1,
d:1;
};
Or to define a char and use bitwise operations:
struct my_struct{
struct device *dev;
unsigned char abcd;
};
I don't think the architecture of the processor matters for this question. In many respects it is purely a style question.
In my experience most people tend to use unsigned integers and bit-wise operations:
#define MASK_B 0x20
unsigned char field;
int b_is_set = field & MASK_B;
The bitfields never really seemed to take off in mainstream code.
That being said, I would use whichever feels more natural to you.
Bit-field packing order is implementation-defined. It means that if you declare
struct my_struct {
struct device *dev;
unsigned char a:1,
b:1,
c:1,
d:1;
};
it is completely up to the compiler to decide where a or b or c or d actually reside in. (GCC and some other compilers do define __BIG_ENDIAN_BITFIELD or __LITTLE_ENDIAN_BITFIELD depending on how they pack bitfields, though.)
On the other hand, if you define
struct my_struct {
struct device *dev;
unsigned char abcd;
};
#define MASK_A (1U << 0U)
#define MASK_B (1U << 1U)
#define MASK_C (1U << 2U)
#define MASK_D (1U << 3U)
static inline unsigned char get(const struct my_struct *const m, const unsigned char mask)
{
    return m->abcd & mask;
}

static inline unsigned char set(struct my_struct *const m, const unsigned char mask, const unsigned char value)
{
    m->abcd = (m->abcd & ~mask) | (mask & value);
    return m->abcd & mask;
}

static inline unsigned char flip(struct my_struct *const m, const unsigned char mask)
{
    m->abcd ^= mask;
    return m->abcd & mask;
}
you know that a maps to the least significant bit in the byte following the pointer, b to the second bit, c third, and d fourth.
If your C compiler supports static inline, then these functions are as fast as macros, but don't have the issues macros have wrt. side effects.
This also allows you to manipulate the bit fields as a group. For example, to set b in structure t you'd use set(&t, MASK_B, MASK_B). To set b but clear a, you'd use set(&t, MASK_A | MASK_B, MASK_B). To test if a is set, use get(&t, MASK_A). To test if a or b is set, use get(&t, MASK_A | MASK_B). To check if both a and b are set, use get(&t, MASK_A | MASK_B) == (MASK_A | MASK_B). All three functions return the resulting bit mask, with the mask applied, i.e. all other bits zero.
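In code, that usage looks like this (a short sketch, assuming a struct my_struct variable t with abcd initialised to zero):
set(&t, MASK_A | MASK_B, MASK_A | MASK_B);          /* set both a and b    */
if (get(&t, MASK_A | MASK_B) == (MASK_A | MASK_B))  /* true: both are set  */
    flip(&t, MASK_D);                               /* toggle d            */
set(&t, MASK_A | MASK_B, MASK_B);                   /* keep b set, clear a */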
Personally, I prefer this latter approach, mostly because I feel it is more explicit (I am fully in control), more versatile (allows me to manipulate them not only individually, but also in groups), and more space efficient (since compilers tend to add padding, unless you explicitly tell them not to via e.g. command-line switches). Depending on the way you use the flags, I recommend you access them via macros or static inline functions.
That said, if the structure is internal to the application, never stored in mass storage or transmitted, I would not have any real objections to either approach.
The very first thing to do is think about alignment; read more about those issues in unaligned-memory-access and mem_alignment.
Scalability, or overhead, is the second very important thing. If this structure will be allocated many times, for example for caching entries or for structures like file system internals, then you need to keep things compact.
Readability is very important too, but since this is OS kernel code we are talking about, where you need to provide performance as well as maintainability, it is a trade-off you should make consciously.

Tips on redefining a register bitfield in C

I am struggling trying to come up with a clean way to redefine some register bitfields to be usable on a chip I am working with.
For example, this is what one of the CAN configuration registers is defined as:
extern volatile near unsigned char BRGCON1;
extern volatile near struct {
unsigned BRP0:1;
unsigned BRP1:1;
unsigned BRP2:1;
unsigned BRP3:1;
unsigned BRP4:1;
unsigned BRP5:1;
unsigned SJW0:1;
unsigned SJW1:1;
} BRGCON1bits;
Neither of these definitions is all that helpful, as I need to assign the BRP and SJW like the following:
struct
{
unsigned BRP:6;
unsigned SJW:2;
} GoodBRGbits;
Here are two attempts that I have made:
Attempt #1:
union
{
    byte Value;
    struct
    {
        unsigned Prescaler:6;
        unsigned SynchronizedJumpWidth:2;
    };
} BaudRateConfig1 = {NULL};
BaudRateConfig1.Prescaler = 5;
BRGCON1 = BaudRateConfig1.Value;
Attempt #2:
static volatile near struct
{
unsigned Prescaler:6;
unsigned SynchronizedJumpWidth:2;
} *BaudRateConfig1 = (volatile near void*)&BRGCON1;
BaudRateConfig1->Prescaler = 5;
Are there any "cleaner" ways to accomplish what I am trying to do? Also I am slightly annoyed about the volatile near casting in Attempt #2. Is it necessary to specify a variable is near?
Personally, I try to avoid using bit fields for portability reasons. Instead, I tend to use bit masks so that I can explicitly control which bits are used.
For example (assuming the bit order is correct) ...
#define BRP0 0x80
#define BRP1 0x40
#define BRP2 0x20
#define BRP3 0x10
#define BRP4 0x08
#define BRP5 0x04
#define SJW0 0x02
#define SJW1 0x01
Masks can then be generated as appropriate and values assigned or read or tested. Better names for the macros can be picked by you.
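A read-modify-write of BRGCON1 with field-wide masks might then look like this (a sketch only; it assumes BRP occupies bits 0..5 and SJW bits 6..7 of BRGCON1, so adjust the masks to match the actual register layout in the datasheet):
#define BRP_MASK  0x3Fu           /* assumed: bits 0..5 */
#define SJW_MASK  0xC0u           /* assumed: bits 6..7 */
#define SJW_SHIFT 6

BRGCON1 = (BRGCON1 & ~BRP_MASK) | (5u & BRP_MASK);                  /* Prescaler = 5 */
BRGCON1 = (BRGCON1 & ~SJW_MASK) | ((1u << SJW_SHIFT) & SJW_MASK);   /* SJW = 1       */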
Hope this helps.
I suggest that you don't mix up the bitfield declaration with the addressing of the hardware register.
Your union/struct declares how the bitfields are arranged; you then specify addressing and access restrictions when declaring a pointer to such a structure.
// foo.h
// Declare struct, declare pointer to hw reg
struct com_setup_t {
unsigned BRP:6;
unsigned SJW:2;
};
extern volatile near struct com_setup_t *BaudRateConfig1;
// foo.c
// Initialise pointer
volatile near struct com_setup_t *BaudRateConfig1 =
(volatile near struct com_setup_t *)0xfff...;
// access hw reg
foo() {
...
BaudRateConfig1->BRP = 3;
...
}
Regarding near/far, I assume that the default is near unless far is specified, though you may be able to set the default pointer size to far using compiler switches.
