Does this macro violate the STRICT ALIASING RULE? - c

I'm fixing some code not written by me, so I found this:
#define get_u_int16_t(X,O) (*(u_int16_t *)(((u_int8_t *)X) + O))
How can I change it to comply with the rule, if it violates it?
The macro is called in this way:
if(get_u_int16_t(packet->payload, i) == ...) { ... }
where payload is a const unsigned char * and i is an unsigned int .
The situation is:
struct orig {
[...]
struct pkt packet;
}*;
struct pkt {
[...]
const u_int8_t *payload;
}*;
Called in this way:
struct orig * flow;
struct pkt * packet = &flow->packet;
payload is a string
i starts at 0 and is incremented in a for loop that runs over the length of payload ( u_int16_t len ):
for(i = 0; i < len; i++) {
    if(get_u_int16_t(packet->payload, i) == /*value*/) {
        // do stuff
    }
}

The macro itself doesn't violate the strict aliasing rule; it depends how you use it. If you only use it to read already existing objects of type u_int16_t or a compatible type then it's fine; if on the other hand you use it to read e.g. parts of a 64-bit integer or a floating-point object then that would be a strict aliasing violation, as well as (possibly) an alignment violation.
As always, you can make the code safe using memcpy:
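#include <string.h>
/* static inline avoids the missing external definition that a plain
   C99 inline function can leave behind */
static inline u_int16_t read_u_int16_t(const void *p) {
    u_int16_t val;
    memcpy(&val, p, sizeof(val));
    return val;
}
#define get_u_int16_t(X,O) (read_u_int16_t(((const u_int8_t *)(X)) + (O)))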
As #2501 points out this may be invalid if u_int8_t is not a character type, so you should just use char * for pointer arithmetic:
#define get_u_int16_t(X,O) (read_u_int16_t(((const char *)(X)) + (O)))
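A minimal sketch of how such a helper might be used in the question's loop, assuming the payload holds `len` bytes. The names `read_u16_at` and `find_u16` are hypothetical and not part of the original code. The loop stops while two bytes are still available, since each read consumes two bytes, and the value is compared in host byte order.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a 16-bit value at byte offset `off`, in host byte order.
   memcpy sidesteps both the aliasing and the alignment problem. */
static uint16_t read_u16_at(const unsigned char *buf, size_t off)
{
    uint16_t val;
    memcpy(&val, buf + off, sizeof val);
    return val;
}

/* Return the first offset whose two bytes equal `needle`, or -1. */
static long find_u16(const unsigned char *payload, size_t len, uint16_t needle)
{
    size_t i;
    for (i = 0; i + sizeof(uint16_t) <= len; i++) {
        if (read_u16_at(payload, i) == needle) {
            return (long)i;
        }
    }
    return -1;
}
```

Most compilers are expected to turn the fixed-size memcpy into a plain load where the target allows it, but that is an optimization, not a guarantee.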

There are two ways to write code of the type you're interested in:
1. Use pointers of type "unsigned char*" to access anything, assemble values out of multiple bytes, and tolerate the performance hit.
2. If you know that pointer alignment won't be an issue, use a dialect of C that doesn't semantically gut it. GCC can implement such a dialect if invoked with -fno-strict-aliasing, but since gcc has no way to prevent obtuse "optimizations" without blocking useful and sensible optimizations, getting good performance from such a dialect may require learning how to use restrict.
Some people would argue that approach #1 is better, and for many PC-side programming purposes I'd tend to agree. For embedded software, however, I would suggest staying away from gcc's default dialect since there's no guarantee that the things gcc regards as defined behavior today will remain so tomorrow. I'm not sure where the authors of gcc got the notion that the authors of the Standard were trying to specify all of the forms of aliasing that a quality microprocessor implementation should support, rather than establish a minimum baseline for platforms where no forms of type punning other than "unsigned char" would be useful.
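As an illustration of approach #1, a 16-bit value can be assembled from individual bytes. This is only a sketch: the helper names `load_be16` and `load_le16` are made up here, and which byte order applies depends on the data format, which the question does not state.

```c
#include <stdint.h>

/* Assemble a 16-bit value from two bytes, most significant byte first. */
static uint16_t load_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Same, but least significant byte first. */
static uint16_t load_le16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[1] << 8) | (unsigned)p[0]);
}
```

Because only unsigned char accesses are involved, this is independent of alignment and of the host's own byte order.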

Related

Strict aliasing and casting union pointers

I have looked around this site to try to figure out if my use of casting to different unions is violating strict aliasing or otherwise UB.
I have packets coming in on a serial line and I store/get them like:
union uart_data {
struct {
uint8_t start;
uint8_t addr;
uint16_t length;
uint8_t data[];
};
uint8_t bytes[BUFFER_SIZE];
};
void store_byte(uint8_t byte) {
uart_data->start = byte;
/* and so on with the other named fields. */
}
uint8_t * get_buffer() {
return uart_data->bytes;
}
My understanding is that this is, at least with GCC and GNU extensions, a valid way to do type punning.
However, I then want to cast the return value from get_buffer() to a more specific type of packet that the uart doesn't need to know the details about.
union spec_pkt {
struct {
uint8_t start;
uint8_t addr;
uint16_t length;
uint8_t command;
uint8_t some_field;
uint16_t data_length;
uint8_t data[];
};
uint8_t bytes[BUFFER_SIZE];
};
void process(uint8_t *data) {
union spec_pkt *pkt = (union spec_pkt *)data;
}
I recall having read somewhere that this is valid since I'm casting from a type that exists in the union but I can't find the source.
My rationale for doing it this way is that I can have a uart driver that only needs to know about the lowest level details. I'm on an MCU so I only have access to pre-allocated data buffers, and this way I don't have to memcpy between buffers, wasting space. And in my application code I can handle the packet in a nicer way than:
uint8_t data[BUFFER_SIZE];
data[START_POS];
data[LEN_POS];
data[DATA_POS];
If this is violating the SA rule or is UB I'd love some alternatives to achieve the same.
I'm using GCC on a target that supports unaligned access and GCC allows type punning through unions.
The Standard completely fails to specify the circumstances under which a structure or union object may be accessed via a non-character lvalue whose type is not that of the structure or union. This omission would make sense if one recognizes that the purpose of the Standard is purely to indicate when a compiler must recognize that an object is being accessed by a seemingly unrelated lvalue, and that it is not meant to apply where a compiler can see that an lvalue or pointer of one type is used to derive another, which is then used to access storage associated with the first, without any intervening conflicting action on that storage. For example, given:
struct sizedPointer { int length,size; int *dat; };
void storeThing(struct sizedPointer *dest, int n)
{
if (dest->length < dest->size)
{
dest->dat[dest->length] = n;
dest->length++;
}
}
such an interpretation would allow a compiler to assume that dest->length will not be written using dest->dat, since its value has been observed after dest->dat was formed, but would require that a compiler recognize that given:
union blob { uint16_t hh[8]; uint64_t oo[2]; } myBlob;
an operation like
sscanf(someString, "%4" SCNx16, &myBlob.hh[1]); /* SCNx16 is from <inttypes.h>; plain %x expects unsigned int * */
might interact with any lvalues that are derived from myBlob after the function returns.
Unfortunately, gcc and clang instead interpret the rule as only mandating recognition in cases where failure to do so would completely gut the language. Because the Standard doesn't mandate that member-type lvalues be usable in any fashion whatsoever, and gcc and clang have explicitly stated that they should not be relied upon to do anything beyond what the Standard requires, support for anything useful should be viewed as being at the whim of the maintainers of clang and gcc.

MISRA compliant run-time detection of endianness

(First note that I know determining endianness at run-time is not an ideal solution and there are better ideas. Please don't bring that up)
I need to check the endianness of my CPU at run-time. I also have to do it while staying MISRA-compliant. I'm using C99.
MISRA doesn't allow conversion between different types of pointers, so simply casting a uint32_t* to uint8_t* and de-referencing to see what value the uint8_t holds is not allowed. Using unions is also out of the question (MISRA doesn't allow unions).
I also attempted to use memcmp like in the following piece of code:
static endi get_endianess(void)
{
uint32_t a = 1U;
uint8_t b = 1U;
return memcmp(&a, &b, 1) == 0 ? endi_little : endi_big;
}
but MISRA says that The pointer arguments to the Standard Library function 'memcmp' are not pointers to qualified or unqualified versions of compatible types, meaning I've failed to out-smart it by converting to legal void* pointers and letting memcmp do the dirty work.
Any other clever ideas will be appreciated. If you don't have a MISRA checker, just send me your idea and I'll let you know what my checker says
I think you have misunderstood the MISRA-C rules. Code such as this is fine:
uint16_t u16 = 0xAABBu;
bool big_endian = *(uint8_t*)&u16 == 0xAAu;
MISRA-C:2012 rule 11.3 has an exception allowing pointer conversions to pointer to character types (which uint8_t can safely be regarded as), but not the other way around. The purpose of the rule is to protect against misaligned access and strict aliasing bugs.
Also, MISRA allows union just fine; the rule against it is advisory, just to force people to stop and think about how they are using unions. MISRA does not allow union for the sake of storing multiple unrelated things in the same memory area, such as creating variants and other such nonsense. But controlled type punning, where padding/alignment and endianness have been considered, can be used with MISRA. That is, if you don't like this advisory rule. Personally I always ignore it in my MISRA implementations.
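The check above could be wrapped in a function along these lines. This is a sketch only: the name `is_big_endian` is hypothetical, and whether a particular MISRA checker accepts the cast is an assumption that has to be confirmed against that tool.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true on a big-endian host. It inspects the first byte in memory
   of a known 16-bit value through a character-type pointer, which is the
   direction of conversion the rule 11.3 exception covers. */
static bool is_big_endian(void)
{
    const uint16_t probe = 0xAABBu;
    const uint8_t *first = (const uint8_t *)&probe;
    return *first == 0xAAu;
}
```

This relies on uint8_t being a character type, as the answer assumes.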
In a MISRA context, I suppose this header and this function might not be available, but:
#include <arpa/inet.h>
static endi get_endianness(void)
{
return htons(0x0001u) == 0x0001u ? endi_big : endi_little;
}

How to safely perform type-punning in embedded system

Our team is currently using some ported code from an old architecture to a new product based on the ARM Cortex M3 platform using a customized version of GCC 4.5.1. We are reading data from a communications link, and attempting to cast the raw byte array to a struct to cleanly parse the data. After casting the pointer to a struct and dereferencing, we are getting a warning: "dereferencing type-punned pointer will break strict-aliasing rules".
After some research, I've realized that since the char array has no alignment rules and the struct have to be word aligned, casting the pointers causes undefined behavior (a Bad Thing). I'm wondering if there is a better way to do what we're trying.
I know we can explicitly word-align the char array using GCC's "__attribute__((aligned(4)))". I believe this will make our code "safer", but the warnings will still clutter up our builds, and I don't want to disable the warnings in case this situation arises again. What we want is a way to safely do what we are trying, that will still inform us if we attempt to do something unsafe in another place later. Since this is an embedded system, RAM usage and flash usage are important to some degree.
Portability (compiler and architecture) is not a huge concern, this is just for one product. However, if a portable solution exists, it would be preferred.
Here is a (very simplified) example of what we are currently doing:
#define MESSAGE_TYPE_A 0
#define MESSAGE_TYPE_B 1
typedef struct __attribute__((__packed__))
{
    unsigned char messageType;
    unsigned short data1;
    unsigned int data2;
} MessageA;
typedef struct __attribute__((__packed__))
{
    unsigned char messageType;
    unsigned char data3;
    unsigned char data4;
} MessageB;
// This gets filled by the comm system, assume from a UART interrupt or similar
unsigned char data[100];
// Assume this gets called once we receive a full message
void ProcessMessage()
{
MessageA* messageA;
unsigned char messageType = data[0];
if (messageType == MESSAGE_TYPE_A)
{
// Cast data to struct and attempt to read
messageA = (MessageA*)data; // Not safe since data may not be word aligned
// This may cause undefined behavior
if (messageA->data1 == 4) // warning would be here, when we use the data at the pointer
{
// Perform some action...
}
}
// ...
// process different types of messages
}
As has already been pointed out, casting pointers about is a dodgy practice.
Solution: use a union
struct message {
unsigned char messageType;
union {
struct {
int data1;
short data2;
} A;
struct {
char data1[5];
int data2;
} B;
} data;
};
void func (...) {
struct message msg;
getMessage (&msg);
switch (msg.messageType) {
case TYPEA:
doStuff (msg.data.A.data1);
break;
case TYPEB:
doOtherStuff (msg.data.B.data1);
break;
}
}
By this means the compiler knows you're accessing the same data via different means, and the warnings and Bad Things will go away.
Of course, you'll need to make sure the structure alignment and packing matches your message format. Beware endian issues and such if the machine on the other end of the link doesn't match.
Type punning through cast of types different than char * or a pointer to a signed/unsigned variant of char is not strictly conforming as it violates C aliasing rules (and sometimes alignment rules if no care is given).
However, gcc permits type punning through union types. Manpage of gcc explicitly documents it:
The practice of reading from a different union member than the one most recently written to (called "type-punning") is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type.
To disable optimizations related to aliasing rules with gcc (and thus allow the program to break C aliasing rules), the program can be compiled with: -fno-strict-aliasing. Note that with this option enabled, the program is no longer strictly conforming, but you said portability is not a concern. For information, the Linux kernel is compiled with this option.
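A minimal sketch of what union-based punning could look like for the message in the question, assuming GCC or Clang. The names `wire_message_a`, `rx_buffer`, `rx` and `rx_data1` are hypothetical, and the multi-byte fields are read in the host's byte order.

```c
#include <stdint.h>

/* Assumes GCC or Clang: __attribute__((packed)) is a compiler extension. */
struct __attribute__((packed)) wire_message_a {
    uint8_t  message_type;
    uint16_t data1;
    uint32_t data2;
};

/* The receive buffer and the typed view share the same storage. */
union rx_buffer {
    unsigned char         bytes[100];
    struct wire_message_a a;
};

/* Filled by the comm system through rx.bytes. */
static union rx_buffer rx;

/* The read goes through the union object itself, which is the form
   the GCC documentation describes as permitted. */
static uint16_t rx_data1(void)
{
    return rx.a.data1;
}
```

The important part is that the access is spelled `rx.a.data1`; handing out a `struct wire_message_a *` and dereferencing it elsewhere is a different situation.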
GCC has a -fno-strict-aliasing flag that will disable strict-aliasing-based optimizations and make your code safe.
If you're really looking for a way to "fix" it, you have to rethink the way your code works. You can't just overlay the structure the way you're trying, so you need to do something like this:
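MessageA messageA;
messageA.messageType = data[0];
// Watch out - endianness and `sizeof(short)` dependent!
messageA.data1 = (unsigned short)(((unsigned int)data[1] << 8) | data[2]);
// Watch out - endianness and `sizeof(int)` dependent!
// The casts keep the shifts out of the sign bit of a promoted int.
messageA.data2 = ((unsigned int)data[3] << 24) | ((unsigned int)data[4] << 16)
               | ((unsigned int)data[5] << 8) | (unsigned int)data[6];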
This method will let you avoid packing your structure, which might also improve its performance characteristics elsewhere in your code. Alternately:
MessageA messageA;
memcpy(&messageA, data, sizeof messageA);
This works with your packed structures. You would do the reverse operations to translate the structures back into a flat buffer if necessary.
Stop using packed structures and memcpy the individual fields into variables of the correct size and type. This is the safe, portable, clean way to do what you're trying to achieve. If you're lucky, gcc will optimize the tiny fixed-size memcpy into a few simple load and store instructions.
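A minimal sketch of the field-by-field memcpy approach. The struct name `decoded_message_a`, the function `decode_message_a` and the byte offsets are assumptions based on the packed layout in the question, and the multi-byte fields are taken to be in the host's byte order.

```c
#include <stdint.h>
#include <string.h>

/* Ordinary, unpacked struct; the compiler may pad it as it likes. */
struct decoded_message_a {
    uint8_t  message_type;
    uint16_t data1;
    uint32_t data2;
};

/* Assumed wire layout: type at byte 0, data1 at bytes 1..2,
   data2 at bytes 3..6. */
static void decode_message_a(const unsigned char *buf,
                             struct decoded_message_a *out)
{
    out->message_type = buf[0];
    memcpy(&out->data1, buf + 1, sizeof out->data1);
    memcpy(&out->data2, buf + 3, sizeof out->data2);
}
```

If the sender uses a different byte order, the fields would need to be byte-swapped or assembled with shifts after copying.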
The Cortex M3 can handle unaligned accesses just fine. I have done this in similar packet processing systems with the M3. You don't need to do anything, you can just use the flag -fno-strict-aliasing to get rid of the warning.
For unaligned accesses, look at the linux macros get_unaligned/put_unaligned.

Is it possible to cast pointers from a structure type to another structure type extending the first in C?

If I have structure definitions, for example, like these:
struct Base {
int foo;
};
struct Derived {
int foo; // int foo is common for both definitions
char *bar;
};
Can I do something like this?
void foobar(void *ptr) {
((struct Base *)ptr)->foo = 1;
}
struct Derived s;
foobar(&s);
In other words, can I cast the void pointer to Base * to access its foo member when its type is actually Derived *?
You should do
struct Base {
int foo;
};
struct Derived {
struct Base base;
char *bar;
};
to avoid breaking strict aliasing; it is a common misconception that C allows arbitrary casts of pointer types: although it will work as expected in most implementations, it's non-standard.
This also avoids any alignment incompatibilities due to usage of pragma directives.
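A short sketch of how the embedded-base layout might be used. The function `demo` is a hypothetical helper added for illustration. Both calls rely only on the rule that a pointer to a structure, suitably converted, points to its first member.

```c
#include <stddef.h>

struct Base {
    int foo;
};

struct Derived {
    struct Base base;   /* must be the first member */
    char *bar;
};

/* Works on any object that starts with a struct Base. */
static void foobar(struct Base *b)
{
    b->foo = 1;
}

/* Hypothetical helper showing two ways to reach the Base part. */
static int demo(void)
{
    struct Derived d = { { 0 }, NULL };
    foobar(&d.base);            /* no cast needed */
    foobar((struct Base *)&d);  /* a struct pointer converts to a pointer
                                   to its first member */
    return d.base.foo;
}
```

Taking `&d.base` is the form that needs no cast at all, so it is usually the easier one to review.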
Many real-world C programs assume the construct you show is safe, and there is an interpretation of the C standard (specifically, of the "common initial sequence" rule, C99 §6.5.2.3 p5) under which it is conforming. Unfortunately, in the five years since I originally answered this question, all the compilers I can easily get at (viz. GCC and Clang) have converged on a different, narrower interpretation of the common initial sequence rule, under which the construct you show provokes undefined behavior. Concretely, experiment with this program:
#include <stdio.h>
#include <string.h>
typedef struct A { int x; int y; } A;
typedef struct B { int x; int y; float z; } B;
typedef struct C { A a; float z; } C;
int testAB(A *a, B *b)
{
b->x = 1;
a->x = 2;
return b->x;
}
int testAC(A *a, C *c)
{
c->a.x = 1;
a->x = 2;
return c->a.x;
}
int main(void)
{
B bee;
C cee;
int r;
memset(&bee, 0, sizeof bee);
memset(&cee, 0, sizeof cee);
r = testAB((A *)&bee, &bee);
printf("testAB: r=%d bee.x=%d\n", r, bee.x);
r = testAC(&cee.a, &cee);
printf("testAC: r=%d cee.x=%d\n", r, cee.a.x);
return 0;
}
When compiling with optimization enabled (and without -fno-strict-aliasing), both GCC and Clang will assume that the two pointer arguments to testAB cannot point to the same object, so I get output like
testAB: r=1 bee.x=2
testAC: r=2 cee.x=2
They do not make that assumption for testAC, but — having previously been under the impression that testAB was required to be compiled as if its two arguments could point to the same object — I am no longer confident enough in my own understanding of the standard to say whether or not that is guaranteed to keep working.
That will work in this particular case. The foo field is the first member of both structures and it has the same type. However, this is not true in the general case of fields within a struct (that are not the first member). Items like alignment and packing can make this break in subtle ways.
As you seem to be aiming at object-oriented programming in C, I suggest having a look at the following link:
http://www.planetpdf.com/codecuts/pdfs/ooc.pdf
It goes into detail about ways of handling oop principles in ANSI C.
In particular cases this could work, but in general - no, because of the structure alignment.
You could use different #pragmas to make (actually, attempt to) the alignment identical - and then, yes, that would work.
If you're using Microsoft Visual Studio, you might find this article useful.
There is another little thing that might be helpful or related to what you are doing:
#define SHARED_DATA int id
typedef struct window_t {
    SHARED_DATA;
    int something;
    void* blah;
} window_t;
typedef struct list_t {
    SHARED_DATA;
    int size;
} list_t;
typedef struct button_t {
    SHARED_DATA;
    int clicked;
} button_t;
/* Declared last so that the member types are complete. */
typedef union base_t {
    SHARED_DATA;
    window_t win;
    list_t list;
    button_t button;
} base_t;
Now you can put the shared properties into SHARED_DATA and handle the different types via the "superclass" packed into the union. You could use SHARED_DATA to store just a 'class identifier' or store a pointer. Either way, it turned out handy for generic handling of event types for me at some point. Hope I'm not going too far off-topic with this.
I know this is an old question, but in my view there is more that can be said and some of the other answers are incorrect.
Firstly, this cast:
(struct Base *)ptr
... is allowed, but only if the alignment requirements are met. On many compilers your two structures will have the same alignment requirements, and it's easy to verify in any case. If you get past this hurdle, the next is that the result of the cast is mostly unspecified - that is, there's no requirement in the C standard that the pointer once cast still refers to the same object (only after casting it back to the original type will it necessarily do so).
However, in practice, compilers for common systems usually make the result of a pointer cast refer to the same object.
(Pointer casts are covered in section 6.3.2.3 of both the C99 standard and the more recent C11 standard. The rules are essentially the same in both, I believe).
Finally, you've got the so called "strict aliasing" rules to contend with (C99/C11 6.5 paragraph 7); basically, you are not allowed to access an object of one type via a pointer of another type (with certain exceptions, which don't apply in your example). See "What is the strict-aliasing rule?", or for a very in-depth discussion, read my blog post on the subject.
In conclusion, what you attempt in your code is not guaranteed to work. It might be guaranteed to always work with certain compilers (and with certain compiler options), and it might work by chance with many compilers, but it certainly invokes undefined behavior according to the C language standard.
What you could do instead is this:
*((int *)ptr) = 1;
... I.e. since you know that the first member of the structure is an int, you just cast directly to int, which bypasses the aliasing problem since both types of struct do in fact contain an int at this address. You are relying on knowing the struct layout that the compiler will use and you are still relying on the non-standard semantics of pointer casting, but in practice this is significantly less likely to give you problems.
The great/bad thing about C is that you can cast just about anything -- the problem is, it might not work. :) However, in your case, it will*, since you have two structs whose first members are both of the same type; see this program for an example. Now, if struct derived had a different type as its first element -- for example, char *bar -- then no, you'd get weird behavior.
* I should qualify that with "almost always", I suppose; there are a lot of different C compilers out there, so some may have different behavior. However, I know it'll work in GCC.

Disable structure padding in C without using pragma

How can I disable structure padding in C without using pragma?
There is no standard way of doing this. The standard states that padding may be done at the discretion of the implementation. From C99 6.7.2.1 Structure and union specifiers, paragraph 12:
Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
Having said that, there's a couple of things you can try.
The first you've already discounted: using #pragma to try to convince the compiler not to pad. In any case, this is not portable. Nor are any other implementation-specific ways, but you should look into them, as they may be necessary if you really need this capability.
The second is to order your fields in largest to smallest order such as all the long long types followed by the long ones, then all the int, short and finally char types. This will usually work since it's most often the larger types that have the more stringent alignment requirements. Again, not portable.
Thirdly, you can define your types as char arrays and cast the addresses to ensure there's no padding. But keep in mind that some architectures will slow down if the variables aren't aligned properly and still others will fail miserably (such as raising a BUS error and terminating your process, for example).
That last one bears some further explanation. Say you have a structure with the fields in the following order:
char C; // one byte
int I; // two bytes
long L; // four bytes
With padding, you may end up with the following bytes:
CxxxIIxxLLLL
where x is the padding.
However, if you define your structure as:
typedef struct { char c[7]; } myType;
myType n;
you get:
CCCCCCC
You can then do something like:
int *pInt = (int *)&(n.c[1]);
long *pLng = (long *)&(n.c[3]);
int myInt = *pInt;
long myLong = *pLng;
to give you:
CIILLLL
Again, unfortunately, not portable.
All these "solutions" rely on you having intimate knowledge of your compiler and the underlying data types.
Other than compiler options like #pragma pack, you cannot; padding is permitted by the C Standard.
You can always attempt to reduce padding by declaring the smallest types last in the structure as in:
struct _foo {
int a; /* No padding between a & b */
short b;
} foo;
struct _bar {
short b; /* 2 bytes of padding between b & a */
int a;
} bar;
Note: this applies to implementations which have 4-byte alignment boundaries.
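To see what a particular compiler actually does with the two orderings, the offsets and sizes can be printed with offsetof. This is a sketch only: the struct tags `foo` and `bar` and the function `print_layouts` are chosen for illustration, and the printed numbers are implementation-defined, so no expected output is given.

```c
#include <stddef.h>
#include <stdio.h>

struct foo { int a; short b; };   /* larger member first */
struct bar { short b; int a; };   /* smaller member first */

/* Print where each member landed and the total size of each struct. */
static void print_layouts(void)
{
    printf("foo: a@%zu b@%zu size=%zu\n",
           offsetof(struct foo, a), offsetof(struct foo, b),
           sizeof(struct foo));
    printf("bar: b@%zu a@%zu size=%zu\n",
           offsetof(struct bar, b), offsetof(struct bar, a),
           sizeof(struct bar));
}
```

A gap between the end of one member and the offset of the next is padding, and so is any difference between the last member's end and the struct's size.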
On some architectures, the CPU itself will object if asked to work on misaligned data. To work around this, the compiler could generate multiple aligned read or write instructions, shift and split or merge the various bits. You could reasonably expect it to be 5 or 10 times slower than aligned data handling. But, the Standard doesn't require compilers to be prepared to do that... given the performance cost, it's just not in enough demand. The compilers that support explicit control over padding provide their own pragmas precisely because pragmas are reserved for non-Standard functionality.
If you must work with unpadded data, consider writing your own access routines. You might want to experiment with types that require less alignment (e.g. use char/int8_t), but it's still possible that e.g. the size of structs will be rounded up to multiples of 4, which would frustrate packing structures tightly, in which case you'll need to implement your own access for the entire memory region.
Either you let the compiler do the padding, or you tell it not to using #pragma, or you just use a bunch of bytes such as a char array and build all your data yourself (shifting and adding bytes). This is really inefficient, but you control the layout of the bytes exactly. I have sometimes done that when preparing network packets by hand, but in most cases it's a bad idea, even if it's standard.
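Building the bytes by hand could look roughly like this. The packet layout (one type byte, a 16-bit length, a 32-bit id, all most significant byte first) and the names `put_be16`, `put_be32` and `build_packet` are hypothetical, chosen only to illustrate the technique.

```c
#include <stddef.h>
#include <stdint.h>

/* Write a 16-bit value most significant byte first (network order). */
static void put_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);
    p[1] = (unsigned char)(v & 0xFFu);
}

/* Write a 32-bit value most significant byte first. */
static void put_be32(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v >> 24);
    p[1] = (unsigned char)((v >> 16) & 0xFFu);
    p[2] = (unsigned char)((v >> 8) & 0xFFu);
    p[3] = (unsigned char)(v & 0xFFu);
}

/* Assumed 7-byte packet: type, 16-bit length, 32-bit id. There is no
   padding because every byte is placed explicitly.
   Returns the number of bytes written. */
static size_t build_packet(unsigned char *out, uint8_t type,
                           uint16_t length, uint32_t id)
{
    out[0] = type;
    put_be16(out + 1, length);
    put_be32(out + 3, id);
    return 7;
}
```

The result is the same on any host, regardless of its byte order or struct padding rules.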
If you really want structs without padding: Define replacement datatypes for short, int, long, etc., using structs or classes that are composed only of 8 bit bytes. Then compose your higher level structs using the replacement datatypes.
C++'s operator overloading is very convenient, but you could achieve the same effect in C using structs instead of classes. The below cast and assignment implementations assume the CPU can handle misaligned 32bit integers, but other implementations could accommodate stricter CPUs.
Here is sample code:
#include <stdint.h>
#include <stdio.h>
class packable_int { public:
int8_t b[4];
operator int32_t () const { return *(int32_t*) b; }
void operator = ( int32_t n ) { *(int32_t*) b = n; }
};
struct SA {
int8_t c;
int32_t n;
} sa;
struct SB {
int8_t c;
packable_int n;
} sb;
int main () {
printf ( "sizeof sa %zu\n", sizeof sa ); // sizeof sa 8
printf ( "sizeof sb %zu\n", sizeof sb ); // sizeof sb 5
return 0;
}
We can disable structure padding in a C program using any one of the following methods.
-> Use __attribute__((packed)) after the definition of the structure, e.g.:
struct node {
char x;
short y;
int z;
} __attribute__((packed));
-> Use the -fpack-struct flag while compiling the C code, e.g.:
$ gcc -fpack-struct -o tmp tmp.c
Hope this helps.
Thanks.
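Whether packing actually took effect can be checked at compile time. The following is a sketch assuming GCC or Clang and a C11 compiler; the struct names are hypothetical, and fixed-width types are used instead of char/short/int so that the expected size does not depend on the platform's type sizes.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumes GCC or Clang; __attribute__((packed)) is not standard C. */
struct node_packed {
    uint8_t  x;
    uint16_t y;
    uint32_t z;
} __attribute__((packed));

/* Same members without the attribute, for comparison. */
struct node_padded {
    uint8_t  x;
    uint16_t y;
    uint32_t z;
};

/* C11 compile-time checks that the packed layout has no padding. */
_Static_assert(sizeof(struct node_packed) == 7, "unexpected padding");
_Static_assert(offsetof(struct node_packed, z) == 3, "unexpected offset");
```

If the attribute is ignored or removed, the build fails instead of silently producing a different layout.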
