MISRA compliant run-time detection of endianness


(First note that I know determining endianness at run-time is not an ideal solution and there are better ideas. Please don't bring that up)
I need to check the endianness of my CPU at run-time. I also have to do it while staying MISRA-compliant. I'm using C99.
MISRA doesn't allow conversion between different types of pointers, so simply casting a uint32_t* to uint8_t* and de-referencing to see what value the uint8_t holds is not allowed. Using unions is also out of the question (MISRA doesn't allow unions).
I also attempted to use memcmp like in the following piece of code:
static endi get_endianess(void)
{
uint32_t a = 1U;
uint8_t b = 1U;
return memcmp(&a, &b, 1) == 0 ? endi_little : endi_big;
}
but MISRA says that "The pointer arguments to the Standard Library function 'memcmp' are not pointers to qualified or unqualified versions of compatible types", meaning I've failed to outsmart it by converting to legal void* pointers and letting memcmp do the dirty work.
Any other clever ideas will be appreciated. If you don't have a MISRA checker, just send me your idea and I'll let you know what my checker says

I think you have misunderstood the MISRA-C rules. Code such as this is fine:
uint16_t u16 = 0xAABBu;
bool big_endian = *(uint8_t*)&u16 == 0xAAu;
MISRA-C:2012 rule 11.3 has an exception allowing pointer conversions to pointer to character types (which uint8_t can safely be regarded as), but not the other way around. The purpose of the rule is to protect against misaligned access and strict aliasing bugs.
Also, MISRA allows union just fine. The rule against it is advisory, just to force people to stop and think about how they are using unions. MISRA does not allow union for the sake of storing multiple unrelated things in the same memory area, such as creating variants and other such nonsense. But controlled type punning, where padding/alignment and endianness have been considered, can be used with MISRA. That is, if you don't like this advisory rule. Personally I always ignore it in my MISRA implementations.
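A minimal sketch of the character-pointer approach described above, wrapped in a function. The name is_big_endian is hypothetical, and whether a particular MISRA checker accepts the cast without a deviation is an assumption that has to be confirmed with the tool in use.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: reports whether the most significant byte of a
   16-bit object is stored at the lowest address. */
static bool is_big_endian(void)
{
    const uint16_t probe = 0xAABBu;
    /* Conversion to a pointer to character type; unsigned char may be
       used to inspect the bytes of any object. */
    const unsigned char *first = (const unsigned char *)&probe;
    return *first == 0xAAu;
}
```

The result can be cached in a static variable at start-up if the check is needed in a hot path.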

In a MISRA context, I suppose this header and this function might not be available, but:
#include <arpa/inet.h>
static endi get_endianness(void)
{
return htons(0x0001u) == 0x0001u ? endi_big : endi_little;
}

Related

C: Portable way to define Array with 64-bit aligned starting address?

For code that is compiled on various/unknown architectures/compilers (8/16/32/64-bit) a global mempool array has to be defined:
uint8_t mempool[SIZE];
This mempool is used to store objects of different structure types e.g.:
typedef struct Meta_t {
    uint16_t size;
    struct Meta_t *next;
    /* and more */
} Meta_t;
Since structure objects always have to be aligned to the largest possible boundary, e.g. 64-bit, padding bytes have to be added between those structure objects inside the mempool:
struct Meta_t* obj = (struct Meta_t*) &mempool[123 + padding];
Meaning if a structure object would start on a not aligned address, the access to this would cause an alignment trap.
This works already well in my code. But I'm still searching for a portable way for aligning the mempool start address as well. Because without that, padding bytes have to be inserted already between the array start address and the address of the first structure inside the mempool.
The only way I have discovered so far is to define the mempool inside a union together with another variable that will be aligned by the compiler anyway, but this is supposedly not portable.
Unfortunately for embedded platforms my code is also compiled with ANSI C90 compilers. In fact I cannot make any guess what compilers are exactly used. Because of this I'm searching for an absolutely portable solution and I guess any kind of preprocessor directives or compiler specific attributes or language features that were added after C90 cannot be used

You can use _Alignas, which is part of the C11 standard, to force a particular alignment.
_Alignas(uint64_t) uint8_t mempool[SIZE];
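A short sketch of how that declaration could be checked, assuming a C11 compiler (so it does not help with the C90-only targets mentioned in the question). POOL_SIZE and mempool_is_aligned are hypothetical names, and the check relies on the common but not guaranteed behaviour that converting a pointer to uintptr_t yields the machine address.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE 256u  /* hypothetical size standing in for SIZE */

/* C11: request at least the alignment of uint64_t for the byte array. */
static _Alignas(uint64_t) uint8_t mempool[POOL_SIZE];

/* Returns 1 when the pool's start address is a multiple of the
   alignment of uint64_t, 0 otherwise. */
static int mempool_is_aligned(void)
{
    return ((uintptr_t)mempool % _Alignof(uint64_t)) == 0u;
}
```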

(struct Meta_t*) mempool will lead to undefined behavior for more reasons than alignment - it's also a strict aliasing violation.
The best solution might be to create a union such as this:
typedef union
{
struct Meta_t my_struct;
uint8_t bytes[sizeof(struct Meta_t)];
} thing;
This solves both the alignment and the pointer aliasing problems and works in C90.
Now if we do *(thing*)mempool then this is well-defined, since this (lvalue) access is done through a union type that includes a uint8_t array among its members. And type punning between union members is also well-defined in C. (No solution exists in C++.)

Unfortunately, this ...
This mempool is used to store objects of different structure types
... combined with this ...
I'm searching for an absolute portable solution and I guess any kind of preprocessor directives or compiler specific attributes or language features that were added after C90 cannot be used
... puts you absolutely up a creek, unless you know in advance all the structure types with which your memory pool may be used. Even the approach you are taking now does not conform strictly to C90, because there is no strictly-conforming way to determine the alignment of an address, so as to compute how much padding is needed.* (You have probably assumed that you can convert it to an integer and look at the least-significant bits, but C does not guarantee that you can determine anything about alignment that way.)
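The integer-conversion technique mentioned in the parenthesis can be sketched as follows. This is only an illustration under the assumption that the converted pointer value behaves like a machine address; padding_for is a hypothetical name, and uintptr_t comes from C99's stdint.h (on a C90-only compiler an unsigned long is the usual stand-in).

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes to skip so that p + result is a multiple of align.
   align must be a power of two. Not strictly conforming: it assumes
   the integer value of a pointer reflects its address. */
static size_t padding_for(const void *p, size_t align)
{
    uintptr_t addr = (uintptr_t)p;
    uintptr_t misalign = addr & (uintptr_t)(align - 1u);
    return (misalign == 0u) ? 0u : (size_t)(align - misalign);
}
```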
In practice, there is a variety of things that will work in a very wide range of target environments, despite not strictly conforming to the C language specification. Interpreting the result of converting a pointer to an integer as a machine address, so that it is sensible to use it for alignment computations, is one of those. For appropriately aligning a declared array, so would this be:
#define MAX(x,y) (((x) < (y)) ? (y) : (x))
union max_align {
    struct l { long l; } l;
    struct d { double d; } d;
    struct p { void *p; } p;
    unsigned char bytes[MAX(MAX(sizeof(struct l), sizeof(struct d)), sizeof(struct p))];
};
#undef MAX
#define MEMPOOL_BLOCK_SIZE sizeof(union max_align)
union max_align mempool[(SIZE + MEMPOOL_BLOCK_SIZE - 1) / MEMPOOL_BLOCK_SIZE];
For a very large set of C implementations, that not only ensures that the pool itself is properly aligned for any use by strictly-conforming C90 clients, but it also conveniently divides the pool into aligned blocks on which your allocator can draw. Refer also to @Lundin's answer for how pointers into that pool would need to be used to avoid strict aliasing violations.
(If you do happen to know all the types for which you must ensure alignment, then put one of each of those into union max_align instead of d, l, and p, and also make your life easier by having the allocator hand out pointers to union max_align instead of pointers to void or unsigned char.)
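To illustrate the idea of an allocator that hands out pointers to union max_align, here is a minimal bump-allocator sketch. It is not the answerer's code: the union is simplified to plain members, and POOL_BYTES, pool, next_free and pool_alloc are hypothetical names. It never frees blocks and is meant only to show the block arithmetic.

```c
#include <stddef.h>

/* Simplified stand-in for the union above: one member per type whose
   alignment should be honoured. */
union max_align {
    long l;
    double d;
    void *p;
};

#define POOL_BYTES 1024u  /* hypothetical pool size in bytes */
#define POOL_BLOCKS ((POOL_BYTES + sizeof(union max_align) - 1u) / sizeof(union max_align))

static union max_align pool[POOL_BLOCKS];
static size_t next_free = 0u;  /* index of the first unused block */

/* Returns a pointer to enough whole blocks to hold nbytes,
   or NULL if nbytes is zero or the pool is exhausted. */
static union max_align *pool_alloc(size_t nbytes)
{
    size_t blocks = (nbytes + sizeof(union max_align) - 1u) / sizeof(union max_align);
    union max_align *result;

    if (blocks == 0u || blocks > POOL_BLOCKS - next_free) {
        return NULL;
    }
    result = &pool[next_free];
    next_free += blocks;
    return result;
}
```

Because every returned pointer refers to an element of the union array, each allocation starts on a boundary that is suitable for long, double and void* on the implementation at hand.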
Overall, you need to choose a different objective than absolute portability. There is no such thing. Avoiding compiler extensions and language features added in C99 and later is a great start. Minimizing the assumptions you make about implementation behavior is important. And where that's not feasible, choose the most portable option you can come up with, and document it.
*Not to mention that you are relying on uint8_t, which is not in C90, and which is not necessarily provided even by all C99 and later implementations.

How to resolve MISRA C:2012 Rule 11.6?

I am using a function from Microchip's sample nvmem.c file to write data to a particular memory address of a PIC32 microcontroller. When I try to use it, the following MISRA error is shown. I have posted only the sample code where I get the error; my whole code compiles and works fine.
1] explicit cast from 'unsigned int' to 'void *' [MISRA 2012 Rule 11.6, required] at NVMemWriteWord((void*)APP_FLASH_MARK_ADDRESS,(UINT)_usermark);
How can I resolve this error?
nvmem.c
uint8_t NVMemWriteWord(void* address, uint32_t data)
{
    uint8_t res;
    NVMADDR = KVA_TO_PA((uint32_t)address); /* destination address to write */
    NVMDATA = data;
    res = NVMemOperation(NVMOP_WORD_PGM);
    return res;
}
test.c
#define ADDRESS 0x9D007FF0U
NVMemWriteWord((void*)ADDRESS,(uint32_t)_usermark);

Use
uint8_t NVMemWriteWord(unsigned int address, uint32_t data)
{
    uint8_t res;
    NVMADDR = KVA_TO_PA(address);
    NVMDATA = data;
    res = NVMemOperation(NVMOP_WORD_PGM);
    return res;
}
and
#define ADDRESS 0x9D007FF0U
NVMemWriteWord(ADDRESS,(uint32_t)_usermark);
instead. Functionally it is exactly equivalent to the example, it just avoids the cast from a void pointer to an unsigned integer address.

Suggest:
#define ADDRESS (volatile uint32_t*)0x9D007FF0U
NVMemWriteWord( ADDRESS, _usermark) ;
Never cast to void* - the purpose of void* is that you can assign any other pointer type to it safely and without explicit cast. The cast of _usermark may or may not be necessary, but unnecessary explicit casts should be avoided - they can suppress important compiler warnings. You should approach type conversions in the following order of preference:
Type agreement - exactly same types.
Type compatibility - smaller type to larger type, same signedness.
Type cast - last resort (e.g. larger to smaller type, signedness mismatch, integer to/from pointer).
In this instance since NVMemWriteWord simply casts address to an integer, then the use of void* may not be appropriate. If in other contexts you are actually using a pointer, then it may be valid.

The whole group of MISRA-C:2012 rules 11.x regarding pointer conversions is quite picky. And rightly so, since this is very dangerous territory.
11.6 is a sound rule that bans conversions from integers to void*. The rationale is to block alignment bugs. There aren't many reasons why you would want to do such conversions anyway.
Notably, there are also two rigid but advisory rules: 11.4, which bans conversions from integers to pointers, and 11.5, which pretty much bans the use of void* entirely. It isn't possible to do hardware-related programming and follow 11.4, so that rule has to be ignored. But you have little reason to use void*.
In this specific case you can get away with using uint32_t and avoiding pointers entirely.
In the general case of register access, you must do a conversion with volatile-qualified pointers: (volatile uint32_t*)ADDRESS, assuming that the MCU uses 32 bit registers.
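A small sketch of what such register access helpers might look like. The names reg_write32 and reg_read32 are hypothetical, and the integer-to-pointer conversion inside them is exactly the kind that advisory rule 11.4 flags, so a documented deviation is assumed.

```c
#include <stdint.h>

/* Hypothetical helper: the address travels as an integer and is
   converted to a volatile-qualified object pointer only at the
   point of access. */
static void reg_write32(uintptr_t addr, uint32_t value)
{
    *(volatile uint32_t *)addr = value;
}

/* Hypothetical helper: reads a 32-bit register at the given address. */
static uint32_t reg_read32(uintptr_t addr)
{
    return *(volatile const uint32_t *)addr;
}
```

On the target, a call such as reg_write32(0x9D007FF0U, value) would take the place of passing a void*. Note that flash programming on the PIC32 still has to go through the NVM controller sequence shown in the question; these helpers only illustrate plain register access.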

Strict aliasing rule and strlen implementation of glibc

I have been reading about the strict aliasing rule for a while, and I'm starting to get really confused. First of all, I have read these questions and some answers:
strict-aliasing-rule-and-char-pointers
when-is-char-safe-for-strict-pointer-aliasing
is-the-strict-aliasing-rule-really-a-two-way-street
According to them (as far as I understand), accessing a char buffer using a pointer to another type violates the strict aliasing rule. However, the glibc implementation of strlen() has such code (with comments and the 64-bit implementation removed):
size_t strlen(const char *str)
{
    const char *char_ptr;
    const unsigned long int *longword_ptr;
    unsigned long int longword, magic_bits, himagic, lomagic;
    for (char_ptr = str; ((unsigned long int) char_ptr
                          & (sizeof (longword) - 1)) != 0; ++char_ptr)
        if (*char_ptr == '\0')
            return char_ptr - str;
    longword_ptr = (unsigned long int *) char_ptr;
    himagic = 0x80808080L;
    lomagic = 0x01010101L;
    for (;;)
    {
        longword = *longword_ptr++;
        if (((longword - lomagic) & himagic) != 0)
        {
            const char *cp = (const char *) (longword_ptr - 1);
            if (cp[0] == 0)
                return cp - str;
            if (cp[1] == 0)
                return cp - str + 1;
            if (cp[2] == 0)
                return cp - str + 2;
            if (cp[3] == 0)
                return cp - str + 3;
        }
    }
}
The longword_ptr = (unsigned long int *) char_ptr; line obviously aliases an unsigned long int to char. I fail to understand what makes this possible. I see that the code takes care of alignment problems, so no issues there, but I think this is not related with the strict aliasing rule.
The accepted answer for the third linked question says:
However, there is a very common compiler extension allowing you to cast properly aligned pointers from char to other types and access them, however this is non-standard.
Only thing comes to my mind is the -fno-strict-aliasing option, is this the case? I could not find it documented anywhere what glibc implementors depend on, and the comments somehow imply that this cast is done without any concerns like it is obvious that there will be no problems. That makes me think that it is indeed obvious and I am missing something silly, but my search failed me.

In ISO C this code would violate the strict aliasing rule. (And also violate the rule that you cannot define a function with the same name as a standard library function). However this code is not subject to the rules of ISO C. The standard library doesn't even have to be implemented in a C-like language. The standard only specifies that the implementation implements the behaviour of the standard functions.
In this case, we could say that the implementation is in a C-like GNU dialect, and if the code is compiled with the writer's intended compiler and settings then it would implement the standard library function successfully.

When writing the aliasing rules, the authors of the Standard only considered forms that would be useful, and should thus be mandated, on all implementations. C implementations are targeted toward a variety of purposes, and the authors of the Standard make no attempt to specify what a compiler must do to be suitable for any particular purpose (e.g. low-level programming) or, for that matter, any purpose whatsoever.
Code like the above which relies upon low-level constructs should not be expected to run on compilers that make no claim of being suitable for low-level programming. On the flip side, any compiler which can't support such code should be viewed as unsuitable for low-level programming. Note that compilers can employ type-based aliasing assumptions and still be suitable for low-level programming if they make a reasonable effort to recognize common aliasing patterns. Some compiler writers are very highly invested in a view of code which fits neither common low-level coding patterns, nor the C Standard, but anyone writing low-level code should simply recognize that those compilers' optimizers are unsuitable for use with low-level code.

The wording of the standard is actually a bit more weird than the actual compiler implementations: The C standard talks about declared object types, but the compilers only ever see pointers to these objects. As such, when a compiler sees a cast from a char* to an unsigned long*, it has to assume that the char* is actually aliasing an object with a declared type of unsigned long, making the cast correct.
A word of caution: I assume that strlen() is compiled into a library that is later only linked to the rest of the application. As such, the optimizer does not see the use of the function when compiling it, forcing it to assume that the cast to unsigned long* is indeed legit. If you called strlen() with
short myString[] = {0x666f, 0x6f00, 0};
size_t length = strlen((char*)myString); //implementation now invokes undefined behavior!
the cast within strlen() is undefined behavior, and your compiler would be allowed to strip pretty much the entire body of strlen() if it saw your use while compiling strlen() itself. The only thing that allows strlen() to behave as expected in this call is the fact, that strlen() is compiled separately as a library, hiding the undefined behavior from the optimizer, so the optimizer has to assume the cast to be legit when compiling strlen().
So, assuming that the optimizer cannot call "undefined behavior", the reason why casts from char* to anything else are dangerous, is not aliasing, but alignment. On some hardware, weird stuff starts happening if you try to access a misaligned pointer. The hardware might load data from the wrong address, raise an interrupt, or just process the requested memory load extremely slowly. That is why the C standard generally declares such casts undefined behavior.
Nevertheless, you see that the code in question actually handles the alignment issue explicitly (the first loop that contains the (unsigned long int) char_ptr & (sizeof (longword) - 1) subcondition). After that, the char* is properly aligned to be reinterpreted as unsigned long*.
Of course, all of this is not really compliant with the C standard, but it is compliant with the C implementation of the compiler that this code is meant to be compiled with. If the gcc people modified their compiler to act up on this bit of code, the glibc people would just complain about it loud enough so that the gcc will be changed back to handle this kind of cast correctly.
At the end of the day, standard C library implementations simply must violate strict aliasing rules to work properly and be efficient. strlen() just needs to violate those rules to be efficient, the malloc()/free() function pair must be able to take a memory region that had a declared type of Foo, and turn it into a memory region of declared type Bar. And there is no malloc() call inside the malloc() implementation that would give the object a declared type in the first place. The abstraction of the C language simply breaks down at this level.

The underlying assumption is probably that the function is separately compiled, and not available for inlining or other cross function optimizations. This means that no compile time information flows inside or outside the function.
The function doesn't try to modify anything through a pointer, so there is no conflict.
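Independent of the aliasing question, the word-at-a-time test in the quoted loop can be isolated and tried on plain integer values. The sketch below assumes 32-bit words; the function names are hypothetical. The loose form is the one used in the question's code, which can fire even when no byte is zero and is why the loop re-checks each byte afterwards. The second form is a well-known exact variant.

```c
#include <stdint.h>

#define LOMAGIC 0x01010101u
#define HIMAGIC 0x80808080u

/* Loose test as in the quoted loop: nonzero whenever a byte is zero,
   but also for some words without a zero byte (e.g. bytes >= 0x81). */
static int maybe_has_zero_byte(uint32_t w)
{
    return ((w - LOMAGIC) & HIMAGIC) != 0u;
}

/* Exact test: additionally masks out bytes whose own high bit was
   already set, so it is nonzero only if some byte of w is zero. */
static int has_zero_byte(uint32_t w)
{
    return ((w - LOMAGIC) & ~w & HIMAGIC) != 0u;
}
```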

Can I do what I want with allocated memory

Are there limits on what I can do to allocated memory?(standard-wise)
For example
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct str{
    long long a;
    long b;
};
int main(void)
{
    long *x = calloc(4,sizeof(long));
    x[0] = 2;
    x[3] = 7;
    //is anything beyond here legal( if you would exclude possible illegal operations)
    long long *y = x;
    printf("%lld\n",y[0]);
    y[0] = 2;
    memset (x,0,16);
    struct str *bar = x;
    bar->b = 4;
    printf("%lld\n",bar->a);
    return 0;
}
To summarize:
Can I recast the pointer to other datatypes and structs, as long as the size fits?
Can I read before I write, then?
If not can I read after I wrote?
Can I use it with a struct smaller than the allocated memory?

Reading from y[0] violates the strict aliasing rule. You use an lvalue of type long long to read objects of effective type long.
Assuming you omit that line; the next troublesome part is memset(x,0,16);. This answer argues that memset does not update the effective type. The standard is not clear.
Assuming that memset leaves the effective type unchanged; the next issue is the read of bar->a.
The C Standard is unclear on this too. Some people say that bar->a implies (*bar).a and this is a strict aliasing violation because we did not write a bar object to the location first.
Others (including me) say that it is fine: the only lvalue used for access is bar->a; that is an lvalue of type long long, and it accesses an object of effective type long long (the one written by y[0] = 2;).
There is a C2X working group that is working on improving the specification of strict aliasing to clarify these issues.

Can I recast the pointer to other datatypes, as long as the size fits?
You can recast [1] to any data type that is at most as large as the memory you allocated. However, you must write a value to change the effective type of the allocated object, according to 6.5p6.
Can I read before I write, then?
If not can I read after I wrote?
No. Except when otherwise specified (calloc is the otherwise) [2], the value in the memory is indeterminate. It may contain trap values. A cast in order to reinterpret a value as another type is UB, and a violation of strict aliasing (6.5p7).
Can I use it with a struct smaller than the allocated memory?
Yes, but that's a waste.
[1] You'll need to cast to void* first. Otherwise you'd get a rightful complaint from the compiler about incompatible pointer types.
[2] Even then some types may trap on a completely 0 bit pattern, so it depends.
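A sketch of the write-before-read pattern described above: the allocation is first used as long, then reused as a struct by writing its members before reading them. The function name reuse_allocation is hypothetical, and since the answers in this thread disagree on some details of the effective-type rules, treat it as one reading of 6.5p6 rather than a definitive statement.

```c
#include <stdlib.h>

struct str {
    long long a;
    long b;
};

/* Returns a + b read back from reused storage, or -1 if the
   allocation fails. */
static long long reuse_allocation(void)
{
    long long sum;
    struct str *bar;
    void *mem = calloc(1u, sizeof(struct str));
    long *x = mem;

    if (mem == NULL) {
        return -1;
    }
    x[0] = 2;               /* first use: stored as long */

    bar = mem;              /* void* converts implicitly, no cast needed */
    bar->a = 40;            /* write first, so the later read matches */
    bar->b = 4;
    sum = bar->a + bar->b;  /* each read uses the type just written */
    free(mem);
    return sum;
}
```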

Most compilers offer a mode where reads and writes of pointers will act upon the underlying storage, in the sequence they are performed, regardless of the data types involved. The Standard does not require compilers to offer such a mode, but as far as I can tell all quality compilers do so.
According to their published rationale, the authors of the Standard added aliasing restrictions to the language with the stated purpose of sparing compilers from having to make pessimistic aliasing assumptions when given code like:
float f;
float test(int *p)
{
f=1.0f;
*p = 2;
return f;
}
Note that in the example given in the rationale [very much like the above], even if it were legal to modify the storage used by f via pointer p, a reasonable person looking at the code would have no reason to think it likely that such a thing would ever happen. On the other hand, many compiler writers recognized that if given something like:
float f;
float test(float *p)
{
f=1.0f;
*(int*)p = 2;
return f;
}
one would have to be deliberately obtuse to think that the code would be unlikely to modify the storage used by a float, and there was consequently no reason why a quality compiler should not regard the write to *(int*)p as a potential write to a float.
Unfortunately, in the intervening years, compiler writers have become increasingly aggressive with type-based aliasing "optimizations", sometimes in ways that go clearly and undeniably beyond what the Standard would allow. Unless a program will never need to access any storage as different types at different times, I'd suggest using -fno-strict-aliasing option on compilers that support it. Otherwise one may have code that complies with the Standard and works today, but fails in a future version of the compiler which has become even more aggressive with its "optimizations".
PS--Disabling type-based aliasing may impact the performance of code in some situations, but proper use of restrict-qualified variables and parameters should avoid the costs of pessimistic aliasing assumptions. With a little care, use of those qualifiers will enable the same optimizations as aggressive aliasing could have done, but much more safely.
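As a small illustration of the restrict qualifier mentioned in the postscript, the sketch below uses a hypothetical function scale_into. The qualifier is a promise by the caller that the two arrays do not overlap, which lets the compiler keep values in registers or vectorize without relying on type-based aliasing. Whether a given compiler actually takes advantage of it is not guaranteed.

```c
#include <stddef.h>

/* restrict (C99): the caller promises that dst and src do not overlap
   for the duration of the call. */
static void scale_into(float *restrict dst, const float *restrict src,
                       size_t n, float k)
{
    size_t i;
    for (i = 0u; i < n; ++i) {
        dst[i] = src[i] * k;
    }
}
```

Calling it with overlapping arrays would break the promise and make the behaviour undefined, so the safety here depends on the call sites.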

Strict aliasing and flexible array member

I thought I knew C pretty well, but I'm confused by the following code:
typedef struct {
int type;
} cmd_t;
typedef struct {
int size;
char data[];
} pkt_t;
int func(pkt_t *b)
{
int *typep;
char *ptr;
/* #1: Generates warning */
typep = &((cmd_t*)(&(b->data[0])))->type;
/* #2: Doesn't generate warning */
ptr = &b->data[0];
typep = &((cmd_t*)ptr)->type;
return *typep;
}
When I compile with GCC, I get the "dereferencing type-punned pointer will break strict-aliasing rules" warning.
Why am I getting this warning at all? I'm dereferencing a char array. Casting a char * to anything is legal. Is this one of those cases where an array is not exactly the same as a pointer?
Why aren't both assignments generating the warning? The 2nd assignment is the equivalent of the first, isn't it?

When strict aliasing is turned on, the compiler is allowed to assume that two pointers of different type (char* vs cmd_t* in this instance) will not point to the same memory location. This allows for a greater range of optimizations which you would otherwise not want to be applied if they do indeed point to the same memory location. Various examples/horror-stories can be found in this question.
This is why, under strict aliasing, you have to be careful how you do type punning. I believe that the standard doesn't allow for any type punning whatsoever (don't quote me on that) but most compilers have an exemption for unions (my google-fu is failing in turning up the relevant manual pages):
union float_to_int {
double d;
uint64_t i;
};
union float_to_int ftoi;
ftoi.d = 1.0;
... = ftoi.i;
Unfortunately, this doesn't quite work for your situation, as you would have to memcpy the content of the array into the union, which is less than ideal. A simpler approach would be to turn off strict aliasing via the -fno-strict-aliasing switch. This will ensure that your code is correct and it's unlikely to have a significant performance impact (do measure if performance matters).
As for why the warning doesn't show up when the line is broken up, I don't know. Chances are that the modifications to the source code manage to confuse the compiler's static analysis pass enough that it doesn't see the type punning. Note that the static analysis pass responsible for detecting type punning is unrelated to, and doesn't talk to, the various optimization passes that assume strict aliasing. You can think of any static analysis done by compilers (unless otherwise specified) as a best-effort type of thing. In other words, the absence of a warning doesn't mean that there are no errors, which means that simply breaking up the line doesn't magically make your type punning safe.
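For completeness, the memcpy route mentioned above does not need a union: the header can be copied straight out of the byte payload into a local cmd_t. The sketch below reuses the question's types; pkt_cmd_type and pkt_make are hypothetical names, and pkt_make exists only to build a packet for demonstration.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    int type;
} cmd_t;

typedef struct {
    int size;
    char data[];
} pkt_t;

/* Copies the command header out of the byte payload instead of
   casting the char pointer to cmd_t*. */
static int pkt_cmd_type(const pkt_t *b)
{
    cmd_t cmd;
    memcpy(&cmd, b->data, sizeof cmd);
    return cmd.type;
}

/* Hypothetical helper for demonstration: allocates a packet whose
   payload is exactly one cmd_t. The caller frees the result. */
static pkt_t *pkt_make(int type)
{
    cmd_t cmd;
    pkt_t *p = malloc(sizeof *p + sizeof cmd);

    if (p != NULL) {
        cmd.type = type;
        p->size = (int)sizeof cmd;
        memcpy(p->data, &cmd, sizeof cmd);
    }
    return p;
}
```

For a small fixed-size copy like this, compilers commonly turn the memcpy into a plain load, so the cost concern may not apply in practice; measuring is still the safe course.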
