Change of flexible array into pointer - c

I am working to get rid of MISRA violations coming in my C code. It is violating rule 18.7.
struct abc {
struct header;
uint8_t data[]; /* Line 1 */
};
Here, Line 1 is causing the MISRA violations.
I tried to convert it into:
struct abc {
struct header;
uint8_t *data;
};
Can we do like the above or is it violating something ?

Your solution is semantically different and won't work even if it clears the violation.
The intent here is to create a structure that can act as a header for the contiguous data that follows it. So for example if you have:
struct Message
{
struct abc info ;
char data[128] ;
} message ;
Such that message.info.data and message.data refer to the same thing and casting a struct abc to a struct Message allows a function to be defined for passing any object with a struct abc header. Effectively supporting polymorphism in C.
Replacing it with:
struct abc
{
struct header;
uint8_t* data;
};
is semantically different because the data member does not refer to the data contiguous with header. The copy semantics also differ, and it is unlikely in the context of the code that uses the original structure that it will work as intended.
GCC supports the following syntax:
struct abc
{
struct header;
uint8_t data[0] ;
} ;
but it is likely that is not MISRA compliant either. A compliant solution is to have:
struct abc
{
struct header;
uint8_t data[1] ;
} ;
But that inserts an extra character and any code that uses this as a header may need to accommodate that when accessing the data through the data member.

All safety-related systems bans dynamic memory allocation, and therefore MISRA-C:2012 does so as well. This is the rationale for rule 18.7: flexible array members are closely associated with with dynamic allocation and therefore not allowed.
The reason why dynamic allocation is banned is that there can be no non-deterministic behavior in these kind of systems. In addition, it doesn't make any sense to use dynamic allocation in microcontroller/RTOS applications.
You can swap the flexible array member for a pointer if it makes sense to your application. But if it is some manner of protocol or data structure header, you probably want a fixed-size array instead. (And mind struct padding: storing data communication protocols in structs can be problematic because of alignment and endianess.)

Yes you can, as it makes the structure size deterministic and static, but it also forces you to allocate then release the needed space for data with malloc() and free(), or explicitly make it point to some already available space somewhere, each time you instanciate the structure.
What you probably want to do here is to specify a definite length to your array. If however this structure is meant to actually describe the header of a data block, you may use data[1] then let your index exceed this value to access the rest (ISO C forbids 0-length arrays, though).

Related

How to include a variable-sized array as stuct member in C?

I must say, I have quite a conundrum in a seemingly elementary problem. I have a structure, in which I would like to store an array as a field. I'd like to reuse this structure in different contexts, and sometimes I need a bigger array, sometimes a smaller one. C prohibits the use of variable-sized buffer. So the natural approach would be declaring a pointer to this array as struct member:
struct my {
struct other* array;
}
The problem with this approach however, is that I have to obey the rules of MISRA-C, which prohibits dynamic memory allocation. So then if I'd like to allocate memory and initialize the array, I'm forced to do:
var.array = malloc(n * sizeof(...));
which is forbidden by MISRA standards. How else can I do this?
Since you are following MISRA-C, I would guess that the software is somehow mission-critical, in which case all memory allocation must be deterministic. Heap allocation is banned by every safety standard out there, not just by MISRA-C but by the more general safety standards as well (IEC 61508, ISO 26262, DO-178 and so on).
In such systems, you must always design for the worst-case scenario, which will consume the most memory. You need to allocate exactly that much space, no more, no less. Everything else does not make sense in such a system.
Given those pre-requisites, you must allocate a static buffer of size LARGE_ENOUGH_FOR_WORST_CASE. Once you have realized this, you simply need to find a way to keep track of what kind of data you have stored in this buffer, by using an enum and maybe a "size used" counter.
Please note that not just malloc/calloc, but also VLAs and flexible array members are banned by MISRA-C:2012. And if you are using C90/MISRA-C:2004, there are no VLAs, nor are there any well-defined use of flexible array members - they invoked undefined behavior until C99.
Edit: This solution does not conform to MISRA-C rules.
You can kind of include VLAs in a struct definition, but only when it's inside a function. A way to get around this is to use a "flexible array member" at the end of your main struct, like so:
#include <stdio.h>
struct my {
int len;
int array[];
};
You can create functions that operate on this struct.
void print_my(struct my *my) {
int i;
for (i = 0; i < my->len; i++) {
printf("%d\n", my->array[i]);
}
}
Then, to create variable length versions of this struct, you can create a new type of struct in your function body, containing your my struct, but also defining a length for that buffer. This can be done with a varying size parameter. Then, for all the functions you call, you can just pass around a pointer to the contained struct my value, and they will work correctly.
void create_and_use_my(int nelements) {
int i;
// Declare the containing struct with variable number of elements.
struct {
struct my my;
int array[nelements];
} my_wrapper;
// Initialize the values in the struct.
my_wrapper.my.len = nelements;
for (i = 0; i < nelements; i++) {
my_wrapper.my.array[i] = i;
}
// Print the struct using the generic function above.
print_my(&my_wrapper.my);
}
You can call this function with any value of nelements and it will work fine. This requires C99, because it does use VLAs. Also, there are some GCC extensions that make this a bit easier.
Important: If you pass the struct my to another function, and not a pointer to it, I can pretty much guarantee you it will cause all sorts of errors, since it won't copy the variable length array with it.
Here's a thought that may be totally inappropriate for your situation, but given your constraints I'm not sure how else to deal with it.
Create a large static array and use this as your "heap":
static struct other heap[SOME_BIG_NUMBER];
You'll then "allocate" memory from this "heap" like so:
var.array = &heap[start_point];
You'll have to do some bookkeeping to keep track of what parts of your "heap" have been allocated. This assumes that you don't have any major constraints on the size of your executable.

struct xyz a[0]; What does this mean? [duplicate]

I am working on refactoring some old code and have found few structs containing zero length arrays (below). Warnings depressed by pragma, of course, but I've failed to create by "new" structures containing such structures (error 2233). Array 'byData' used as pointer, but why not to use pointer instead? or array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
}
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says 0 length arrays are legal in C99 they are in fact still not legal in C99. Instead of a 0 length array here you should be using a length 1 array
This is an old C hack to allow a flexible sized arrays.
In C99 standard this is not neccessary as it supports the arr[] syntax.
Your intution about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
}
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
}
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.

What are the real benefits of flexible array member?

After reading some posts related to flexible array member, I am still not fully understand why we need such a feature.
Possible Duplicate:
Flexible array members in C - bad?
Is this a Flexible Array Struct Members in C as well?
(Blame me if I didn't solve my problem from the possible duplicate questions above)
What is the real difference between the following two implementations:
struct h1 {
size_t len;
unsigned char *data;
};
struct h2 {
size_t len;
unsigned char data[];
};
I know the size of h2 is as if the flexible array member (data) were omitted, that is, sizeof(h2) == sizeof(size_t). And I also know that the flexible array member can only appear as the last element of a structure, so the original implementation can be more flexible in the position of data.
My real problem is that why C99 add this feature? Simply because sizeof(h2) doesn't contain the real size of data? I am sure that I must miss some more important points for this feature. Please point it out for me.
The two structs in your post don't have the same structure at all. h1 has a integer and a pointer to char. h2 has an integer, and an array of characters inline (number of elements determined at runtime, possibly none).
Said differently, in h2 the character data is inside the struct. In h1 it has to be somewhere outside.
This makes a lot of difference. For instance, if you use h1 you need to take care of allocating/freeing the payload (in addition to the struct itself). With h2, only one allocation/free is necessary, everything is packaged together.
One case where using h2 might make sense is if you're communicating with something that expects messages in the form of {length,data} pairs. You allocate an instance of h2 by requesting sizeof(h2)+how many payload chars you want, fill it up, and then you can transfer the whole thing in a single write (taking care about endianess and such of course). If you had used h1, you'd need two write calls (unless you want to send the memory address of the data, which usually doesn't make any sense).
So this feature exists because it's handy. And various (sometimes non-portable) tricks where used before that to simulate this feature. Adding it to the standard makes sense.
The main reason the Committee introduced flexible array members is to implement the famous struct hack. See the below quote from the C99 Rationale, especially the part I add the emphasis.
Rationale for International Standard — Programming Languages — C §6.7.2.1 Structure and union specifiers
There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
struct s
{
int n_items;
/* possibly other fields */
int items[1];
};
struct s *p;
size_t n, i;
/* code that sets n omitted */
p = malloc(sizeof(struct s) + (n - 1) * sizeof(int));
/* code to check for failure omitted */
p->n_items = n;
/* example usage */
for (i = 0; i < p->n_items; i++)
p->items[i] = i;
The validity of this construct has always been questionable. In the response to one Defect
Report, the Committee decided that it was undefined behavior because the array p->items
contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.
There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last. Similarly, structures containing flexible arrays can't occur in other structures or in arrays. Finally, sizeof applied to the structure ignores the array but counts any padding before it. This makes the malloc call as simple as possible.
I don't know if this is considered as an important point, but GCC docs points this out:
GCC allows static initialization of flexible array members. This is equivalent to defining a new structure containing the original structure followed by an array of sufficient size to contain the data. E.g. in the following, f1 is constructed as if it were declared like f2.
struct f1 {
int x; int y[];
} f1 = { 1, { 2, 3, 4 } };
struct f2 {
struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } };
(taken from http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html)

What is there to be gained by deterministic field ordering in the memory layout?

Members of a structure are allocated within the structure in the order of their appearance in the declaration and have ascending addresses.
I am faced with the following dilemma: when I need to declare a structure, do I
(1) group the fields logically, or
(2) in decreasing size order, to save RAM and ROM size?
Here is an example, where the largest data member should be at the top, but also should be grouped with the logically-connected colour:
struct pixel{
int posX;
int posY;
tLargeType ColourSpaceSecretFormula;
char colourRGB[3];
}
The padding of a structure is non-deterministic (that is, is implementation-dependent), so we cannot reliably do pointer arithmetic on structure elements (and we shouldn't: imagine someone reordering the fields to his liking: BOOM, the whole code stops working).
-fpack-structs solves this in gcc, but bears other limitations, so let's leave compiler options out of the question.
On the other hand, code should be, above all, readable. Micro optimizations are to be avoided at all cost.
So, I wonder, why are structures' members ordered by the standard, making me worry about the micro-optimization of ordering struct member in a specific way?
The compiler is limited by several traditional and practical limitations.
The pointer to the struct after a cast (the standard calls it "suitably converted") will be equal to the pointer to the first element of the struct. This has often been used to implement overloading of messages in message passing. In that case a struct has the first element that describes what type and size the rest of the struct is.
The last element can be a dynamically resized array. Even before official language support this has been often used in practice. You allocate sizeof(struct) + length of extra data and can access the last element as a normal array with as many elements that you allocated.
Those two things force the compiler to have the first and last elements in the struct in the same order as they are declared.
Another practical requirement is that every compilation must order the struct members the same way. A smart compiler could make a decision that since it sees that some struct members are always accessed close to each other they could be reordered in a way that makes them end up in a cache line. This optimization is of course impossible in C because structs often define an API between different compilation units and we can't just reorder things differently on different compilations.
The best we could do given the limitations is to define some kind of packing order in the ABI to minimize alignment waste that doesn't touch the first or last element in the struct, but it would be complex, error prone and probably wouldn't buy much.
If you couldn't rely on the ordering, then it would be much harder to write low-level code which maps structures onto things like hardware registers, network packets, external file formats, pixel buffers, etc.
Also, some code use a trick where it assumes that the last member of the structure is the highest-addressed in memory to signify the start of a much larger data block (of unknown size at compile time).
Reordering fields of structures can sometime yield good gains in data size and often also in code size, especially in 64 bit memory model. Here an example to illustrate (assuming common alignment rules):
struct list {
int len;
char *string;
bool isUtf;
};
will take 12 bytes in 32 bit but 24 in 64 bit mode.
struct list {
char *string;
int len;
bool isUtf;
};
will take 12 bytes in 32 bit but only 16 in 64 bit mode.
If you have an array of these structures you gain 50% in the data but also in code size, as indexing on a power of 2 is simpler than on other sizes.
If your structure is a singleton or not frequent, there's not much point in reordering the fields. If it is used a lot, it's a point to look at.
As for the other point of your question. Why doesn't the compiler do this reordering of fields, it is because in that case, it would be difficult to implement unions of structures that use a common pattern. Like for example.
struct header {
enum type;
int len;
};
struct a {
enum type;
int len;
bool whatever1;
};
struct b {
enum type;
int len;
long whatever2;
long whatever4;
};
struct c {
enum type;
int len;
float fl;
};
union u {
struct h header;
struct a a;
struct b b;
struct c c;
};
If the compiler rearranged the fields, this construct would be much more inconvenient, as there would be no guarantee that the type and len fields were identical when accessing them via the different structs included in the union.
If I remember correctly the standard even mandates this behaviour.

Accessing array as a struct *

This is one of those I think this should work, but it's best to check questions. It compiles and works fine on my machine.
Is this guaranteed to do what I expect (i.e. allow me to access the first few elements of the array with a guarantee that the layout, alignment, padding etc of the struct is the same as the array)?
struct thingStruct
{
int a;
int b;
int c;
};
void f()
{
int thingsArray[5];
struct thingStruct *thingsStruct = (struct thingStruct *)&thingsArray[0];
thingsArray[0] = 100;
thingsArray[1] = 200;
thingsArray[2] = 300;
printf("%d", thingsStruct->a);
printf("%d", thingsStruct->b);
printf("%d", thingsStruct->c);
}
EDIT: Why on earth would I want to do something like this? I have an array which I'm mmapping to a file. I'm treating the first part of the array as a 'header', which stores various pieces of information about the array, and the rest of it I'm treating as a normal array. If I point the struct to the start of the array I can access the pieces of header data as struct members, which is more readable. All the members in the struct would be of the same type as the array.
While I have seen this done frequently, you cannot (meaning it is not legal, standard C) make assumptions about the binary layout of a structure, as it may have padding between fields.
This is explained in the comp.lang.c faq: http://c-faq.com/struct/padding.htmls
Although it's likely to work in most places, it's still a bit iffy. If you want to give symbolic names to parts of the header, why not just do:
enum { HEADER_A, HEADER_B, HEADER_C };
/* ... */.
printf("%d", thingsArray[HEADER_A]);
printf("%d", thingsArray[HEADER_B]);
printf("%d", thingsArray[HEADER_C]);
As Evan commented on the question, this will probably work in most cases (again, probably best if you use #pragma pack to ensure their is no padding) assuming all the types in your struct are the same type as your array. Given the rules of C, this is legal.
My question to you is "why?" This isn't a particularly safe thing to do. If a float gets thrown into the middle of the struct, this all falls apart. Why not just use the struct directly? This really ins't a technique that I'd recommend in most cases.
Another solution for representing a header and the rest of file data is using a structure like this:
struct header {
long headerData1;
int headerData2;
int headerData3;
int fileData[ 1 ]; // <- data begin here
};
Then you allocate the memory block with a file contents and cast it as struct header *myFileHeader (or map the memory block on a file) and access all your file data with
myFileHeader->fileData[ position ]
for arbitrary big position. The language imposes no restriction on the index value, so it's only your responsibility to keep your arbitrary big posistion within the actual size of the memory block you allocated (or the mapped file's size).
One more important note: apart from switching off the struct members padding, which has been already described by others, you should carefully choose data types for the header members, so that they fit the actual file data layout despite compiler you use (say, int won't change from 32 to 64 bits...)

Resources