Is using flexible array members in C bad practice? - c

I recently read that using flexible array members in C was poor software engineering practice. However, that statement was not backed by any argument. Is this an accepted fact?
(Flexible array members are a C feature introduced in C99 whereby one can declare the last element to be an array of unspecified size. For example: )
struct header {
size_t len;
unsigned char data[];
};

It is an accepted "fact" that using goto is poor software engineering practice. That doesn't make it true. There are times when goto is useful, particularly when handling cleanup and when porting from assembler.
Flexible array members strike me as having one main use, off the top of my head, which is mapping legacy data formats like window template formats on RiscOS. They would have been supremely useful for this about 15 years ago, and I'm sure there are still people out there dealing with such things who would find them useful.
If using flexible array members is bad practice, then I suggest that we all go tell the authors of the C99 spec this. I suspect they might have a different answer.

No, using flexible array members in C is not bad practice.
This language feature was first standardized in ISO C99, 6.7.2.1 (16). In the following revision, ISO C11, it is specified in Section 6.7.2.1 (18).
You can use them like this:
struct Header {
size_t d;
long v[];
};
typedef struct Header Header;
size_t n = 123; // can dynamically change during program execution
// ...
Header *h = malloc(sizeof(Header) + sizeof(long[n]));
h->n = n;
Alternatively, you can allocate like this:
Header *h = malloc(sizeof *h + n * sizeof h->v[0]);
Note that sizeof(Header) includes eventual padding bytes, thus, the following allocation is incorrect and may yield a buffer overflow:
Header *h = malloc(sizeof(size_t) + sizeof(long[n])); // invalid!
A struct with a flexible array members reduces the number of allocations for it by 1/2, i.e. instead of 2 allocations for one struct object you need just 1. Meaning less effort and less memory occupied by memory allocator bookkeeping overhead. Furthermore, you save the storage for one additional pointer. Thus, if you have to allocate a large number of such struct instances you measurably improve the runtime and memory usage of your program (by a constant factor).
In contrast to that, using non-standardized constructs for flexible array members that yield undefined behavior (e.g. as in long v[0]; or long v[1];) obviously is bad practice. Thus, as any undefined-behaviour this should be avoided.
Since ISO C99 was released in 1999, more than 20 years ago, striving for ISO C89 compatibility is a weak argument.

PLEASE READ CAREFULLY THE COMMENTS BELOW THIS ANSWER
As C Standardization move forward there is no reason to use [1] anymore.
The reason I would give for not doing it is that it's not worth it to tie your code to C99 just to use this feature.
The point is that you can always use the following idiom:
struct header {
size_t len;
unsigned char data[1];
};
That is fully portable. Then you can take the 1 into account when allocating the memory for n elements in the array data :
ptr = malloc(sizeof(struct header) + (n-1));
If you already have C99 as requirement to build your code for any other reason or you are target a specific compiler, I see no harm.

You meant...
struct header
{
size_t len;
unsigned char data[];
};
In C, that's a common idiom. I think many compilers also accept:
unsigned char data[0];
Yes, it's dangerous, but then again, it's really no more dangerous than normal C arrays - i.e., VERY dangerous ;-) . Use it with care and only in circumstances where you truly need an array of unknown size. Make sure you malloc and free the memory correctly, using something like:-
foo = malloc(sizeof(header) + N * sizeof(data[0]));
foo->len = N;
An alternative is to make data just be a pointer to the elements. You can then realloc() data to the correct size as required.
struct header
{
size_t len;
unsigned char *data;
};
Of course, if you were asking about C++, either of these would be bad practice. Then you'd typically use STL vectors instead.

I've seen something like this:
from C interface and implementation.
struct header {
size_t len;
unsigned char *data;
};
struct header *p;
p = malloc(sizeof(*p) + len + 1 );
p->data = (unsigned char*) (p + 1 ); // memory after p is mine!
Note: data need not be last member.

As a side note, for C89 compatibility, such structure should be allocated like :
struct header *my_header
= malloc(offsetof(struct header, data) + n * sizeof my_header->data);
Or with macros :
#define FLEXIBLE_SIZE SIZE_MAX /* or whatever maximum length for an array */
#define SIZEOF_FLEXIBLE(type, member, length) \
( offsetof(type, member) + (length) * sizeof ((type *)0)->member[0] )
struct header {
size_t len;
unsigned char data[FLEXIBLE_SIZE];
};
...
size_t n = 123;
struct header *my_header = malloc(SIZEOF_FLEXIBLE(struct header, data, n));
Setting FLEXIBLE_SIZE to SIZE_MAX almost ensures this will fail :
struct header *my_header = malloc(sizeof *my_header);

There are some downsides related to how structs are sometimes used, and it can be dangerous if you don't think through the implications.
For your example, if you start a function:
void test(void) {
struct header;
char *p = &header.data[0];
...
}
Then the results are undefined (since no storage was ever allocated for data). This is something that you will normally be aware of, but there are cases where C programmers are likely used to being able to use value semantics for structs, which breaks down in various other ways.
For instance, if I define:
struct header2 {
int len;
char data[MAXLEN]; /* MAXLEN some appropriately large number */
}
Then I can copy two instances simply by assignment, i.e.:
struct header2 inst1 = inst2;
Or if they are defined as pointers:
struct header2 *inst1 = *inst2;
This however won't work for flexible array members, since their content is not copied over. What you want is to dynamically malloc the size of the struct and copy over the array with memcpy or equivalent.
struct header3 {
int len;
char data[]; /* flexible array member */
}
Likewise, writing a function that accepts a struct header3 will not work, since arguments in function calls are, again, copied by value, and thus what you will get is likely only the first element of your flexible array member.
void not_good ( struct header3 ) ;
This does not make it a bad idea to use, but you do have to keep in mind to always dynamically allocate these structures and only pass them around as pointers.
void good ( struct header3 * ) ;

Related

Can I "over-extend" an array by allocating more space to the enclosing struct?

Frankly, is such a code valid or does it produce UB?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct __attribute__((__packed__)) weird_struct
{
int some;
unsigned char value[1];
};
int main(void)
{
unsigned char text[] = "Allie has a cat";
struct weird_struct *ws =
malloc(sizeof(struct weird_struct) + sizeof(text) - 1);
ws->some = 5;
strcpy(ws->value, text);
printf("some = %d, value = %s\n", ws->some, ws->value);
free(ws);
return 0;
}
http://ideone.com/lpByQD
I’d never think it is valid to something like this, but it would seem that SystemV message queues do exactly that: see the man page.
So, if SysV msg queues can do that, perhaps I can do this too? I think I’d find this useful to send data over the network (hence the __attribute__((__packed__))).
Or, perhaps this is a specific guarantee of SysV msg queues and I shouldn’t do something like that elsewhere? Or, perhaps this technique can be employed, only I do it wrongly? I figured out I’d better ask.
This - 1 in malloc(sizeof(struct weird_struct) + sizeof(text) - 1) is because I take into account that one byte is allocated anyway thanks to unsigned char value[1] so I can subtract it from sizeof(text).
The standard C way (since C99) to do this would be using flexible array member. The last member of the structure needs to be incomplete array type and you can allocate required amount of memory at runtime.
Something like
struct __attribute__((__packed__)) weird_struct
{
int some;
unsigned char value [ ]; //nothing, no 0, no 1, no nothing...
};
and later
struct weird_struct *ws =
malloc(sizeof(struct weird_struct) + strlen("this to be copied") + 1);
or
struct weird_struct *ws =
malloc(sizeof(struct weird_struct) + sizeof("this to be copied"));
will do the job.
Related, quoting the C11 standard, chapter §6.7.2.1
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
Related to the one-element array usage, from online gcc manual page for zero-length array support option
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C90, you would have to give contents a length of 1, which means either you waste space or complicate the argument to malloc.
which also answers the -1 part in the malloc() argument, as sizeof(char) is guaranteed to be 1 in C.
The Standard allows implementations to act in any way they see fit if code accesses an array object beyond its stated bounds, even if the code owns the storage that would be accessed thereby. So far as I can tell, this rule is intended to allow for a compiler given something like:
struct s1 { char arr[4]; char y; } *p;
int x;
...
p->y = 1;
p->arr[x] = 2;
return p->y;
to treat it as equivalent to:
struct s1 { char arr[4]; char y; } *p;
int x;
...
p->arr[x] = 2;
p->y = 1;
return 1;
avoiding an extra load step, without having to pessimistically allow for the possibility that x might equal 4. Quality compilers should be able to recognize certain constructs which involve accessing arrays beyond their stated bounds (e.g. those involving a pointer to a structure with a single-element array as its last element) and handle them sensibly, but nothing in the Standard would require that they do so, and some compiler writers take the attitude that permission for compilers to behave in nonsensical fashion should be interpreted as an invitation to do so. I think that behavior would be defined, even for the x==4 case (meaning the compiler would have to allow for the possibility of it modifying y), if the array write were handled via something like: (char*)(struct s1*)(p->arr)[x] = 2; but the Standard is not really clear on whether the cast to struct s1* is necessary.

Is it valid to call malloc with a pointer of the type of the first member?

I am building a hash library, this library works with different structs and all those structs haves an unsigned type as first member, an example:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
struct data {
unsigned hash;
void (*func)(struct data *);
};
struct another_data {unsigned hash; int value;};
static void *hash_insert(const char *text, size_t size)
{
unsigned *hash;
hash = malloc(size);
// *hash = hash(text);
*hash = (unsigned)strlen(text);
return hash;
}
static void func(struct data *data)
{
printf("%u\n", data->hash);
}
int main(void)
{
struct data *data;
data = hash_insert("Some text", sizeof *data);
data->func = func;
data->func(data);
free(data);
return 0;
}
Since the first member of the struct and the struct itself haves the same alignment requirements, is it valid to call malloc with a pointer of the type of the first member in order to reserve space for the entire struct?
unsigned *hash = malloc(size); /* where size is the size of the struct */
EDIT:
In this related question provided by #MohitJain:
struct Msg
{
unsigned int a;
unsigned int b;
};
...
uint32_t* buff = malloc(sizeof(Msg));
// Alias that buffer through message
Msg* msg = (Msg*)(buff);
The strict aliasing rule makes this setup illegal
But in my case I am returning a void * from the function, I can use this (returned) pointer inside main without alignment issues, is this assumption correct?
The rules about effective type and pointer aliasing say that it is fine to convert pointers between a struct (or any other "aggregate") and a pointer of the same type as the first appearing member in the struct. And another rule says that structs are not allowed to have padding in the very beginning.
So it is fine as far as the C standard goes... which doesn't really say much of the quality of the program.
The code does not make much sense. You clearly want to use the whole struct, so why not return a pointer of the struct type? Don't complicate things for the sake of it. You should always avoid using void* when there is no need for it, to increase type safety.
Overall, all these things would sort themselves out in a multi-file project with proper program design. If you had a separate file with the hash type and all functions using that type, there would be no doubt of how to write the program.
malloc is guaranteed to return memory aligned for any type.
Therefore it will work regardless of the alignment requirement of the different subtypes.
Since structure member are allocated in the order they are declared, using
unsigned *hash = malloc(size); /* where size is the size of the struct */
can work if your pourpose is just using the hash data.
It fails if you want to apply pointer aritmetic on it, so in this case using
hash++
is an undefined behavior.
Yes it is perfectly fine for first member type to point to the memory returned by malloc (Memory is aligned for any data type). What you do with this memory later may cause issues if you are not careful.
[Extracts from Joachim Pileborg's answer # Maintaining an array of pointers in C that points to two related types with minor changes]
Having a structure inside other structures or a common data type in other structures is a common way of emulating inheritance in C. This common member should contain the minimal set of data common to all structures in the "inheritance" hierarchy, and it must always be the first member in the inheriting structures.
Possible issue with this scheme is:
This "inheritance" scheme will work on all modern PC-like systems and their compilers, and have done so for a long time, but there's no guarantee that it will work on all systems and all compilers (if you're planning on porting the code to some rare system with weird hardware and compiler you might want to look out, but the cases where the "inheritance" scheme will not work is very small, and most people will never come in contact with such a system in their entire lifetime.) But once you point the same memory with struct data *, you may fall victime of strict aliasing rule. So you need to be careful there. Further readL What is the strict aliasing rule?
unsigned *hash = malloc(size);
This will create allocate an array of unsigned integers. Total number of integers allocated will be size/sizeof(int). hash here is pointer to int.
Since the first member of the struct and the struct itself haves the same alignment requirements, is it valid to call malloc with a pointer of the type of the first member in order to reserve space for the entire struct?
The point is, that hash here is a separate variable which has nothing to do with the hash inside the structure. (It is in a separate namespace, if you want to look it up)
You can then cast hash to the struct variable.
struct data *data;
unsigned *hash = malloc(size * sizeof(struct));
data = (struct data *) hash;
But, what is the point. You can just as well remove the unnecessary unsigned hash pointer and go with the traditional.
struct data *data;
data = malloc(size * sizeof(struct));

Array of size 0 at the end of struct [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new for me, and it's seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element is incomplete type, (no array dimensions, not even 0).
So, your code should better look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explains that zero-length arrays are GCC extension and C allows variable length array but no one addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. struct data type do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
You should note that one of the the restriction on zero-length array is that it must be the last member of a structure. By this, compiler knows that the struct can have variable length object and some more memory will be needed at runtime.
But, you shouldn't be confused with; "since zero-length array is the last member of the structure then the memory allocated for zero-length array must be added to the end of the structure and since structs do not have their elements necessarily in continuous locations then how could that allocated memory be accessed?"
No. That's not the case. Memory allocation for structure members not necessarily be contiguous, there may be padding between them, but that allocated memory must be accessed with variable data. And yes, padding will have no effect over here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned that zero-length arrays are not supported by standard C, but an extension of GCC compilers. C99 introduced flexible array member. An example from C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more standard way would be to define your array with a data size of 1, as in:
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are layed out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but they are in order and in one contiguous block.

Why do we use zero length array instead of pointers?

It's said that zero length array is for variable length structure, which I can understand. But what puzzle me is why we don't simply use a pointer, we can dereference and allocate a different size structure in the same way.
EDIT - Added example from comments
Assuming:
struct p
{
char ch;
int *arr;
};
We can use this:
struct p *p = malloc(sizeof(*p) + (sizeof(int) * n));
p->arr = (struct p*)(p + 1);
To get a contiguous chunk of memory. However, I seemed to forget the space p->arr occupies and it seems to be a disparate thing from the zero size array method.
If you use a pointer, the structure would no longer be of variable length: it will have fixed length, but its data will be stored in a different place.
The idea behind zero-length arrays* is to store the data of the array "in line" with the rest of the data in the structure, so that the array's data follows the structure's data in memory. Pointer to a separately allocated region of memory does not let you do that.
* Such arrays are also known as flexible arrays; in C99 you declare them as element_type flexArray[] instead of element_type flexArray[0], i.e. you drop zero.
The pointer isn't really needed, so it costs space for no benefit. Also, it might imply another level of indirection, which also isn't really needed.
Compare these example declarations, for a dynamic integer array:
typedef struct {
size_t length;
int data[0];
} IntArray1;
and:
typedef struct {
size_t length;
int *data;
} IntArray2;
Basically, the pointer expresses "the first element of the array is at this address, which can be anything" which is more generic than is typically needed. The desired model is "the first element of the array is right here, but I don't know how large the array is".
Of course, the second form makes it possible to grow the array without risking that the "base" address (the address of the IntArray2 structure itself) changes, which can be really neat. You can't do that with IntArray1, since you need to allocate the base structure and the integer data elements together. Trade-offs, trade-offs ...
These are various forms of the so-called "struct hack", discussed in question 2.6 of the comp.lang.c FAQ.
Defining an array of size 0 is actually illegal in C, and has been at least since the 1989 ANSI standard. Some compilers permit it as an extension, but relying on that leads to non-portable code.
A more portable way to implement this is to use an array of length 1, for example:
struct foo {
size_t len;
char str[1];
};
You could allocate more than sizeof (struct foo) bytes, using len to keep track of the allocated size, and then access str[N] to get the Nth element of the array. Since C compilers typically don't do array bounds checking, this would generally "work". But, strictly speaking, the behavior is undefined.
The 1999 ISO standard added a feature called "flexible array members", intended to replace this usage:
struct foo {
size_t len;
char str[];
};
You can deal with these in the same way as the older struct hack, but the behavior is well defined. But you have to do all the bookkeeping yourself; sizeof (struct foo) still doesn't include the size of the array, for example.
You can, of course, use a pointer instead:
struct bar {
size_t len;
char *ptr;
};
And this is a perfectly good approach, but it has different semantics. The main advantage of the "struct hack", or of flexible array members, is that the array is allocated contiguously with the rest of the structure, and you can copy the array along with the structure using memcpy (as long as the target has been properly allocated). With a pointer, the array is allocated separately -- which may or may not be exactly what you want.
This is because with a pointer you need a separate allocation and assignment.
struct WithPointer
{
int someOtherField;
...
int* array;
};
struct WithArray
{
int someOtherField;
...
int array[1];
};
To get an 'object' of WithPointer you need to do:
struct WithPointer* withPointer = malloc(sizeof(struct WithPointer));
withPointer.array = malloc(ARRAY_SIZE * sizeof(int));
To get an 'object' of WithArray:
struct WithArray* withArray = malloc(sizeof(struct WithArray) +
(ARRAY_SIZE - 1) * sizeof(int));
That's it.
In some cases it's also very handy, or even necessary, to have the array in consecutive memory; for example in network protocol packets.

zero length arrays [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
Recently I came across a structure definition,
struct arr {
int cnt;
struct {
int size;
int *name;
} list[0];
};
and now I don't know the reason for list[0] being declared. What I am interested in is why is this used. Does it have any advantage? If yes, what is it?
The use is for dynamic-length arrays. You can allocate the memory using malloc(), and have the array reside at the end of the structure:
struct arr *my_arr = malloc(sizeof *my_arr + 17 * sizeof *my_arr->list);
my_arr->cnt = 17;
my_arr->list[0].size = 0;
my_arr->list[1].name = "foo";
Actually being able to use 0 for the length is (as pointed out in a comment) a GCC extension. In C99, you can leave out the size literal altogether for the same effect.
Before these things were implemented, you often saw this done with a length of 1, but that complicates the allocation a bit since you must compensate when computing the memory needed.
It is called "struct hack". You can search for it on SO or on the Net
http://www.google.com/search?q=struct+hack&sitesearch=stackoverflow.com/questions
Note that formally it is always illegal to declare arrays of size 0 in C. The code you provided formally is not even compilable. Most C compilers will accept 0-sized array declaration as an extension though, specifically because it is often used in "lazy" version of "struct hack" (it can rely on sizeof to determine how much memory to allocate, since 0-sized array supposedly does not affect the total size of the struct).
An arguably better implementation of struct hack uses an array of size 1
struct arr {
int cnt;
struct {
int size;
int *name;
} list[1];
};
It is "better" because it is formally compilable at least. In order to allocate memory for a struct with N elements in the list, standard offsetof macro is used
arr *a = malloc(offsetof(arr, list) + N * sizeof a->list);
In C99 version of the language specification the "struct hack" is supported through size-less array declaration (with empty []), since 0-sized array declarations are illegal in C99 as well.
Another advantage is if your structure describes on-disk/on-network data. If cnt is 0, the data size may only be the length of cnt.
I'm here just to confirm what I dreaded, that list[0] is not valid.

Resources