Conforming variant of the old "struct hack" (?)

I believe I've found a way to achieve something like the well-known "struct hack" in portable C89. I'm curious if this really strictly conforms to C89.
The main idea is: I allocate memory large enough to hold an initial struct and the elements of the array. The exact size is (K + N) * sizeof(array_base_type), where K is chosen so that K * sizeof(array_base_type) >= sizeof(the_struct) and N is the number of array elements.
First, I dereference the pointer that malloc() returned to store the_struct, then I use pointer arithmetic to obtain a pointer to the beginning of the array following the struct.
One line of code is worth more than a thousand words, so here is a minimal implementation:
typedef struct Header {
    size_t length;
    /* other members follow */
} Header;

typedef struct Value {
    int type;
    union {
        int intval;
        double fltval;
    } v;
} Value;

/* round up to nearest multiple of sizeof(Value) so that a Header struct fits in */
size_t n_hdr = (sizeof(Header) + sizeof(Value) - 1) / sizeof(Value);
size_t n_arr = 42; /* arbitrary array size here */

void *frame = malloc((n_hdr + n_arr) * sizeof(Value));
if (!frame)
    return NULL;

Header *hdr = frame;
Value *stack_bottom = (Value *)frame + n_hdr;
My main concern is that the last two assignments (using frame both as a pointer to Header and as a pointer to Value) may violate the strict aliasing rule. I do not, however, dereference hdr as a pointer to Value - only pointer arithmetic is performed on frame in order to reach the first element of the value array - so I don't actually access the same object through pointers of different types.
So, is this approach any better than the classic struct hack (which has been officially deemed UB), or is it UB too?

The "obvious" (well... not exactly obvious, but it's what comes to my mind anyway :-) ) way to cause this to break is to use a vectorizing compiler that somehow decides it's OK to load, say, 64 Headers into a vector register from the 42-rounded-up-to-64+ area at hdr which comes from malloc which always allocates enough to vectorize. Storing the vector register back to memory might overwrite one of the Values.
I think this vectorizing compiler could point to the standard (well, if a compiler has fingers...) and claim conformance.
In practice, though, I'd expect this code to work. If you come across a vectorizing compiler, add even more space (do the rounding up with a machine-dependent macro that can insert a minimum) and charge on. :-)
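For illustration, the "round up with a minimum" idea could be packaged as a macro along these lines. This is only a sketch; HDR_SLOTS and MIN_HDR_SLOTS are made-up names, and the minimum is where a machine-specific fudge factor would go:

#define MIN_HDR_SLOTS 1
#define ROUND_UP_SLOTS(hdr_t, elem_t) \
    ((sizeof(hdr_t) + sizeof(elem_t) - 1) / sizeof(elem_t))
/* Number of array slots needed to cover the header, never less than MIN_HDR_SLOTS. */
#define HDR_SLOTS(hdr_t, elem_t) \
    (ROUND_UP_SLOTS(hdr_t, elem_t) > MIN_HDR_SLOTS \
        ? ROUND_UP_SLOTS(hdr_t, elem_t) : MIN_HDR_SLOTS)

/* Usage with the question's types: */
/* size_t n_hdr = HDR_SLOTS(Header, Value); */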

Related

Why does GCC allow zero length array only as last member?

According to the GCC documentation at https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html, the stated benefit is:
They are very useful as the last element of a structure that is really a header for a variable-length object
What does that mean?
The zero-length array is a GCC extension (read as: not standard) which you should not use.
While recent versions of C allow for something similar (a flexible array member, written with empty brackets), C++ knows no such thing. As people often mix C and C++, this is a possible source of confusion.
Instead, an array of length 1 should be used, which is standards-compliant under both C and C++, and which just works with every compiler.
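A minimal sketch of that array-of-1 idiom (struct msg and make_msg are made-up names, not from the answer):

#include <stdlib.h>

struct msg {
    unsigned length;
    unsigned char data[1];   /* stands in for the variable-length part */
};

/* Allocate the header plus n payload bytes (assumes n >= 1); the one byte
   already counted inside sizeof(struct msg) is reused as data[0]. */
struct msg *make_msg(size_t n)
{
    struct msg *m = malloc(sizeof(struct msg) + (n - 1));
    if (m != NULL)
        m->length = (unsigned)n;
    return m;
}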
What is this useful for at all?
Sometimes you need to access "invalid" out-of-bounds data knowing that it is valid in reality. In the strictest sense, this is undefined behavior (since you are accessing out-of-bounds values, which are indeterminate, and using indeterminate values is UB), but that is only from the compiler's point of view, not what actually happens, so in practice it nevertheless "works fine".
For example, you might receive framed data on the network consisting of a tag word, a length, and an amount of data corresponding to the length given. Or an operating system function might return a variable amount of results to you (a couple of Win32 API functions work that way, for example).
In either case, you have an unknown (unknown at compile time) number of elements at the end of this structure, so it is not possible to define a single legitimate structure to hold everything.
That is what flexible array members are for. And with this, it is explained why they must be the last member as well. It doesn't make sense for something that could have "any size" to be anywhere but at the end -- it's impossible for the compiler to lay out any members after it, not knowing its size.
(In case you wonder how the compiler can ever free the storage without knowing the object's size... it cannot! There is normally an explicit function for freeing such an object as part of the API, which takes care of this exact problem.)
It's probably best to demonstrate with a small example:
#include <stdio.h>
#include <stdlib.h>

#define BLOB_TYPE_FOO 0xBEEF

struct blob {
    /* Part of your object header... perhaps describing the type of blob. */
    int type;
    /* This is actually the length of the "data" field below */
    unsigned length;
    /* The data */
    unsigned char data[];
};

struct blob *
create_blob(int type, size_t size)
{
    /* Allocate enough space for the "header" and "size" bytes of data. */
    struct blob *x = calloc(1, sizeof(struct blob) + size);
    x->type = type;
    x->length = size;
    return x;
}

int
main(void)
{
    /* Note that sizeof(struct blob) doesn't include the data field. */
    printf("sizeof(struct blob): %zu\n", sizeof(struct blob));

    struct blob *x = create_blob(BLOB_TYPE_FOO, 1000);

    /*
       You can manipulate data here, but be careful not to exceed the
       allocated size.
    */
    size_t i;
    for (i = 0; i < 1000; i++)
    {
        x->data[i] = 'A' + (i % 26);
    }

    /*
       Since data was allocated with the rest of the header, everything is
       freed.
    */
    free(x);
    return 0;
}
The nice part about this setup is that sizeof(struct blob) represents the size of the "object header" (on my machine, that's 8 bytes), and that since you allocate the whole object together, a single free() is all that is needed to release the memory.
Like others have stated here, this is a non-standard extension and you should really consider using it with care. Damon's answer is the better way to go, though the sizeof() operation is not quite the right size (it's a bit too large to represent the size of the actual header). It's not too hard to work around that problem, though.
You cannot have an array of length 0, because a zero-length array would amount to a pointer to nothing, which is not correct. The GCC documentation says:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)

Typecasting of pointers in C

I know a pointer to one type may be converted to a pointer to another type. I have three questions:
What should be kept in mind when typecasting pointers?
What exceptions/errors may arise from the resulting pointer?
What are the best practices to avoid exceptions/errors?
A well-written program usually does not use much pointer typecasting. There can be a need to cast the result of malloc, for instance (which is declared as returning void *), but it is not even necessary in C (though a few compilers may complain if you omit it).
int *p = malloc(sizeof(int)); // no need of (int *)malloc(...)
However, in system applications you sometimes want to use a trick to perform binary or machine-specific operations - and C, a language close to the machine, is convenient for that. For instance, say you want to analyze the binary representation of a double (assuming it follows the IEEE 754 format); working with its individual bytes is simpler, so you may declare:
typedef unsigned char byte;

double d = 0.9;
byte *p = (byte *)&d;
int i;
for (i = 0; i < sizeof(double); i++) { /* ... work with p[i] ... */ }
You could also use a union for this; the code above is just one example.
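A small sketch of the union variant (the names here are invented, not from the answer):

#include <stdio.h>

int main(void)
{
    union {
        double d;
        unsigned char bytes[sizeof(double)];
    } u;
    size_t i;

    u.d = 0.9;
    /* Walk the raw bytes of the double through the union. */
    for (i = 0; i < sizeof(double); i++)
        printf("%02x ", (unsigned)u.bytes[i]);
    putchar('\n');
    return 0;
}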
A more complex use could be simulating C++-style polymorphism, which requires storing the "class" (structure) hierarchy somewhere to remember what is what, and performing pointer typecasts so that, for instance, a parent "class" pointer variable can at some point refer to a derived class (see the C++ link also):
CRectangle rect;
CPolygon *p = (CPolygon *)&rect;
p->whatami = POLY_RECTANGLE; // a way to simulate polymorphism ...
process_poly ( p );
But in this case, maybe it's better to directly use C++!
Pointer typecasts should be used carefully, in well-determined situations that are identified during program analysis - before development starts.
Pointer typecast potential dangers:
- using them when it's not necessary - that is error prone and complicates the program
- pointing to an object of a different size, which may lead to an access overflow, wrong results...
- casting between pointers to different structures, like s1 *p = (s1 *)&s2; - relying on their size and alignment may lead to errors
(But to be fair, a skilled C programmer wouldn't commit the above mistakes...)
Best practice:
- use them only if you really need them, and comment that part of the code well to explain why they are necessary
- know what you are doing - a skilled programmer may use tons of pointer typecasts without failure; in other words, don't just try and see whether it works: it may work on one system / version / OS and not on another
In plain C you can cast any pointer type to any other pointer type. If you cast a pointer to or from an incompatible type and write to the memory incorrectly, you may get a segmentation fault or unexpected results from your application.
Here is a sample code of casting structure pointers:
struct Entity {
    int type;
};

struct DetailedEntity1 {
    int type;
    short val1;
};

struct DetailedEntity2 {
    int type;
    long val;
    long val2;
};

// random code:
struct Entity *ent = (struct Entity *)ptr;

// bad:
struct DetailedEntity1 *ent1 = (struct DetailedEntity1 *)ent;
int a = ent1->val1; // may be an error here, invalid read
ent1->val1 = 117;   // possible invalid write

// OK:
if (ent->type == DETAILED_ENTITY_1) {
    ((struct DetailedEntity1 *)ent)->val1;
} else if (ent->type == DETAILED_ENTITY_2) {
    ((struct DetailedEntity2 *)ent)->val2;
}
As for function pointers - you should always use functions which exactly fit the declaration. Otherwise you may get unexpected results or segfaults.
When casting from one pointer type to another (structure or not) you must ensure that the memory is laid out and aligned in the exact same way. When casting entire structures, the best way to ensure this is to use the same order of the same members at the start, and to differentiate structures only after the "common header". Also remember that memory alignment may differ from machine to machine, so you can't just send a struct as a byte array and receive it as a byte array on the other end. You may experience unexpected behaviour or even segfaults.
When casting smaller to larger variable pointers, you must be very careful. Consider this code:
char *ptr = malloc(16);
ptr++;
uint64_t *uintPtr = (uint64_t *)ptr; // may cause problems: the memory is not properly aligned
And also, there is the strict aliasing rule that you should follow.
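To illustrate that last point, here is a hedged sketch (assuming a C99 <stdint.h> and a 32-bit float; float_bits is a made-up helper) of a cast that runs afoul of strict aliasing and the memcpy alternative that does not:

#include <stdint.h>
#include <string.h>

uint32_t float_bits(float f)
{
    uint32_t u;
    /* Well-defined: copy the object representation byte by byte. */
    memcpy(&u, &f, sizeof u);
    return u;
    /* By contrast, "return *(uint32_t *)&f;" would access a float
       through a uint32_t lvalue and violate strict aliasing. */
}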
You probably need a look at ... the C FAQ maintained by Steve Summit (which used to be posted to the newsgroups, which means it was read and updated by many of the best programmers of the time, sometimes the designers of the language itself).
There is an abridged version too, which is maybe more palatable and still very, very, very, very useful. Reading the whole abridged version is, I believe, mandatory if you use C.

Are there any guarantees about C struct order?

I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value, where value is a pointer to a struct and first_value is its very first member. Is *value safe?
Also note that sizes aren't guaranteed because of alignment; what is the alignment based on - the architecture/register size?
We align data/code for faster execution; can we tell the compiler not to do this, so that maybe we can guarantee certain things about structs, like their size?
When doing pointer arithmetic on struct members in order to locate a member's offset, do you subtract for little endian and add for big endian, or does it just depend on the compiler?
What does malloc(0) really allocate?
The following code is for educational/discovery purposes, it's not meant to be of production quality.
#include <stdlib.h>
#include <stdio.h>

int main()
{
    printf("sizeof(struct {}) == %lu;\n", sizeof(struct {}));
    printf("sizeof(struct {int a}) == %lu;\n", sizeof(struct {int a;}));
    printf("sizeof(struct {int a; double b;}) == %lu;\n", sizeof(struct {int a; double b;}));
    printf("sizeof(struct {char c; double a; double b;}) == %lu;\n", sizeof(struct {char c; double a; double b;}));

    printf("malloc(0)) returns %p\n", malloc(0));
    printf("malloc(sizeof(struct {})) returns %p\n", malloc(sizeof(struct {})));

    struct {int a; double b;} *test = malloc(sizeof(struct {int a; double b;}));
    test->a = 10;
    test->b = 12.2;

    printf("test->a == %i, *test == %i \n", test->a, *(int *)test);
    printf("test->b == %f, offset of b is %i, *(test - offset_of_b) == %f\n",
           test->b, (int)((void *)test - (void *)&test->b),
           *(double *)((void *)test - ((void *)test - (void *)&test->b))); // find the offset of b, add it to the base
    free(test);
    return 0;
}
Calling gcc test.c followed by ./a.out, I get this:
sizeof(struct {}) == 0;
sizeof(struct {int a}) == 4;
sizeof(struct {int a; double b;}) == 16;
sizeof(struct {char c; double a; double b;}) == 24;
malloc(0)) returns 0x100100080
malloc(sizeof(struct {})) returns 0x100100090
test->a == 10, *test == 10
test->b == 12.200000, offset of b is -8, *(test - offset_of_b) == 12.200000
Update
this is my machine:
gcc --version
i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
uname -a
Darwin MacBookPro 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun 7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386
From 6.2.5/20:
A structure type describes a sequentially allocated nonempty set of member objects
(and, in certain circumstances, an incomplete array), each of which has an optionally
specified name and possibly distinct type.
To answer:
especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?
see 6.7.2.1/15:
15 Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.[1]
There may however be padding bytes at the end of the structure as also in-between members.
In C, malloc( 0 ) is implementation defined. (As a side note, this is one of those little things where C and C++ differ.)
[1] Emphasis mine.
I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?
Yes, *value is safe; it yields a copy of the structure that value points at. But it is almost guaranteed to have a different type from *value->first_value, so the result of *value will almost always be different from *value->first_value.
Counter-example:
struct something { struct something *first_value; ... };
struct something data = { ... };
struct something *value = &data;
value->first_value = value;
Under this rather limited set of circumstances, you would get the same result from *value and *value->first_value. Under that scheme, the types would be the same (even if the values are not). In the general case, though, *value and *value->first_value have different types.
Also note that sizes aren't guaranteed because of alignment, but is alignment always on register size?
Since 'register size' is not a defined C concept, it isn't clear what you're asking. In the absence of pragmas (#pragma pack or similar), the elements of a structure will be aligned for optimal performance when the value is read (or written).
We align data/code for faster execution; can we tell compiler not to do this? So maybe we can guarantee certain things about structs, like their size?
The compiler is in charge of the size and layout of struct types. You can influence by careful design and perhaps by #pragma pack or similar directives.
These questions normally arise when people are concerned about serializing data (or, rather, trying to avoid having to serialize data by processing structure elements one at a time). Generally, I think you're better off writing a function to do the serialization, building it up from component pieces.
When doing pointer arithmetic on struct members in order to locate member offset, I take it you do subtraction if little endian, addition for big endian, or does it just depend on the compiler?
You're probably best off not doing pointer arithmetic on struct members. If you must, use the offsetof() macro from <stddef.h> to handle the offsets correctly (and that means you're not doing the pointer arithmetic directly). The first structure element is always at the lowest address, regardless of big-endianness or little-endianness. Indeed, endianness has no bearing on the layout of members within a structure; it only affects the byte order of values within a (basic data type) member of a structure.
The C standard requires that the elements of a structure are laid out in the order that they are defined; the first element is at the lowest address, and the next at a higher address, and so on for each element. The compiler is not allowed to change the order. There can be no padding before the first element of the structure. There can be padding after any element of the structure as the compiler sees fit to ensure what it considers appropriate alignment. The size of a structure is such that you can allocate (N × size) bytes that are appropriately aligned (e.g. via malloc()) and treat the result as an array of the structure.
Calling malloc(0) will return a pointer that may be safely passed to free() at least once. If the same value is returned by multiple malloc(0) calls, it may be freed once for each such call. Obviously, if it returns NULL, that could be passed to free() an unlimited number of times without effect. Every call to malloc(0) which returns non-null should be balanced by a call to free() with the returned value.
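As an aside, a small sketch of the offsetof() macro mentioned above (struct pair is invented for the example; the printed values depend on the platform):

#include <stddef.h>
#include <stdio.h>

struct pair {
    char tag;
    double value;
};

int main(void)
{
    /* offsetof() reports each member's byte offset portably,
       including whatever padding the compiler inserted. */
    printf("tag at %lu, value at %lu, sizeof %lu\n",
           (unsigned long)offsetof(struct pair, tag),
           (unsigned long)offsetof(struct pair, value),
           (unsigned long)sizeof(struct pair));
    return 0;
}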
If you have an inner structure, it is guaranteed to start at the same address as the enclosing one if it is the first member of the enclosing structure.
So *value and value->first access memory at the same address (but with different types) in the following:
struct St {
    long first;
} *value;
Also, the ordering between members of the structure is guaranteed to be the same as the declaration order.
To adjust alignment, you can use compiler-specific directives or use bitfields.
The alignment of structure members is usually based on what's best for accessing the individual members on the target platform.
Also, for malloc, it is possible that it keeps some bookkeeping near the returned address, so even for a zero-size request it can return a valid address (just don't try to access anything via the returned pointer).
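A short sketch of the "first member starts at the struct's own address" guarantee (struct Inner and struct Outer are made up here, mirroring the St example above):

#include <stdio.h>

struct Inner { long first; };
struct Outer { struct Inner in; int extra; };

int main(void)
{
    struct Outer o;
    /* Both comparisons print 1: no padding is allowed before the first member. */
    printf("%d %d\n",
           (void *)&o == (void *)&o.in,
           (void *)&o.in == (void *)&o.in.first);
    return 0;
}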
It is important to learn how struct sizes work. For example:
struct foo {
    int i;
    char c;
};

struct bar {
    int i;
    int j;
};

struct baz {
    int i;
    char c;
    int j;
};
sizeof(foo) = 8 bytes (32 bit arch)
sizeof(bar) = 8 bytes
sizeof(baz) = 12 bytes
What this means is that struct sizes and offsets have to follow two rules:
1- The size of the struct must be a multiple of the alignment of its most strictly aligned member (which is why foo is 8 bytes, not 5).
2- A struct member must start at an offset that is a multiple of its own alignment. (In baz, int j cannot start at offset 5, so bytes 5, 6, and 7 are wasted as padding.)
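Those offsets can be checked directly with offsetof(). A sketch, repeating the baz definition from above so it is self-contained; the stated offsets assume the 32-bit-style layout described here rather than being guaranteed:

#include <stddef.h>
#include <stdio.h>

struct baz {
    int  i;   /* offset 0 */
    char c;   /* offset 4 */
    int  j;   /* offset 8 on the layout above; bytes 5-7 are padding */
};

int main(void)
{
    printf("i=%lu c=%lu j=%lu sizeof=%lu\n",
           (unsigned long)offsetof(struct baz, i),
           (unsigned long)offsetof(struct baz, c),
           (unsigned long)offsetof(struct baz, j),
           (unsigned long)sizeof(struct baz));
    return 0;
}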

What portability issues are associated with byte-level access to pointers in C?

Purpose
I am writing a small library for a larger project which supplies malloc/realloc/free wrapper-functions as well as a function which can tell you whether or not its parameter (of type void *) corresponds to live (not yet freed) memory allocated and managed by the library's wrapper-functions. Let's refer to this function as isgood_memory.
Internally, the library maintains a hash-table to ensure that the search performed by isgood_memory is reasonably fast. The hash-table maintains pointer values (elements of type void *) to make the search possible. Clearly, values are added and removed from the hash-table to keep it up-to-date with what has been allocated and what has been freed, respectively.
The portability of the library is my biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more.
Question
Since portability is my biggest concern, I couldn't assume that sizeof(void *) == sizeof(X) for the hash-function. Therefore, I have resorted to treating the value byte-by-byte as if it were a string. To accomplish this, the hash function looks a little like:
static size_t hashit(void *ptrval)
{
    size_t i = 0, h = 0;
    union {
        void *ptrval;
        unsigned char string[sizeof(void *)];
    } ptrstr;

    ptrstr.ptrval = ptrval;
    for (; i < sizeof(void *); ++i) {
        size_t byte = ptrstr.string[i];
        /* Crazy operations here... */
    }
    return (h);
}
What portability concerns do any of you have with this particular fragment? Will I encounter any funky alignment issues by accessing ptrval byte-by-byte?
You are allowed to access a data type as an array of unsigned char, as you do here. The major portability issue that I see could occur on platforms where the bit pattern identifying a particular location is not unique - in that case, you might get pointers that compare equal but hash to different values, because their bit patterns differ.
Why could they be different? Well, for one thing, most C data types are allowed to contain padding bits that don't participate in the value. A platform where pointers contained such padding bits could have two pointers that differed only in the padding bits point to the same location. (For example, the OS might use some pointer bits to indicate capabilities of the pointer, not just physical address.) Another example is the far memory model from the early days of DOS, where far pointers consisted of segment:offset, and the adjacent segments overlapped, so that segment:offset could point to the same location as segment+1:offset-x.
All that said, on most platforms in common use today, the bit pattern pointing to a given location is indeed unique. So your code will be widely portable, even though it is unlikely to be strictly conforming.
Looks pretty clean. If you can rely on the <inttypes.h> header from C99 (it is often available elsewhere), then consider using uintptr_t - but if you want to hash the value byte-wise, you end up breaking things down to bytes and there is no real advantage to it.
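For illustration, assuming <stdint.h> is available, the uintptr_t route could look something like this sketch (TABLE_SIZE and the mixing step are made up):

#include <stddef.h>
#include <stdint.h>

#define TABLE_SIZE 1024u

static size_t hash_ptr(void *p)
{
    uintptr_t v = (uintptr_t)p;   /* integer representation of the pointer */
    v ^= v >> 16;                 /* crude bit mixing */
    return (size_t)(v % TABLE_SIZE);
}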
Mostly correct. There's one potential problem, though. You assign
size_t byte = ptrstr.string[i];
string is defined as char, not unsigned char. On a platform that has signed chars and an unsigned size_t, that will give you a result you may or may not expect. Just change your char to unsigned char; that will be cleaner.
If you don't need the pointer values for some other reason beside keeping track of allocated memory, why not get rid of the hash table altogether and just store a magic number along with the memory allocated as in the example below. The magic number being present alongside the memory allocated indicates that it is still "alive". When freeing the memory you clear the stored magic number before freeing the memory.
#include <stdlib.h>

typedef unsigned char byte;   /* added so the example compiles */

int isgood_memory ( void *Mem ); /* forward declaration */

#pragma pack(1)
struct sMemHdl
{
    int magic;
    byte firstByte;
};
#pragma pack()

#define MAGIC 0xDEADDEAD
#define MAGIC_SIZE sizeof(((struct sMemHdl *)0)->magic)

void *get_memory( size_t request )
{
    struct sMemHdl *pMemHdl = (struct sMemHdl *)malloc(MAGIC_SIZE + request);
    pMemHdl->magic = MAGIC;
    return (void *)&pMemHdl->firstByte;
}

void free_memory ( void *mem )
{
    if ( isgood_memory(mem) != 0 )
    {
        struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE);
        pMemHdl->magic = 0;
        free(pMemHdl);
    }
}

int isgood_memory ( void *Mem )
{
    struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)Mem - MAGIC_SIZE);
    if ( pMemHdl->magic == MAGIC )
    {
        return 1; /* mem is good */
    }
    else
    {
        return 0; /* mem already freed */
    }
}
This may be a bit hackish, but I guess I'm in a hackish mood...
Accessing variables such as integers or pointers as chars or unsigned chars is not a problem from a portability point of view. But the reverse is not true, because it is hardware dependent.
I have one question: why are you hashing the pointer as a string instead of using the pointer value itself as the hash (using uintptr_t)?

Is using flexible array members in C bad practice?

I recently read that using flexible array members in C was poor software engineering practice. However, that statement was not backed by any argument. Is this an accepted fact?
(Flexible array members are a C feature introduced in C99 whereby one can declare the last member of a struct to be an array of unspecified size. For example:)
struct header {
    size_t len;
    unsigned char data[];
};
It is an accepted "fact" that using goto is poor software engineering practice. That doesn't make it true. There are times when goto is useful, particularly when handling cleanup and when porting from assembler.
Flexible array members strike me as having one main use, off the top of my head, which is mapping legacy data formats like window template formats on RiscOS. They would have been supremely useful for this about 15 years ago, and I'm sure there are still people out there dealing with such things who would find them useful.
If using flexible array members is bad practice, then I suggest that we all go tell the authors of the C99 spec this. I suspect they might have a different answer.
No, using flexible array members in C is not bad practice.
This language feature was first standardized in ISO C99, 6.7.2.1 (16). In the following revision, ISO C11, it is specified in Section 6.7.2.1 (18).
You can use them like this:
struct Header {
    size_t n;
    long v[];
};
typedef struct Header Header;

size_t n = 123; // can dynamically change during program execution
// ...
Header *h = malloc(sizeof(Header) + sizeof(long[n]));
h->n = n;
Alternatively, you can allocate like this:
Header *h = malloc(sizeof *h + n * sizeof h->v[0]);
Note that sizeof(Header) includes any padding bytes; thus, the following allocation is incorrect and may yield a buffer overflow:
Header *h = malloc(sizeof(size_t) + sizeof(long[n])); // invalid!
A struct with a flexible array member halves the number of allocations needed for it, i.e. instead of 2 allocations per struct object you need just 1. That means less effort and less memory occupied by the memory allocator's bookkeeping overhead. Furthermore, you save the storage for one additional pointer. Thus, if you have to allocate a large number of such struct instances you measurably improve the runtime and memory usage of your program (by a constant factor).
In contrast to that, using non-standardized constructs for flexible array members that yield undefined behavior (e.g. as in long v[0]; or long v[1];) obviously is bad practice. Thus, as any undefined-behaviour this should be avoided.
Since ISO C99 was released in 1999, more than 20 years ago, striving for ISO C89 compatibility is a weak argument.
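For comparison, a sketch of the two-allocation alternative that the flexible array member avoids (a pointer member plus a separately allocated array; struct Header2 and make_header2 are names invented here):

#include <stdlib.h>

struct Header2 {
    size_t n;
    long  *v;   /* separately allocated array */
};

/* Two allocations and an extra pointer per object. */
struct Header2 *make_header2(size_t n)
{
    struct Header2 *h = malloc(sizeof *h);
    if (h == NULL)
        return NULL;
    h->v = malloc(n * sizeof h->v[0]);
    if (h->v == NULL) {
        free(h);
        return NULL;
    }
    h->n = n;
    return h;
}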
As C standardization has moved forward, there is no reason to use [1] anymore.
The reason I would give for not doing it is that it's not worth it to tie your code to C99 just to use this feature.
The point is that you can always use the following idiom:
struct header {
    size_t len;
    unsigned char data[1];
};
That is fully portable. Then you can take the 1 into account when allocating the memory for n elements in the array data:
ptr = malloc(sizeof(struct header) + (n-1));
If you already have C99 as requirement to build your code for any other reason or you are target a specific compiler, I see no harm.
You meant...
struct header
{
    size_t len;
    unsigned char data[];
};
In C, that's a common idiom. I think many compilers also accept:
unsigned char data[0];
Yes, it's dangerous, but then again, it's really no more dangerous than normal C arrays - i.e., VERY dangerous ;-). Use it with care and only in circumstances where you truly need an array of unknown size. Make sure you malloc and free the memory correctly, using something like:
foo = malloc(sizeof(struct header) + N * sizeof foo->data[0]);
foo->len = N;
An alternative is to make data just be a pointer to the elements. You can then realloc() data to the correct size as required.
struct header
{
    size_t len;
    unsigned char *data;
};
Of course, if you were asking about C++, either of these would be bad practice. Then you'd typically use STL vectors instead.
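A sketch of the realloc() growth pattern mentioned above, using the pointer-based struct header just shown (grow_data is a name invented here):

#include <stdlib.h>

/* Grow the separately allocated data array to new_len bytes. */
static int grow_data(struct header *h, size_t new_len)
{
    unsigned char *tmp = realloc(h->data, new_len);
    if (tmp == NULL)
        return -1;       /* allocation failed; old buffer still valid */
    h->data = tmp;
    h->len  = new_len;
    return 0;
}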
I've seen something like this, from C Interfaces and Implementations:
struct header {
    size_t len;
    unsigned char *data;
};

struct header *p;
p = malloc(sizeof(*p) + len + 1);
p->data = (unsigned char *)(p + 1); // memory after p is mine!
Note: data need not be the last member.
As a side note, for C89 compatibility, such a structure should be allocated like:
struct header *my_header
    = malloc(offsetof(struct header, data) + n * sizeof my_header->data[0]);
Or with macros:
#define FLEXIBLE_SIZE SIZE_MAX /* or whatever maximum length for an array */
#define SIZEOF_FLEXIBLE(type, member, length) \
    ( offsetof(type, member) + (length) * sizeof ((type *)0)->member[0] )

struct header {
    size_t len;
    unsigned char data[FLEXIBLE_SIZE];
};

...

size_t n = 123;
struct header *my_header = malloc(SIZEOF_FLEXIBLE(struct header, data, n));
Setting FLEXIBLE_SIZE to SIZE_MAX almost ensures this will fail:
struct header *my_header = malloc(sizeof *my_header);
There are some downsides related to how structs are sometimes used, and it can be dangerous if you don't think through the implications.
For your example, if you start a function:
void test(void) {
    struct header h;
    char *p = &h.data[0];
    /* ... */
}
Then the results are undefined (since no storage was ever allocated for data). This is something you will normally be aware of, but there are other cases where the value semantics C programmers are used to relying on for structs break down as well.
For instance, if I define:
struct header2 {
    int len;
    char data[MAXLEN]; /* MAXLEN some appropriately large number */
};
Then I can copy two instances simply by assignment, i.e.:
struct header2 inst1 = inst2;
Or, if inst1 and inst2 are defined as pointers:
*inst1 = *inst2;
This however won't work for flexible array members, since their content is not copied over. What you want is to dynamically malloc the size of the struct and copy over the array with memcpy or equivalent.
struct header3 {
    int len;
    char data[]; /* flexible array member */
};
Likewise, writing a function that accepts a struct header3 will not work, since arguments in function calls are, again, copied by value, and thus what you will get is likely only the first element of your flexible array member.
void not_good ( struct header3 ) ;
This does not make it a bad idea to use, but you do have to keep in mind to always dynamically allocate these structures and only pass them around as pointers.
void good ( struct header3 * ) ;

Resources