Flexible Array Member (Zero Length Array) [duplicate] - c

In reference to GCC's Zero Length Array explanation:
This is particularly useful in the case when a struct is a header for a variable-length object. This is exactly my case. Furthermore, I am concerned with the alignment of my structs in the heap.
In this case, I still really do not understand what's useful about zero length arrays. How are they related to this particular situation?
EDIT:
Is it that I can put as much "data" as I want in there?

Basically it allows you to have a structure with a variable length array at the end:
struct X {
int first;
int rest[0];
};
An array size of 0 is not actually valid (although it is allowed as a gcc extension). Having an unspecified size is the correct way.
Since C doesn't really care that you are accessing elements beyond the end of the array, you just start with an undefined array size, and then allocate enough memory to handle however many elements you actually want:
struct X *xp = (struct X *)malloc(sizeof(struct X)+10*sizeof(int));
xp->rest[9] = 0;
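For comparison, here is a minimal sketch of the same idea written with the standard C99 form (rest[] rather than rest[0]). The helper names x_new and x_set and the count parameter are assumptions made for illustration, not part of the question:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct X {
    int first;
    int rest[];   /* C99 flexible array member: no dimension given */
};

/* Hypothetical helper: allocate a struct X with room for count ints in rest. */
struct X *x_new(size_t count)
{
    /* Guard against overflow in the size computation. */
    if (count > (SIZE_MAX - sizeof(struct X)) / sizeof(int))
        return NULL;

    struct X *xp = malloc(sizeof(struct X) + count * sizeof(int));
    if (xp != NULL) {
        xp->first = 0;
        for (size_t i = 0; i < count; i++)
            xp->rest[i] = 0;
    }
    return xp;
}

/* Hypothetical helper: store v at index i. The caller has to track count,
   because C itself does not remember how many elements were allocated. */
void x_set(struct X *xp, size_t count, size_t i, int v)
{
    assert(xp != NULL && i < count);
    xp->rest[i] = v;
}
```

A caller would do something like x_new(10), then x_set(xp, 10, 9, 0), and finally free(xp); a single free releases the header and the trailing elements together.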

The memory returned from malloc() that you will assign to a pointer to the struct with the zero-length array member will be aligned memory ... that's required by the C99 specification. So there is no issue overlaying a struct with a zero-length array over memory allocated from the heap via malloc().
Where you would run into trouble would be attempting to overlay your struct over some raw buffer in memory that came from a packed or non-traditionally aligned data-source, such as a file-header, memory-mapped hardware interface, etc. In those cases, using a zero-length array to handle the variable-length data can be a bad idea since the data may not be aligned according to the default alignment parameters of the platform, and thus offsetting into the array will not yield the correct results.
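One common way around the raw-buffer problem is to copy the header out with memcpy instead of overlaying a struct pointer on the bytes. The sketch below assumes a made-up wire format (a 4-byte length in host byte order followed by the payload); packet_header, read_header and payload_of are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire format: a 4-byte length followed by that many payload bytes.
   Assumes host byte order and no padding inside the header. */
struct packet_header {
    uint32_t length;
};

/* Copy the header out of raw bytes; memcpy has no alignment requirement,
   so raw may point anywhere in the buffer. */
struct packet_header read_header(const unsigned char *raw)
{
    struct packet_header h;
    assert(raw != NULL);
    memcpy(&h, raw, sizeof h);
    return h;
}

/* The payload starts right after the header bytes in the raw buffer. */
const unsigned char *payload_of(const unsigned char *raw)
{
    assert(raw != NULL);
    return raw + sizeof(struct packet_header);
}
```

The variable-length part is then addressed as plain bytes from payload_of, so nothing depends on the buffer being aligned for the struct.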

Related

How can I allocate memory for nested zero length array?

I want to create a struct Ring with a nested zero-length array:
typedef struct Data_Block
{
size_t Data_Len;
char Buf[0];
}Block;
typedef struct Block_Ring
{
int head;
int tail;
int full;
int block_num;
Block blk[0];
}Ring;
How can I correctly allocate memory for a Ring that contains 32 Blocks, where each Block contains a Buf of size 16? If I malloc with the right size, the number of Blocks will become just one.
Fundamental Problem
There is a fundamental problem you must deal with before addressing this task: Constructing an array requires having elements of a fixed and known size.
That is because array element i is located by adding i times the size of an element to the base address of the array. One can perform that calculation only if the size of an element exists (elements have a fixed size) and you know it (the size is known).
Although you define Block to contain a member of zero size (Buf is an array with zero elements), you intend to use it as if that member were 16 bytes (an array of 16 char). However, there is no way to tell the compiler that the Block objects you will allocate and use are actually Block objects with 16 extra bytes. You certainly can allocate space for them, and I will show you how, but then how do you intend to use them? If x is a Ring object, and you write x.blk[i], the compiler will generate code that multiplies i by what it thinks the size of a Block is, and that will be wrong because the compiler thinks a Block has zero bytes for Buf, but your Block objects are bigger.
Standard C Versus An Extension
Declaring a structure member as an array with zero elements is an extension (notably available in GCC). The 1999 C standard introduced a similar feature called flexible array members. With standard C, a flexible array member is declared with no dimension, rather than a zero dimension.
A flexible array member is an incomplete type (C 2018 6.7.2.1 18). In other words, the type is not fully specified. The number of members of the array is unknown, and so the total size of the array is unknown.
Then, in defining Ring, we cannot define the blk member to be a flexible array member that is an array of Block, because standard C requires that the element type of an array be a complete type (C 2018 6.7.6.2 1, “The element type shall not be an incomplete or function type”).
Therefore, this code cannot be made into standard C. This is actually an advantage: The C standard is preventing you from making the fundamental mistake above of creating an array that cannot work because the size of its elements is not known.
Oddly, GCC 8.1 for x86-64 fails to give a diagnostic for this. It should give a diagnostic for the constraint violation. Apple LLVM version 9.1.0 (clang-902.0.39.2) does issue a diagnostic.
However, we will proceed to consider the code as you have written it, using the language extension.
How Big Are The Elements?
When a C implementation lays out a structure, it must ensure that each member in the structure is correctly aligned. (What alignments are correct is defined by the implementation, so they vary. However, whatever they are, the compiler must lay out the structure accordingly.) Since structures can be used as elements of an array, the size of the laid-out structure must be such that when one structure follows another in the array, all the members in the following structure are also correctly aligned.
Satisfying this constraint requires that the size of the structure be a multiple of the alignment requirements of all members. For example, if there are members with alignment requirements of 4 bytes and 8 bytes, the size of the structure must be a multiple of 8 bytes, since that is the least common multiple of 4 bytes and 8 bytes. In fact, all alignment requirements are powers of two, so the least common multiple of all the alignment requirements is simply the largest (most restrictive) alignment requirement.
What this means is that, when allocating space for an array of your Block objects, you cannot simply use an arbitrary number of bytes for the extra Buf elements. You must ensure the total size of each Block object is a multiple of the alignment requirement of its members.
C provides a way to know the alignment requirement of the structure. The expression _Alignof(Block) is the alignment requirement. So, if you want each Block to have x elements in Buf, the size you need for each Block is the size of the base structure (sizeof(Block)) plus the size you need for the actual array elements (x * sizeof(char)) plus enough padding to round the total up to a multiple of the alignment requirement. You can calculate this with:
// Calculate desired space.
size_t S = sizeof(Block) + x * sizeof(char);
// Note the alignment requirement.
static const size_t A = _Alignof(Block);
// Round up to multiple of alignment requirement.
S = (S-1) / A * A + A;
(This is a well-known expression for rounding up to a multiple of A. You can tinker with some examples to see why it works.)
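To tinker with it, the rounding step can be isolated in a tiny function. This is only a sketch; the name round_up is an assumption, and it requires both arguments to be positive:

```c
#include <assert.h>
#include <stddef.h>

/* Round n up to the next multiple of a (both must be positive).
   (n - 1) / a * a is the largest multiple of a strictly below n,
   so adding a gives the smallest multiple that is >= n. */
size_t round_up(size_t n, size_t a)
{
    assert(n > 0 && a > 0);
    return (n - 1) / a * a + a;
}
```

With a equal to 8, an input of 24 stays 24, 25 becomes 32, and 1 becomes 8.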
Once you have calculated the space needed for one Block using the above code (with 16 for x), you can allocate space for one Ring with an array of 32 of these Block using:
Ring *R = malloc(sizeof(Ring) + 32 * S);
Accessing Array Elements
Now that you have the space, how do you access members of blk? As discussed above, the compiler does not know how to do this. Unfortunately, C does not provide any assistance. You will have to calculate addresses manually. Since you know the size of each of your Block objects, S, you can calculate the address of the Block with index i with:
Block *B = (Block *) ((char *) R->blk + S*i);
Discussion
This is cumbersome and error-prone. The address calculation could be wrapped into a helper function to make it a little better. However, it is generally not a good idea to use complicated code like this. You ought to consider alternative solutions.
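As a rough sketch of what such a helper might look like, here is a variant that stays within standard C: instead of a blk[0] member, the blocks are placed after a rounded-up Ring header, and Buf is a flexible array member. The stride field and the functions ring_new, ring_block, round_up and ring_header_size are assumptions added for illustration:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Data_Block {
    size_t Data_Len;
    char Buf[];               /* standard flexible array member */
} Block;

typedef struct Block_Ring {
    int head;
    int tail;
    int full;
    int block_num;
    size_t stride;            /* hypothetical field: bytes from one Block to the next */
} Ring;

static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Offset of the first Block, rounded up so that it is correctly aligned. */
static size_t ring_header_size(void)
{
    return round_up(sizeof(Ring), alignof(Block));
}

/* Allocate a Ring followed by block_num Blocks, each with buf_len bytes of Buf. */
Ring *ring_new(int block_num, size_t buf_len)
{
    assert(block_num > 0);
    size_t stride = round_up(sizeof(Block) + buf_len, alignof(Block));
    Ring *r = malloc(ring_header_size() + (size_t)block_num * stride);
    if (r != NULL) {
        r->head = r->tail = r->full = 0;
        r->block_num = block_num;
        r->stride = stride;
    }
    return r;
}

/* Address of block i, computed by hand because the compiler cannot do it. */
Block *ring_block(Ring *r, int i)
{
    assert(r != NULL && i >= 0 && i < r->block_num);
    return (Block *)((char *)r + ring_header_size() + (size_t)i * r->stride);
}
```

For the case in the question this would be ring_new(32, 16), with every access going through ring_block(r, i) rather than r->blk[i]. Storing the stride in the Ring keeps the address calculation in one place.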

Array of size 0 at the end of struct [duplicate]

My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new for me, and it seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element is incomplete type, (no array dimensions, not even 0).
So your code should rather look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explain that zero-length arrays are a GCC extension and that standard C provides flexible array members instead, but no one has addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. The members of a struct are not necessarily in contiguous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
Note that one of the restrictions on a zero-length array is that it must be the last member of a structure. This tells the compiler that the struct can carry a variable-length object and that more memory will be supplied at run time.
You might then worry: "the memory allocated for the zero-length array is added at the end of the structure, but structs do not necessarily have their members in contiguous locations, so how can that allocated memory be accessed?"
That is not a problem. The members of a structure are not necessarily contiguous, and there may be padding between them, but the array member still has a fixed offset within the structure, and the extra memory you allocate continues directly from that offset. Padding has no effect here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
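A small sketch can make the padding point concrete. The struct below is a made-up example (sample, values_offset and check_layout are assumed names) with a one-byte first member, so the compiler will usually insert padding before the array:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct with a small first member, so padding is likely. */
struct sample {
    char tag;
    double values[];          /* flexible array member */
};

/* The flexible member starts at its own fixed offset, after any padding. */
size_t values_offset(void)
{
    return offsetof(struct sample, values);
}

/* Members appear in declaration order, and the array begins no later
   than the end of the struct, whatever padding was inserted. */
void check_layout(void)
{
    assert(offsetof(struct sample, tag) == 0);
    assert(values_offset() > offsetof(struct sample, tag));
    assert(values_offset() <= sizeof(struct sample));
}
```

Whatever the padding turns out to be on a given platform, p->values[i] is always computed from that one offset, which is why the extra bytes obtained from malloc are reachable through the array member.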
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned, zero-length arrays are not supported by standard C; they are a GCC extension. C99 introduced the flexible array member. An example from the C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more portable pre-C99 approach (the classic "struct hack") would be to define your array with a data size of 1, as in:
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are laid out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but they are in order and in one contiguous block.
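Spelled out a little more, the header-plus-one approach might look like the sketch below. The header type with its length field and the functions buffer_new and buffer_data are assumptions for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical header; the payload follows it in the same allocation. */
typedef struct {
    size_t length;
} header;

/* Allocate the header and buffersize payload bytes as one block. */
header *buffer_new(size_t buffersize)
{
    header *p = malloc(sizeof(header) + buffersize);
    if (p != NULL)
        p->length = buffersize;
    return p;
}

/* p + 1 points just past the header, which is where the payload begins. */
char *buffer_data(header *p)
{
    assert(p != NULL);
    return (char *)(p + 1);
}
```

This works as-is for a char payload. If the payload had a stricter alignment requirement than the header, sizeof(header) would need to be a multiple of that alignment for p + 1 to be usable directly.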

C Array as function parameter: Size check? [duplicate]

I wrote the following function in C:
void dummy(int* my_array, int size)
{
//my implementation
}
Is there any way to check whether size is REALLY the size of my_array?
For example, if I call the function and use for my_array an array with 4 elements but pass 5 for size, is there any way to know that 5 is not really the size of the array?
Thank you in advance.
You're looking at it the wrong way. An array is a contiguous piece of memory. In C, you can represent this concept with a pointer to its start and its size. Since your array is represented by a <my_array, size> tuple, it doesn't make sense to talk about my_array's size, since it's only the start-pointer of the array.
Unfortunately, there is no such way in C. When an array is passed to a function it decays to a pointer to its first element, so inside the function there are no "array bounds" left that could be checked.
This is why many of the C functions have that special size parameter: because there's no other way of determining an array's size.
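A short sketch of that convention: the element count is computed where the real array is still visible and then passed alongside the pointer. ARRAY_LEN, sum and sum_of_four are names made up for this example:

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array; only valid where the array itself is in scope,
   not on a pointer parameter. */
#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Inside the function only a pointer arrives, so the size supplied by the
   caller is all there is to go on; it cannot be verified here. */
int sum(const int *my_array, size_t size)
{
    assert(my_array != NULL || size == 0);
    int total = 0;
    for (size_t i = 0; i < size; i++)
        total += my_array[i];
    return total;
}

/* The caller still sees the real array, so it can compute the right count. */
int sum_of_four(void)
{
    int values[4] = {1, 2, 3, 4};
    return sum(values, ARRAY_LEN(values));
}
```

If the caller passed 5 instead of ARRAY_LEN(values), sum would read past the end of the array and nothing in the function could detect it.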
The size of the array is however big you made it when you declared and initialized it.
int my_array[100] = {0}; has a size of 100.
int my_other_array[50] = {0}; has a size of 50.
If you want the length of the data, then you're thinking in more abstract terms than the language can handle. The length of your data is a non-measurable parameter when the language does not support it.
If my_array is dynamically allocated (with malloc), some implementations let you query the size of the memory block, but this is non-standard and depends on the specific compiler and C library. Even then, because of address alignment, you will usually not get the exact size to the byte but a size rounded up to 32 bits or some other granularity.

malloced array VS. variable-length-array [duplicate]

There are two ways to allocate memory to an array, of which the size is unknown at the beginning. The most common way is using malloc like this
int * array;
... // when we know the size
array = malloc(size*sizeof(int));
But it's valid too in C99 to define the array after we know the size.
... // when we know the size
int array[size];
Are they absolutely the same?
No they're not absolutely the same. While both let you store the same number and type of objects, keep in mind that:
You can free() a malloced array, but you can't free() a variable length array (although it goes out of scope and ceases to exist once the enclosing block is left). In technical jargon, they have different storage duration: allocated for malloc versus automatic for variable length arrays.
Although C has no concept of a stack, many implementations allocate a variable length array from the stack, while malloc allocates from the heap. This is an issue on stack-limited systems, e.g. many embedded operating systems, where the stack size is on the order of kB, while the heap is much larger.
It is also easier to test for a failed allocation with malloc than with a variable length array.
malloced memory can be changed in size with realloc(), while VLAs can't (more precisely only by executing the block again with a different array dimension--which loses the previous contents).
A hosted C89 implementation only supports malloc().
A hosted C11 implementation may not support variable length arrays (it then must define __STDC_NO_VLA__ as the integer 1 according to C11 6.10.8.3).
Everything else I have missed :-)
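To put several of these differences side by side, here is a sketch of the same computation done both ways. The function names and the 1024-element cap on the VLA are assumptions chosen for the example:

```c
#include <assert.h>
#include <stdlib.h>

/* Heap version: a failed allocation is detectable, and the block lives
   until free() is called. */
long sum_squares_malloc(size_t size)
{
    assert(size > 0);
    int *array = malloc(size * sizeof(int));
    if (array == NULL)
        return -1;                     /* report the failed allocation */
    long total = 0;
    for (size_t i = 0; i < size; i++) {
        array[i] = (int)(i * i);
        total += array[i];
    }
    free(array);
    return total;
}

#ifndef __STDC_NO_VLA__
/* VLA version: automatic storage, released when the function returns.
   There is no way to test for failure, so the size is capped up front. */
long sum_squares_vla(size_t size)
{
    assert(size > 0 && size <= 1024);
    int array[size];
    long total = 0;
    for (size_t i = 0; i < size; i++) {
        array[i] = (int)(i * i);
        total += array[i];
    }
    return total;
}
#endif
```

Both return the same sum for the same size; the difference lies in where the storage comes from, how long it lives, and whether an allocation failure can be observed.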

Array of structure with struct hack

Suppose I have structure in C like this
struct A {
int len;
char s[1];
};
I want to have an array of above structure but char s[1] member of struct A can be of variable length. How can we do this? Even struct hack trick in C99 doesn't seem to work here. One solution is to have char * as a last member and do dynamic memory allocation, but I want all data of struct to be in contiguous locations as my implementation needs to be cache oblivious.
You can't have an array of variable size objects, so you can't have an array of structures using the struct hack. All the objects in an array must be the same size. And if they're all the same size, the size must be implied by the structure, so you won't be using the struct hack after all; there'll be a size other than 1 in the dimension of the array s in your structure (unless 1 is big enough for everything). The reason is that the storage location for a[i] (where a is the name of an array and i is an index into the array) must be computable as 'byte address of a plus (i times size of one object in array)'. So the size of the objects in the array (in this case, structures) must be known and fixed.
As an alternative, you can have an array of pointers to variable size objects; you simply have to arrange to allocate each object separately with the appropriate size, and save the pointer to that in the array.
Note that C99 does away with the 'struct hack' (which was never officially portable, though in practice it was) and introduces 'flexible array members' instead:
struct A {
int len;
char data[];
};
However, the advice above still applies.
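The array-of-pointers alternative could be sketched as follows, using the flexible array member form of struct A. The helpers a_new and a_free_all are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct A {
    int len;
    char data[];              /* flexible array member */
};

/* Hypothetical helper: allocate one struct A sized to hold a copy of text. */
struct A *a_new(const char *text)
{
    assert(text != NULL);
    size_t n = strlen(text);
    struct A *a = malloc(sizeof *a + n + 1);
    if (a != NULL) {
        a->len = (int)n;
        memcpy(a->data, text, n + 1);   /* copies the terminating '\0' too */
    }
    return a;
}

/* Hypothetical helper: free every element of an array of pointers to struct A. */
void a_free_all(struct A **items, size_t count)
{
    for (size_t i = 0; i < count; i++)
        free(items[i]);
}
```

The array itself then has fixed-size elements (pointers), so items[i] indexes normally. Note that the objects are separate allocations, so this does not give the contiguous layout asked for in the question.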
If there is a maximum size for "s", you could use that instead of [1]. That keeps everything contiguous.
If you really don't want to use dynamic memory, then you can't do it with an array. You need your own "manager" that will use the struct hack trick on each member individually - but that means you can't do indexed lookups - you have to look at each element to see how big it is and jump the right number of bytes to the next element.
In C, array indexing involves multiplying the index by the compile-time-constant size of an individual element and adding the result to the base address. For that reason, you can't use built-in array support directly with the "struct hack": each s member will be allocated exactly the 1 byte you request, and indices past the end of one struct will access the following S elements in the array (or go off the end completely, possibly crashing).
If you really need contiguous data for cache-access speed, you can pack it yourself and solve this (like most things) with an indirection: keep a contiguous array of S*, and manually pack your data into another contiguous buffer (malloc() or stack-allocate enough memory for all your S objects, including the real data size of all s[] members). Your performance may suffer (or your program may crash) if the int len members aren't properly aligned for your architecture, so you may need to manually pad between S instances.
S* index[100]               char data[10000];
(S*)(data) -------------->  S with 14-byte s[] using data[0]..[17]
(S*)(data + 20) -----\      2 byte padding so next S is 4-byte aligned
(S*)(data + 32) --\   \---> S with 7-byte s[] using data[20]..[30]
                   \        1 byte padding...
                    \-----> ...
Unfortunately, this is quite an inflexible data layout - you can't just grow the amount of data in an element's s member without shuffling all the other data out of the way and patching the index, but that's normal for arrays so if you were already considering using them then perhaps this will suit you. Another hassle is calculating the total size of S structs (including s[] and any padding) up front....
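A rough sketch of this index-plus-packed-buffer layout is given below. It assumes S has a flexible array member, that the buffer comes from malloc (so it is suitably aligned), and that the records are built from C strings; record_size and pack_records are hypothetical names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int len;
    char s[];                 /* flexible array member */
} S;

/* Bytes one record occupies, padded so that the next S stays aligned. */
static size_t record_size(size_t len)
{
    size_t a = alignof(S);
    return (sizeof(S) + len + a - 1) / a * a;
}

/* Pack count strings into one malloc'ed buffer and fill index[] with a
   pointer to each record. Returns the buffer; free it to release everything. */
void *pack_records(const char *const *texts, size_t count, S **index)
{
    /* First pass: total size, including padding, has to be known up front. */
    size_t total = 0;
    for (size_t i = 0; i < count; i++)
        total += record_size(strlen(texts[i]));

    char *data = malloc(total > 0 ? total : 1);
    if (data == NULL)
        return NULL;

    /* Second pass: place each record and remember where it went. */
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        size_t len = strlen(texts[i]);
        S *rec = (S *)(data + offset);
        rec->len = (int)len;
        memcpy(rec->s, texts[i], len);  /* no terminator stored; len gives the size */
        index[i] = rec;
        offset += record_size(len);
    }
    assert(offset == total);
    return data;
}
```

On a platform with a 4-byte int, a 14-byte s[] followed by a 7-byte s[] puts the records at offsets 0, 20 and 32, matching the diagram. The two passes over the input are the up-front size calculation mentioned above.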
