Array of size 0 at the end of struct [duplicate] - c

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This struct is useful for creating an array whose size is given by a variable, with something like the following:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new to me, and it seems odd, because, from my understanding, struct members are not necessarily stored in contiguous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array declared with length zero at the end of a struct is guaranteed to come immediately after the last element of the struct, but this seems strange because, again, from my understanding, structs can have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?

Currently, there is a standard feature, described in C11, chapter §6.7.2.1, called a flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element has incomplete type (no array dimension, not even 0).
So your code should look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
As for your example of a 0-sized array: this was a legacy way (the "struct hack") of achieving the same thing. Before C99, GCC supported it as an extension to emulate flexible array member functionality.
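A minimal sketch of the asker's array_new rewritten with a flexible array member, assuming a C99 or later compiler. The overflow check against SIZE_MAX is an addition that the original code does not have.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The asker's struct, rewritten with a C99 flexible array member. */
typedef struct array {
    size_t size;
    int data[]; /* no dimension at all, not even 0 */
} array;

/* Allocates the header plus room for `size` ints in one block.
   Returns NULL if malloc fails or the byte count would overflow size_t
   (the overflow check is an addition, not part of the original code). */
array *array_new(size_t size)
{
    if (size > (SIZE_MAX - sizeof(array)) / sizeof(int))
        return NULL;

    array *a = malloc(sizeof(array) + size * sizeof(int));
    if (a) {
        a->size = size;
    }
    return a;
}
```

With this version, array *a = array_new(3); followed by a->data[1] = 12; is well defined once a has been checked for NULL, and a single free(a) releases both the header and the elements.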

Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed not to add to the sizeof of the struct (apart from possibly some extra trailing padding), allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.

Other answers explain that zero-length arrays are a GCC extension and that C allows flexible array members, but no one has addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. Struct members are not necessarily stored in contiguous locations; there may be padding between them.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
Note that one of the restrictions on a zero-length array used this way is that it must be the last member of the structure. From this, the compiler knows that the struct describes a variable-length object and that more memory will be supplied at runtime.
But don't be misled by this line of reasoning: "since the zero-length array is the last member of the structure, the memory allocated for it must be added at the end of the structure, and since struct members are not necessarily contiguous, how could that memory be accessed?"
That is not the case. Structure members are not necessarily contiguous (there may be padding between them), but the extra memory is always accessed through the member data, which sits at a fixed offset after all the preceding members. Padding has no effect here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
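One way to convince yourself that padding does not matter is to compare addresses with offsetof: data begins at a fixed offset after size, and a->data[i] is that offset plus i elements. The sketch below assumes the standard flexible-array form of the struct; data_addr_matches is a hypothetical helper, not a library function.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct array {
    size_t size;
    int data[];
} array;

/* Hypothetical helper: returns 1 if &a->data[i] lies exactly
   offsetof(array, data) + i * sizeof(int) bytes past the start of the
   object. Padding can only shift where data begins, never how its
   elements are laid out after that point. */
int data_addr_matches(const array *a, size_t i)
{
    const char *base = (const char *)a;
    const char *expected = base + offsetof(array, data) + i * sizeof(int);
    return (const char *)&a->data[i] == expected;
}
```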
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers have already mentioned, zero-length arrays are not supported by standard C; they are an extension of GCC. C99 introduced the flexible array member. An example from the C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
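The remark about offsets matters when computing allocation sizes, because the flexible array member may begin inside what would otherwise be trailing padding. A sketch with a made-up struct t (not from the standard); on typical 64-bit ABIs one would expect sizeof(struct t) to be 16 and offsetof(struct t, tail) to be 9, but the code does not depend on those values. t_alloc_size is a hypothetical helper.

```c
#include <stddef.h>

/* Hypothetical struct chosen to expose trailing padding: x forces the
   struct's alignment, so sizeof(struct t) is rounded up, while the char
   flexible array member can start right after c. */
struct t {
    double x;
    char c;
    char tail[];
};

/* Smallest allocation that holds a struct t with n trailing chars.
   offsetof(struct t, tail) + n can be less than sizeof(struct t) + n,
   but the block must never be smaller than the struct itself. */
size_t t_alloc_size(size_t n)
{
    size_t bytes = offsetof(struct t, tail) + n;
    return bytes < sizeof(struct t) ? sizeof(struct t) : bytes;
}
```

Using sizeof(struct t) + n instead is always large enough; it may simply over-allocate by the amount of trailing padding.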

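A more portable way, in the sense that it compiles with any compiler, would be to define your array with a data size of 1 (strictly speaking, indexing past data[0] is still undefined behavior), as in: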
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
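For size == 0, offsetof(array, data) + 0 is smaller than sizeof(array), so a cautious version of this allocation might clamp the request, as in the sketch below. It keeps the [1] layout from the answer; indexing beyond data[0] in this layout remains technically undefined behavior, which is the gap C99 flexible array members close.

```c
#include <stddef.h>
#include <stdlib.h>

/* Pre-C99 "struct hack" layout from the answer above. */
typedef struct array {
    size_t size;
    int data[1];
} array;

array *array_new(size_t size)
{
    /* Bytes up to data, plus `size` elements. For size == 0 this is
       smaller than sizeof(array), so never allocate less than the struct. */
    size_t bytes = offsetof(array, data) + size * sizeof(int);
    if (bytes < sizeof(array))
        bytes = sizeof(array);

    array *a = malloc(bytes);
    if (a) {
        a->size = size;
    }
    return a;
}
```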

The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are laid out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but the fields are in order and in one contiguous block.
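A sketch of the p + 1 technique, assuming a hypothetical header type with a single buffersize field (the answer does not show its members); header_new and header_buffer are hypothetical helpers. The trailing buffer is char here; for a different element type, p + 1 is only suitably aligned if sizeof(header) is a multiple of that type's alignment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical header type; the answer does not show its members. */
typedef struct header {
    size_t buffersize;
} header;

/* One allocation: the header followed directly by `buffersize` bytes,
   which are zero-filled here. */
header *header_new(size_t buffersize)
{
    header *p = malloc(sizeof(header) + buffersize);
    if (p) {
        p->buffersize = buffersize;
        memset(p + 1, 0, buffersize);
    }
    return p;
}

/* p + 1 points just past the header, i.e. sizeof(header) bytes in. */
char *header_buffer(header *p)
{
    return (char *)(p + 1);
}
```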

Related

Why does GCC allow zero length array only as last member?

According to this,
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
It is said that the benefit is
They are very useful as the last element of a structure that is really
a header for a variable-length object
What does it mean?
The zero-length array is a GCC extension (read as: not standard) which you should not use.
While recent versions of C allow for something similar (flexible array members with empty brackets), C++ has no such thing. As people often mix C and C++, this is a possible source of confusion.
Instead, an array of length 1 should be used, which is standards-compliant under both C and C++, and which just works with every compiler.
What is this useful for at all?
Sometimes you need to access "invalid" out-of-bounds data, knowing that it is valid in reality. In the strictest sense, this is undefined behavior (you are indexing an array beyond its declared bounds), but that only reflects what the compiler knows, not what is actually the case, so it nevertheless "works fine".
For example, you might receive framed data on the network consisting of a tag word, a length, and an amount of data corresponding to the length given. Or an operating system function might return a variable amount of results to you (a couple of Win32 API functions work that way, for example).
In either case, you have an unknown (unknown at compile time) number of elements at the end of this structure, so it is not possible to define a single legitimate structure to hold everything.
That is what flexible array members are for. And with this, it is explained why they must be the last member as well. It doesn't make sense for something that could have "any size" to be anywhere but at the end -- it's impossible for the compiler to lay out any members after it, not knowing its size.
(In case you wonder how the compiler can ever free the storage not knowing the object's size... it cannot! There normally exists an explicit function for freeing such an object as part of the API, which takes care of this exact problem.)
It's probably best to demonstrate with a small example:
#include <stdio.h>
#include <stdlib.h>
#define BLOB_TYPE_FOO 0xBEEF
struct blob {
/* Part of your object header... perhaps describing the type of blob. */
int type;
/* This is actually the length of the "data" field below */
unsigned length;
/* The data */
unsigned char data[];
};
struct blob *
create_blob(int type, size_t size)
{
/* Allocate enough space for the "header" and "size" bytes of data. */
struct blob *x = calloc(1, sizeof(struct blob) + size);
if (x == NULL)
return NULL;
x->type = type;
x->length = size;
return x;
}
int
main(void)
{
/* Note that sizeof(struct blob) doesn't include the data field. */
printf("sizeof(struct blob): %zu\n", sizeof(struct blob));
struct blob *x = create_blob(BLOB_TYPE_FOO, 1000);
if (x == NULL)
return 1;
/*
You can manipulate data here, but be careful not to exceed the
allocated size.
*/
size_t i;
for (i = 0; i < 1000; i++)
{
x->data[i] = 'A' + (i % 26);
}
/*
Since data was allocated with the rest of the header, everything is
freed.
*/
free(x);
return 0;
}
The nice part about this setup is that sizeof(struct blob) represents the size of the "object header" (on my machine, that's 8 bytes), and that since you allocate the whole object together, a single free() is all that is needed to release the memory.
Like others have stated here, zero-length arrays are a non-standard extension and you should use them with care. Damon's answer is the better way to go, though sizeof() then does not give quite the right size (it's a bit too large to represent the size of the actual header). It's not too hard to work around that problem, though.
You cannot have an array of length 0 in standard C: a constant array size must be greater than zero. The GCC manual says:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)

Internal mechanism of sizeof in C?

I use sizeof to get size of a struct in C, but the result I got is unexpected.
#include <stdio.h>
struct sdshdr {
int len;
int free;
char buf[];
};
int main(){
printf("struct len:%zu\n", sizeof(struct sdshdr));
return 0;
} //struct len:8, with or without buf
My question is: why does buf not occupy any space, and why is the size of int still 4 on a 64-bit CPU?
Here is the output from gcc -v:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix
The [] is a flexible array member. They do not count towards the total size of the struct, because the C standard explicitly says so:
6.7.2.1/18
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply.
This is intentional by design, because the purpose of a flexible array member is to allow you to allocate trailing data dynamically after the struct. (When the struct is a file header, protocol header etc.)
Example including discussion about non-standard gcc extensions and the old pre-C99 "struct hack".
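To see the one-block allocation this enables, here is a sketch using the struct from the question; sdshdr_new is a hypothetical helper. The header size is sizeof(struct sdshdr), which excludes buf, and the characters plus the terminating NUL are added on top.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* The struct from the question. */
struct sdshdr {
    int len;
    int free;
    char buf[];
};

/* Hypothetical helper: one malloc holds the header plus len + 1 bytes
   for buf (the characters and a terminating NUL). */
struct sdshdr *sdshdr_new(const char *init, int len)
{
    if (len < 0)
        return NULL;
    struct sdshdr *sh = malloc(sizeof(struct sdshdr) + (size_t)len + 1);
    if (!sh)
        return NULL;
    sh->len = len;
    sh->free = 0;
    memcpy(sh->buf, init, (size_t)len);
    sh->buf[len] = '\0';
    return sh;
}
```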
From C99 onwards the size of an array at the end of a struct may be omitted. For purposes of sizeof(struct) this array will appear to have zero size (although its presence may add some padding to the struct), but the intent is for its length to be flexible, i.e., when allocating space for the struct one must allocate the desired amount of extra space for the array at the end. (To avoid going out of bounds, the actual allocated length of the array should be stored somewhere.)
Before C99 it was a fairly common hack to have an array of size 1 (or 0 where allowed by the compiler) at the end of a struct and then allocate more space for it, so C99 made this practice explicitly allowed by introducing the flexible array member with no size given.
As a GNU C extension, you have zero-length arrays:
As a GNU extension, the number of elements can be as small as zero. Zero-length arrays are useful as the last element of a structure which is really a header for a variable-length object:
For example, consider this code from the GNU C manual:
struct line
{
int length;
char contents[0];
};
{
struct line *this_line = (struct line *)
malloc (sizeof (struct line) + this_length);
this_line -> length = this_length;
}
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)
Here, buf is a flexible array member.
Flexible array members have incomplete type, so the sizeof operator may not be applied to them, whereas with the original implementation of zero-length arrays, sizeof evaluates to zero.

Why do we use zero length array instead of pointers?

It's said that zero-length arrays are for variable-length structures, which I can understand. But what puzzles me is why we don't simply use a pointer: we can dereference it and allocate a structure of a different size in the same way.
EDIT - Added example from comments
Assuming:
struct p
{
char ch;
int *arr;
};
We can use this:
struct p *p = malloc(sizeof(*p) + (sizeof(int) * n));
p->arr = (int *)(p + 1);
This gives a contiguous chunk of memory. However, I forgot about the space that p->arr itself occupies, and this seems to be a different thing from the zero-size array method.
If you use a pointer, the structure would no longer be of variable length: it will have fixed length, but its data will be stored in a different place.
The idea behind zero-length arrays* is to store the data of the array "in line" with the rest of the data in the structure, so that the array's data follows the structure's data in memory. Pointer to a separately allocated region of memory does not let you do that.
* Such arrays are also known as flexible arrays; in C99 you declare them as element_type flexArray[] instead of element_type flexArray[0], i.e. you drop zero.
The pointer isn't really needed, so it costs space for no benefit. Also, it might imply another level of indirection, which also isn't really needed.
Compare these example declarations, for a dynamic integer array:
typedef struct {
size_t length;
int data[0];
} IntArray1;
and:
typedef struct {
size_t length;
int *data;
} IntArray2;
Basically, the pointer expresses "the first element of the array is at this address, which can be anything" which is more generic than is typically needed. The desired model is "the first element of the array is right here, but I don't know how large the array is".
Of course, the second form makes it possible to grow the array without risking that the "base" address (the address of the IntArray2 structure itself) changes, which can be really neat. You can't do that with IntArray1, since you need to allocate the base structure and the integer data elements together. Trade-offs, trade-offs ...
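The trade-off can be sketched with the flexible-array form of IntArray1 (standard [] instead of GCC's [0]); intarray1_resize is a hypothetical helper. Growing it means reallocating header and elements together, so the base address may change.

```c
#include <stdlib.h>

typedef struct {
    size_t length;
    int data[]; /* C99 flexible array member instead of int data[0] */
} IntArray1;

/* Hypothetical helper: grows (or shrinks) an IntArray1. Because header
   and elements live in one block, realloc may move the whole object, so
   callers must replace every pointer they hold to the old IntArray1 with
   the returned one. On failure it returns NULL and `old` stays valid.
   Passing old == NULL allocates a fresh array. */
IntArray1 *intarray1_resize(IntArray1 *old, size_t new_length)
{
    IntArray1 *a = realloc(old, sizeof(IntArray1) + new_length * sizeof(int));
    if (a)
        a->length = new_length;
    return a;
}
```

With IntArray2, only the data pointer would be passed to realloc, and the struct itself would stay where it is.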
These are various forms of the so-called "struct hack", discussed in question 2.6 of the comp.lang.c FAQ.
Defining an array of size 0 is actually illegal in C, and has been at least since the 1989 ANSI standard. Some compilers permit it as an extension, but relying on that leads to non-portable code.
A more portable way to implement this is to use an array of length 1, for example:
struct foo {
size_t len;
char str[1];
};
You could allocate more than sizeof (struct foo) bytes, using len to keep track of the allocated size, and then access str[N] to get the Nth element of the array. Since C compilers typically don't do array bounds checking, this would generally "work". But, strictly speaking, the behavior is undefined.
The 1999 ISO standard added a feature called "flexible array members", intended to replace this usage:
struct foo {
size_t len;
char str[];
};
You can deal with these in the same way as the older struct hack, but the behavior is well defined. But you have to do all the bookkeeping yourself; sizeof (struct foo) still doesn't include the size of the array, for example.
You can, of course, use a pointer instead:
struct bar {
size_t len;
char *ptr;
};
And this is a perfectly good approach, but it has different semantics. The main advantage of the "struct hack", or of flexible array members, is that the array is allocated contiguously with the rest of the structure, and you can copy the array along with the structure using memcpy (as long as the target has been properly allocated). With a pointer, the array is allocated separately -- which may or may not be exactly what you want.
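The copying advantage can be sketched with the flexible-array struct foo from above; foo_dup is a hypothetical helper and assumes the source object was allocated with at least sizeof(struct foo) + len bytes.

```c
#include <stdlib.h>
#include <string.h>

struct foo {
    size_t len;
    char str[];
};

/* Hypothetical helper: duplicates a struct foo together with its trailing
   array. Assumes src was allocated with at least sizeof(struct foo) +
   src->len bytes. Because the array is contiguous with the header, one
   malloc and one memcpy suffice; with a char *ptr member, the copy would
   share the original's buffer unless that were duplicated separately. */
struct foo *foo_dup(const struct foo *src)
{
    size_t bytes = sizeof(struct foo) + src->len;
    struct foo *copy = malloc(bytes);
    if (copy)
        memcpy(copy, src, bytes);
    return copy;
}
```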
This is because with a pointer you need a separate allocation and assignment.
struct WithPointer
{
int someOtherField;
...
int* array;
};
struct WithArray
{
int someOtherField;
...
int array[1];
};
To get an 'object' of WithPointer you need to do:
struct WithPointer* withPointer = malloc(sizeof(struct WithPointer));
withPointer->array = malloc(ARRAY_SIZE * sizeof(int));
To get an 'object' of WithArray:
struct WithArray* withArray = malloc(sizeof(struct WithArray) +
(ARRAY_SIZE - 1) * sizeof(int));
That's it.
In some cases it's also very handy, or even necessary, to have the array in consecutive memory; for example in network protocol packets.

Strange variable-sized array declaration

Reading this Skip List implementation I came across this code fragment:
typedef struct nodeStructure{
keyType key;
valueType value;
node forward[1]; /* variable sized array of forward pointers */
};
To me it seems that forward[1] denotes a one element array. And the comment calls it a variable sized array.
Am I misunderstanding something, or is this just a mistake in the source I'm reading?
It is called the struct hack. It is the old form of the flexible array member introduced in C99.
This has been used in the past to mimic a variable array in the last member of a structure, but it is not a strictly conforming construct in C.
This is a programming idiom in C that you will see sometimes. When allocating the structure, you allocate sizeof(struct nodeStructure) + numNodes * sizeof(node) bytes.
This allows you to have multiple forward nodes for the struct, even though it is only declared to have one. It's a bit of an ugly hack, but it works.
Typically, when you do this, there will also be a field called 'count' or something similar, so that you know how many extra entries follow the node.
This is a common trick for the older C compilers (before C99): compilers allowed you to dereference elements past the end of forward's declared length when it is the last element of the struct; you could then malloc enough memory for the additional node elements, like this:
nodeStructure *ptr = malloc(sizeof(nodeStructure)+4*sizeof(node));
for (int i = 0 ; i != 5 ; i++) { // The fifth element is part of the struct
ptr->forward[i] = ...
}
free(ptr);
The trick lets you embed arrays of variable size in a structure without a separate dynamic allocation. An alternative solution would be to declare node *forward, but then you'd need to malloc and free it separately from the nodeStructure, unnecessarily doubling the number of mallocs and potentially increasing memory fragmentation:
Here is how the above fragment would look without the hack:
typedef struct nodeStructure{
keyType key;
valueType value;
node *forward;
};
nodeStructure *ptr = malloc(sizeof(nodeStructure));
ptr->forward = malloc(5*sizeof(node));
for (int i = 0 ; i != 5 ; i++) {
ptr->forward[i] = ...
}
free(ptr->forward);
free(ptr);
EDIT (in response to comments by Adam Rosenfield): C99 lets you define arrays with no size, like this: node forward[]; This is called a flexible array member; it is defined in section 6.7.2.1, paragraph 16, of the C99 standard.
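With a C99 flexible array member, the skip-list node could be allocated as in the following sketch. keyType and valueType are stand-in typedefs, and the level field and the new_node function are hypothetical additions (the original fragment has no count field).

```c
#include <stdlib.h>

/* Stand-ins for the skip list's key and value types (assumptions). */
typedef int keyType;
typedef int valueType;

typedef struct nodeStructure *node;

struct nodeStructure {
    keyType key;
    valueType value;
    int level;      /* hypothetical: how many forward pointers follow */
    node forward[]; /* C99 flexible array member */
};

/* Hypothetical helper: allocates a node with `levels` forward pointers
   in a single block, all initialised to NULL. */
node new_node(int levels, keyType key, valueType value)
{
    if (levels < 1)
        return NULL;
    node n = malloc(sizeof(struct nodeStructure) + (size_t)levels * sizeof(node));
    if (!n)
        return NULL;
    n->key = key;
    n->value = value;
    n->level = levels;
    for (int i = 0; i < levels; i++)
        n->forward[i] = NULL;
    return n;
}
```

A node built this way is released with a single free, unlike the node *forward variant above.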
The data structure implementation is most likely written against the C90 standard, which did not have flexible array members (added in C99). At that time, it was common to use a 1- or even 0-sized(*) array at the end of a struct to allow access to a dynamically variable number of elements there.
The comment should not be interpreted as referring to C99-style variable length arrays; besides, in C99, the idiomatic and standard-conformant definition for member forward would be node forward[];. Only that member has incomplete type; struct nodeStructure itself stays a complete type, and its sizeof simply excludes forward. Such a struct may not, however, be an element of an array or a member of another struct, both of which node forward[0] or node forward[1] allow, although these uses arguably mismatch the programmer's intent.
(*) 0-sized arrays are forbidden by the standard but GCC accepted these as an extension for precisely this use.

zero length arrays [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
Recently I came across a structure definition,
struct arr {
int cnt;
struct {
int size;
int *name;
} list[0];
};
and now I don't know the reason for list[0] being declared. What I am interested in is why is this used. Does it have any advantage? If yes, what is it?
The use is for dynamic-length arrays. You can allocate the memory using malloc(), and have the array reside at the end of the structure:
struct arr *my_arr = malloc(sizeof *my_arr + 17 * sizeof *my_arr->list);
if (my_arr) {
    my_arr->cnt = 17;
    my_arr->list[0].size = 0;
    my_arr->list[1].name = NULL; /* name is an int *, so it cannot hold "foo" */
}
Actually being able to use 0 for the length is (as pointed out in a comment) a GCC extension. In C99, you can leave out the size literal altogether for the same effect.
Before these things were implemented, you often saw this done with a length of 1, but that complicates the allocation a bit since you must compensate when computing the memory needed.
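A sketch of the question's struct in the C99 form, with list[] instead of list[0]; arr_new is a hypothetical helper. calloc zero-fills the entries, so each size starts at 0 (and each name is all-bits-zero, which is a null pointer on common platforms).

```c
#include <stdlib.h>

/* C99 form of the question's struct: list[] instead of list[0]. */
struct arr {
    int cnt;
    struct {
        int size;
        int *name;
    } list[];
};

/* Hypothetical helper: allocates a struct arr with room for cnt list
   entries in one zero-filled block. */
struct arr *arr_new(int cnt)
{
    if (cnt < 0)
        return NULL;
    struct arr *a = calloc(1, sizeof *a + (size_t)cnt * sizeof a->list[0]);
    if (a)
        a->cnt = cnt;
    return a;
}
```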
It is called the "struct hack". You can search for it on SO or on the web:
http://www.google.com/search?q=struct+hack&sitesearch=stackoverflow.com/questions
Note that formally it is always illegal to declare arrays of size 0 in C. The code you provided formally is not even compilable. Most C compilers will accept 0-sized array declaration as an extension though, specifically because it is often used in "lazy" version of "struct hack" (it can rely on sizeof to determine how much memory to allocate, since 0-sized array supposedly does not affect the total size of the struct).
An arguably better implementation of struct hack uses an array of size 1
struct arr {
int cnt;
struct {
int size;
int *name;
} list[1];
};
It is "better" because it is at least formally compilable. To allocate memory for a struct with N elements in the list, the standard offsetof macro is used:
struct arr *a = malloc(offsetof(struct arr, list) + N * sizeof a->list[0]);
In the C99 version of the language specification, the "struct hack" is supported through a size-less array declaration (with empty []), since 0-sized array declarations are illegal in C99 as well.
Another advantage appears when your structure describes on-disk or on-network data: if cnt is 0, the record may consist of nothing but cnt itself.
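As a sketch of the on-wire case, assume a hypothetical record format: a 2-byte big-endian count followed by that many payload bytes, so a record with a count of 0 is only the header. record_parse is a hypothetical helper; it decodes the count byte by byte rather than casting the buffer to struct record *, which sidesteps alignment, padding and byte-order assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wire format: 2-byte big-endian count, then `cnt` bytes. */
struct record {
    uint16_t cnt;
    uint8_t payload[];
};

/* Decodes one record from buf[0..len). Returns NULL on a short buffer or
   on allocation failure. */
struct record *record_parse(const uint8_t *buf, size_t len)
{
    if (len < 2)
        return NULL;
    uint16_t cnt = (uint16_t)((buf[0] << 8) | buf[1]);
    if (len - 2 < cnt)
        return NULL;

    struct record *r = malloc(sizeof *r + cnt);
    if (!r)
        return NULL;
    r->cnt = cnt;
    memcpy(r->payload, buf + 2, cnt);
    return r;
}
```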
I'm here just to confirm what I dreaded, that list[0] is not valid.
