How are zero-length arrays in a structure useful? [duplicate] - c

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 8 years ago.
Quoting from this site,
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object.
Code:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)malloc(sizeof (struct line) + this_length);
thisline->length = this_length;
I really did not understand what they have meant by this: "last element of a structure that is really a header for a variable-length object". Can someone elaborate on that sentence?

Zero-length arrays (which have been replaced with flexible array members in C99) let you allocate structures with variable size in a single allocation instead of two allocations. Without flexible array members you would be forced to do this:
struct line {
int length;
char *contents;
};
struct line *thisline = malloc(sizeof (struct line));
thisline->length = this_length;
thisline->contents=malloc(this_length);
Flexible array member saves you from making the second allocation, and also from having to call free a second time.
In addition to its convenience, this little trick helps you save on memory overhead. Each time you allocate a block of memory, malloc needs to store some "bookkeeping" information for the use of free. In addition, the blocks the malloc allocates are sized at specific increments, leaving small unused portions at the end of each block. By reducing the number of allocations in half you can reduce the overhead by a significant fraction.

It is old C. With current C99 or C11 standard you can use flexible array members (which should be the last member of a struct) like:
struct line {
unsigned length;
char contents[];
};
and you need a convention about the occupied size of the flexible array. If contents is a C string (like we all expect), you could suppose its size is length+1 (since you need a terminating null byte). Then you should comment that.
See also this & that

I really did not understand what they have meant by this: "last element of a structure that is really a header for a variable-length object".
It means this is OK (in GNU C):
struct line {
int length;
char contents[0]; // OK, last member of the structure
};
but this is not:
struct line {
char contents[0]; // Wrong, not the last member of the structure
int length;
};

Related

What is the use of an array in struct with empty indices? [duplicate]

This question already has answers here:
What are the real benefits of flexible array member?
(3 answers)
Closed 2 years ago.
I saw an array with no size mentioned in the indices within a structure definition here.
Then I tried repeating same for a sample code
struct some {
char a[];
};
and it throws an error: flexible array member in otherwise empty struct.
So I tried adding another member
struct some {
unsigned int x;
char a[];
};
and it compiles successfully; the observation that I made was sizeof(struct some) = sizeof(unsigned int). Which means, a does not get any share in memory.
What is flexible array in C?
What is the use of such arrays when there's no memory allocated?
Is it a C standard or GNU's option?
And, is there any use in using comma , in the end while initializing an array?
struct other array[] = {
{....},
{....},
};
What is flexible array in C?
Declaring an array with empty [] as the last member of a structure defines a flexible array member. The base size of the structure does not include any space for the array (although it may include padding that is necessary for alignment), but you can allocate as much space as desired when allocating space for the entire structure (using malloc or related routines) and use the array as if it contained as many elements as you have provided space for.
What is the use of such arrays when there's no memory allocated?'
You are responsible for allocating the space yourself.
Is it a C standard or GNU's option?
It is standard C.
And, is there any use in using comma , in the end while initializing an array?
It is a convenience to allow for easy editing or generations of lists of initializers using macros or other means.
For the first question: flexible arrays are useful for reducing memory allocations (instead of allocating a structure and a buffer you allocate just a structure).
An example may make things more easy to understand.
Let's imagine the following scenario: we have a custom file format in which the first 32 bit integer represents a size N and the next N bytes are some payload we are interested in, so:
4 // the next 4 bytes are some data we need to parse
1 2 3 4
8 // the next 8 bytes represent another chunk of data
1 2 3 5 6 7 8
This can be represented by the following structure:
struct file_chunk {
int count;
char *data;
}
We don't know how many chunks a file will have, so for each chunk we have to do two allocations: one for a file_chunk structure, and another for the data buffer:
file_chunk *allocate_file_chunk(int Count) {
struct file_chunk *chunk = malloc(sizeof(*chunk));
chunk->data = malloc(Count);
return chunk;
}
But we know that we always have a 32-bit integer N followed by N bytes, so we can do the following:
struct file_chunk {
int count;
char data[]
}
file_chunk *allocate_file_chunk(int Count) {
return malloc(sizeof(struct file_chunk) + Count * sizeof(char));
}
Now our structure looks exactly like the section we read from the file.
The feature was introduced in C99. Alternatively, you may see structures defined with an array of size 1 for compilers that do not support flexible array members:
struct file_chunk {
int count;
char data[1]
}
This makes computing the size needed for an allocation slightly more complicated:
size_t file_chunk_size = sizeof(struct file_chunk) + Count * sizeof(char) - sizeof(char);
As for the second question: you can omit the trailing ,. It just makes adding new entries at the end easier (and reduces the lines changed when looking at a diff).

Array of size 0 at the end of struct [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new for me, and it's seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element is incomplete type, (no array dimensions, not even 0).
So, your code should better look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explains that zero-length arrays are GCC extension and C allows variable length array but no one addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. struct data type do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
You should note that one of the the restriction on zero-length array is that it must be the last member of a structure. By this, compiler knows that the struct can have variable length object and some more memory will be needed at runtime.
But, you shouldn't be confused with; "since zero-length array is the last member of the structure then the memory allocated for zero-length array must be added to the end of the structure and since structs do not have their elements necessarily in continuous locations then how could that allocated memory be accessed?"
No. That's not the case. Memory allocation for structure members not necessarily be contiguous, there may be padding between them, but that allocated memory must be accessed with variable data. And yes, padding will have no effect over here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned that zero-length arrays are not supported by standard C, but an extension of GCC compilers. C99 introduced flexible array member. An example from C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more standard way would be to define your array with a data size of 1, as in:
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are layed out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but they are in order and in one contiguous block.

Why do we use zero length array instead of pointers?

It's said that zero length array is for variable length structure, which I can understand. But what puzzle me is why we don't simply use a pointer, we can dereference and allocate a different size structure in the same way.
EDIT - Added example from comments
Assuming:
struct p
{
char ch;
int *arr;
};
We can use this:
struct p *p = malloc(sizeof(*p) + (sizeof(int) * n));
p->arr = (struct p*)(p + 1);
To get a contiguous chunk of memory. However, I seemed to forget the space p->arr occupies and it seems to be a disparate thing from the zero size array method.
If you use a pointer, the structure would no longer be of variable length: it will have fixed length, but its data will be stored in a different place.
The idea behind zero-length arrays* is to store the data of the array "in line" with the rest of the data in the structure, so that the array's data follows the structure's data in memory. Pointer to a separately allocated region of memory does not let you do that.
* Such arrays are also known as flexible arrays; in C99 you declare them as element_type flexArray[] instead of element_type flexArray[0], i.e. you drop zero.
The pointer isn't really needed, so it costs space for no benefit. Also, it might imply another level of indirection, which also isn't really needed.
Compare these example declarations, for a dynamic integer array:
typedef struct {
size_t length;
int data[0];
} IntArray1;
and:
typedef struct {
size_t length;
int *data;
} IntArray2;
Basically, the pointer expresses "the first element of the array is at this address, which can be anything" which is more generic than is typically needed. The desired model is "the first element of the array is right here, but I don't know how large the array is".
Of course, the second form makes it possible to grow the array without risking that the "base" address (the address of the IntArray2 structure itself) changes, which can be really neat. You can't do that with IntArray1, since you need to allocate the base structure and the integer data elements together. Trade-offs, trade-offs ...
These are various forms of the so-called "struct hack", discussed in question 2.6 of the comp.lang.c FAQ.
Defining an array of size 0 is actually illegal in C, and has been at least since the 1989 ANSI standard. Some compilers permit it as an extension, but relying on that leads to non-portable code.
A more portable way to implement this is to use an array of length 1, for example:
struct foo {
size_t len;
char str[1];
};
You could allocate more than sizeof (struct foo) bytes, using len to keep track of the allocated size, and then access str[N] to get the Nth element of the array. Since C compilers typically don't do array bounds checking, this would generally "work". But, strictly speaking, the behavior is undefined.
The 1999 ISO standard added a feature called "flexible array members", intended to replace this usage:
struct foo {
size_t len;
char str[];
};
You can deal with these in the same way as the older struct hack, but the behavior is well defined. But you have to do all the bookkeeping yourself; sizeof (struct foo) still doesn't include the size of the array, for example.
You can, of course, use a pointer instead:
struct bar {
size_t len;
char *ptr;
};
And this is a perfectly good approach, but it has different semantics. The main advantage of the "struct hack", or of flexible array members, is that the array is allocated contiguously with the rest of the structure, and you can copy the array along with the structure using memcpy (as long as the target has been properly allocated). With a pointer, the array is allocated separately -- which may or may not be exactly what you want.
This is because with a pointer you need a separate allocation and assignment.
struct WithPointer
{
int someOtherField;
...
int* array;
};
struct WithArray
{
int someOtherField;
...
int array[1];
};
To get an 'object' of WithPointer you need to do:
struct WithPointer* withPointer = malloc(sizeof(struct WithPointer));
withPointer.array = malloc(ARRAY_SIZE * sizeof(int));
To get an 'object' of WithArray:
struct WithArray* withArray = malloc(sizeof(struct WithArray) +
(ARRAY_SIZE - 1) * sizeof(int));
That's it.
In some cases it's also very handy, or even necessary, to have the array in consecutive memory; for example in network protocol packets.

Increasing The Size of Memory Allocated to a Struct via Malloc

I just learned that it's possible to increase the size of the memory you'll allocate to a struct when using the malloc function. For example, you can have a struct like this:
struct test{
char a;
int v[1];
char b;
};
Which clearly has space for only 2 chars and 1 int (pointer to an int in reality, but anyway). But you could call malloc in such a way to make the struct holds 2 chars and as many ints as you wanted (let's say 10):
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test)+sizeof(int)*9);
ptr->v[9]=50;
printf("%d\n",ptr->v[9]);
return 0;
}
The output here would be "50" printed on the screen, meaning that the array inside the struct was holding up to 10 ints.
My questions for the experienced C programmers out there:
What is happening behind the scenes here? Does the computer allocate 2+4 (2 chars + pointer to int) bytes for the standard "struct test", and then 4*9 more bytes of memory and let the pointer "ptr" put whatever kind of data it wants on those extra bytes?
Does this trick only works when there is an array inside the struct?
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
...Which clearly has space for only 2 chars and 1 int (pointer to an
int in reality, but anyway)...
Already incorrect. Arrays are not pointers. Your struct holds space for 2 chars and 1 int. There's no pointer of any kind there. What you have declared is essentially equivalent to
struct test {
char a;
int v;
char b;
};
There's not much difference between an array of 1 element and an ordinary variable (there's conceptual difference only, i.e. syntactic sugar).
...But you could call malloc in such a way to make it hold 1 char and as
many ints as you wanted (let's say 10)...
Er... If you want it to hold 1 char, why did you declare your struct with 2 chars???
Anyway, in order to implement an array of flexible size as a member of a struct you have to place your array at the very end of the struct.
struct test {
char a;
char b;
int v[1];
};
Then you can allocate memory for your struct with some "extra" memory for the array at the end
struct test *ptr = malloc(offsetof(struct test, v) + sizeof(int) * 10);
(Note how offsetof is used to calculate the proper size).
That way it will work, giving you an array of size 10 and 2 chars in the struct (as declared). It is called "struct hack" and it depends critically on the array being the very last member of the struct.
C99 version of C language introduced dedicated support for "struct hack". In C99 it can be done as
struct test {
char a;
char b;
int v[];
};
...
struct test *ptr = malloc(sizeof(struct test) + sizeof(int) * 10);
What is happening behind the scenes here? Does the computer allocate
2+4 (2 chars + pointer to int) bytes for the standard "struct test",
and then 4*9 more bytes of memory and let the pointer "ptr" put
whatever kind of data it wants on those extra bytes?
malloc allocates as much memory as you ask it to allocate. It is just a single flat block of raw memory. Nothing else happens "behind the scenes". There's no "pointer to int" of any kind in your struct, so any questions that involve "pointer to int" make no sense at all.
Does this trick only works when there is an array inside the struct?
Well, that's the whole point: to access the extra memory as if it belongs to an array declared as the last member of the struct.
If the array is not the last member of the struct, how does the computer manage the memory block allocated?
It doesn't manage anything. If the array is not the last member of the struct, then trying to work with the extra elements of the array will trash the members of the struct that declared after the array. This is pretty useless, which is why the "flexible" array has to be the last member.
No, that does not work. You can't change the immutable size of a struct (which is a compile-time allocation, after all) by using malloc ( ) at run time. But you can allocate a memory block, or change its size, such that it holds more than one struct:
int main(){
struct test *ptr;
ptr = malloc (sizeof(struct test) * 9);
}
That's just about all you can do with malloc ( ) in this context.
In addition to what others have told you (summary: arrays are not pointers, pointers are not arrays, read section 6 of the comp.lang.c FAQ), attempting to access array elements past the last element invokes undefined behavior.
Let's look at an example that doesn't involve dynamic allocation:
struct foo {
int arr1[1];
int arr2[1000];
};
struct foo obj;
The language guarantees that obj.arr1 will be allocated starting at offset 0, and that the offset of obj.arr2 will be sizeof (int) or more (the compiler may insert padding between struct members and after the last member, but not before the first one). So we know that there's enough room in obj for multiple int objects immediately following obj.arr1. That means that if you write obj.arr1[5] = 42, and then later access obj.arr[5], you'll probably get back the value 42 that you stored there (and you'll probably have clobbered obj.arr2[4]).
The C language doesn't require array bounds checking, but it makes the behavior of accessing an array outside its declared bounds undefined. Anything could happen -- including having the code quietly behave just the way you want it to. In fact, C permits array bounds checking; it just doesn't provide a way to handle errors, and most compilers don't implement it.
For an example like this, you're most likely to run into visible problems in the presence of optimization. A compiler (particularly an optimizing compiler) is permitted to assume that your program's behavior is well-defined, and to rearrange the generated code to take advantage of that assumption. If you write
int index = 5;
obj.arr1[index] = 42;
the compiler is permitted to assume that the index operation doesn't go outside the declared bounds of the array. As Henry Spencer wrote, "If you lie to the compiler, it will get its revenge".
Strictly speaking, the struct hack probably involves undefined behavior (which is why C99 added a well-defined version of it), but it's been so widely used that most or all compilers will support it. This is covered in question 2.6 of the comp.lang.c FAQ.

zero length arrays [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
Recently I came across a structure definition,
struct arr {
int cnt;
struct {
int size;
int *name;
} list[0];
};
and now I don't know the reason for list[0] being declared. What I am interested in is why is this used. Does it have any advantage? If yes, what is it?
The use is for dynamic-length arrays. You can allocate the memory using malloc(), and have the array reside at the end of the structure:
struct arr *my_arr = malloc(sizeof *my_arr + 17 * sizeof *my_arr->list);
my_arr->cnt = 17;
my_arr->list[0].size = 0;
my_arr->list[1].name = "foo";
Actually being able to use 0 for the length is (as pointed out in a comment) a GCC extension. In C99, you can leave out the size literal altogether for the same effect.
Before these things were implemented, you often saw this done with a length of 1, but that complicates the allocation a bit since you must compensate when computing the memory needed.
It is called "struct hack". You can search for it on SO or on the Net
http://www.google.com/search?q=struct+hack&sitesearch=stackoverflow.com/questions
Note that formally it is always illegal to declare arrays of size 0 in C. The code you provided formally is not even compilable. Most C compilers will accept 0-sized array declaration as an extension though, specifically because it is often used in "lazy" version of "struct hack" (it can rely on sizeof to determine how much memory to allocate, since 0-sized array supposedly does not affect the total size of the struct).
An arguably better implementation of struct hack uses an array of size 1
struct arr {
int cnt;
struct {
int size;
int *name;
} list[1];
};
It is "better" because it is formally compilable at least. In order to allocate memory for a struct with N elements in the list, standard offsetof macro is used
arr *a = malloc(offsetof(arr, list) + N * sizeof a->list);
In C99 version of the language specification the "struct hack" is supported through size-less array declaration (with empty []), since 0-sized array declarations are illegal in C99 as well.
Another advantage is if your structure describes on-disk/on-network data. If cnt is 0, the data size may only be the length of cnt.
I'm here just to confirm what I dreaded, that list[0] is not valid.

Resources