Reading this Skip List implementation I came across this code fragment:
typedef struct nodeStructure{
keyType key;
valueType value;
node forward[1]; /* variable sized array of forward pointers */
};
To me it seems that forward[1] denotes a one element array. And the comment calls it a variable sized array.
Do I misunderstand something or this is just a mistake in the source I'm reading?
It is called the struct hack. It is the old form of the flexible array member introduced in C99.
This has been used in the past to mimic a variable array in the last member of a structure but it is not a strictly conformning construct in C.
This is a program paradigm in C that you will see sometimes. When allocating the structure, you will allocate sizeof(struct nodeStructure + numNodes * sizeof(node)).
This allows you to have multiple forward nodes for the struct, even though it is only declared to have one. It's a bit of an ugly hack, but it works.
Typically, when you do this, there will also be a filed called 'count' or something, so that you know how many extra entries are after the node.
This is a common trick for the older C compilers (before C99): compilers allowed you to dereference elements past the end of forward's declared length when it is the last element of the struct; you could then malloc enough memory for the additional node elements, like this:
nodeStructure *ptr = malloc(sizeof(nodeStructure)+4*sizeof(node));
for (int i = 0 ; i != 5 ; i++) { // The fifth element is part of the struct
ptr->forward[i] = ...
}
free(ptr);
The trick lets you embed arrays of variable size in a structure without a separate dynamic allocation. An alternative solution would be to declare node *forward, but then you'd need to malloc and free it separately from the nodeStructure, unnecessarily doubling the number of mallocs and potentially increasing memory fragmentation:
Here is how the above fragment would look without the hack:
typedef struct nodeStructure{
keyType key;
valueType value;
node *forward;
};
nodeStructure *ptr = malloc(sizeof(nodeStructure));
ptr->forward = malloc(5*sizeof(node));
for (int i = 0 ; i != 5 ; i++) {
ptr->forward[i] = ...
}
free(ptr->forward);
free(ptr);
EDIT (in response to comments by Adam Rosenfield): C99 lets you define arrays with no size, like this: node forward[]; This is called flexible array member, it is defined in the section 6.7.2.1.16 of the C99 standard.
The data structure implementation is most likely written against the C90 standard, which did not have flexible array members (added in C99). At that time, it was common to use a 1- or even 0-sized(*) array at the end of a struct to allow access to a dynamically variable number of elements there.
The comment should not be interpreted as meaning C99-style variable length arrays; besides, in C99, the idiomatic and standard-conformant definition for member forward would be node forward[];. A type such as struct nodeStructure with such a member is then called an incomplete type. You can define a pointer to it, but you cannot define a variable of this type or take its size, all operations that node forward[0] or node forward[1] allow, although these operations arguably mismatch the programmer's intent.
(*) 0-sized arrays are forbidden by the standard but GCC accepted these as an extension for precisely this use.
Related
This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new for me, and it's seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element is incomplete type, (no array dimensions, not even 0).
So, your code should better look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explains that zero-length arrays are GCC extension and C allows variable length array but no one addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. struct data type do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
You should note that one of the the restriction on zero-length array is that it must be the last member of a structure. By this, compiler knows that the struct can have variable length object and some more memory will be needed at runtime.
But, you shouldn't be confused with; "since zero-length array is the last member of the structure then the memory allocated for zero-length array must be added to the end of the structure and since structs do not have their elements necessarily in continuous locations then how could that allocated memory be accessed?"
No. That's not the case. Memory allocation for structure members not necessarily be contiguous, there may be padding between them, but that allocated memory must be accessed with variable data. And yes, padding will have no effect over here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned that zero-length arrays are not supported by standard C, but an extension of GCC compilers. C99 introduced flexible array member. An example from C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more standard way would be to define your array with a data size of 1, as in:
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are layed out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but they are in order and in one contiguous block.
I use sizeof to get size of a struct in C, but the result I got is unexpected.
struct sdshdr {
int len;
int free;
char buf[];
};
int main(){
printf("struct len:%d\n",(sizeof(struct sdshdr)));
return 0;
} //struct len:8, with or without buf
my question is why does buf not occupy any space and why is the size of the int type still 4 on a 64-bit CPU?
here is the output from gcc -v:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix
The [] is a flexible array member. They do not count towards the total size of the struct, because the C standard explicitly says so:
6.7.2.1/18
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply.
This is intentional by design, because the purpose of a flexible array member is to allow you to allocate trailing data dynamically after the struct. (When the struct is a file header, protocol header etc.)
Example including discussion about non-standard gcc extensions and the old pre-C99 "struct hack".
From C99 onwards the size of an array at the end of a struct may be omitted. For purposes of sizeof(struct) this array will appear to have zero size (although its presence may add some padding to the struct), but the intent is for its length to be flexible, i.e., when allocating space for the struct one must allocate the desired amount of extra space for the array at the end. (To avoid going out of bounds, the actual allocated length of the array should be stored somewhere.)
Before C99 it was a fairly common hack to have an array of size 1 (or 0 where allowed by the compiler) at the end of a struct and then allocate more space for it, so C99 made this practice explicitly allowed by introducing the flexible array member with no size given.
As a GNU c extension, you have zero-length arrays:
As a GNU extension, the number of elements can be as small as zero. Zero-length arrays are useful as the last element of a structure which is really a header for a variable-length object:
for example, consider this code from The gnu c manual
struct line
{
int length;
char contents[0];
};
{
struct line *this_line = (struct line *)
malloc (sizeof (struct line) + this_length);
this_line -> length = this_length;
}
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)
buf here is a flexible array member
Flexible array members have incomplete type, and so the sizeof operator may not be applied whereas original implementation of zero-length arrays, sizeof evaluates to zero.
I must say, I have quite a conundrum in a seemingly elementary problem. I have a structure, in which I would like to store an array as a field. I'd like to reuse this structure in different contexts, and sometimes I need a bigger array, sometimes a smaller one. C prohibits the use of variable-sized buffer. So the natural approach would be declaring a pointer to this array as struct member:
struct my {
struct other* array;
}
The problem with this approach however, is that I have to obey the rules of MISRA-C, which prohibits dynamic memory allocation. So then if I'd like to allocate memory and initialize the array, I'm forced to do:
var.array = malloc(n * sizeof(...));
which is forbidden by MISRA standards. How else can I do this?
Since you are following MISRA-C, I would guess that the software is somehow mission-critical, in which case all memory allocation must be deterministic. Heap allocation is banned by every safety standard out there, not just by MISRA-C but by the more general safety standards as well (IEC 61508, ISO 26262, DO-178 and so on).
In such systems, you must always design for the worst-case scenario, which will consume the most memory. You need to allocate exactly that much space, no more, no less. Everything else does not make sense in such a system.
Given those pre-requisites, you must allocate a static buffer of size LARGE_ENOUGH_FOR_WORST_CASE. Once you have realized this, you simply need to find a way to keep track of what kind of data you have stored in this buffer, by using an enum and maybe a "size used" counter.
Please note that not just malloc/calloc, but also VLAs and flexible array members are banned by MISRA-C:2012. And if you are using C90/MISRA-C:2004, there are no VLAs, nor are there any well-defined use of flexible array members - they invoked undefined behavior until C99.
Edit: This solution does not conform to MISRA-C rules.
You can kind of include VLAs in a struct definition, but only when it's inside a function. A way to get around this is to use a "flexible array member" at the end of your main struct, like so:
#include <stdio.h>
struct my {
int len;
int array[];
};
You can create functions that operate on this struct.
void print_my(struct my *my) {
int i;
for (i = 0; i < my->len; i++) {
printf("%d\n", my->array[i]);
}
}
Then, to create variable length versions of this struct, you can create a new type of struct in your function body, containing your my struct, but also defining a length for that buffer. This can be done with a varying size parameter. Then, for all the functions you call, you can just pass around a pointer to the contained struct my value, and they will work correctly.
void create_and_use_my(int nelements) {
int i;
// Declare the containing struct with variable number of elements.
struct {
struct my my;
int array[nelements];
} my_wrapper;
// Initialize the values in the struct.
my_wrapper.my.len = nelements;
for (i = 0; i < nelements; i++) {
my_wrapper.my.array[i] = i;
}
// Print the struct using the generic function above.
print_my(&my_wrapper.my);
}
You can call this function with any value of nelements and it will work fine. This requires C99, because it does use VLAs. Also, there are some GCC extensions that make this a bit easier.
Important: If you pass the struct my to another function, and not a pointer to it, I can pretty much guarantee you it will cause all sorts of errors, since it won't copy the variable length array with it.
Here's a thought that may be totally inappropriate for your situation, but given your constraints I'm not sure how else to deal with it.
Create a large static array and use this as your "heap":
static struct other heap[SOME_BIG_NUMBER];
You'll then "allocate" memory from this "heap" like so:
var.array = &heap[start_point];
You'll have to do some bookkeeping to keep track of what parts of your "heap" have been allocated. This assumes that you don't have any major constraints on the size of your executable.
I am working on refactoring some old code and have found few structs containing zero length arrays (below). Warnings depressed by pragma, of course, but I've failed to create by "new" structures containing such structures (error 2233). Array 'byData' used as pointer, but why not to use pointer instead? or array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
}
NB It's C++, Windows XP, VS 2003
Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.
There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says 0 length arrays are legal in C99 they are in fact still not legal in C99. Instead of a 0 length array here you should be using a length 1 array
This is an old C hack to allow a flexible sized arrays.
In C99 standard this is not neccessary as it supports the arr[] syntax.
Your intution about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
}
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.
It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
}
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.
After reading some posts related to flexible array member, I am still not fully understand why we need such a feature.
Possible Duplicate:
Flexible array members in C - bad?
Is this a Flexible Array Struct Members in C as well?
(Blame me if I didn't solve my problem from the possible duplicate questions above)
What is the real difference between the following two implementations:
struct h1 {
size_t len;
unsigned char *data;
};
struct h2 {
size_t len;
unsigned char data[];
};
I know the size of h2 is as if the flexible array member (data) were omitted, that is, sizeof(h2) == sizeof(size_t). And I also know that the flexible array member can only appear as the last element of a structure, so the original implementation can be more flexible in the position of data.
My real problem is that why C99 add this feature? Simply because sizeof(h2) doesn't contain the real size of data? I am sure that I must miss some more important points for this feature. Please point it out for me.
The two structs in your post don't have the same structure at all. h1 has a integer and a pointer to char. h2 has an integer, and an array of characters inline (number of elements determined at runtime, possibly none).
Said differently, in h2 the character data is inside the struct. In h1 it has to be somewhere outside.
This makes a lot of difference. For instance, if you use h1 you need to take care of allocating/freeing the payload (in addition to the struct itself). With h2, only one allocation/free is necessary, everything is packaged together.
One case where using h2 might make sense is if you're communicating with something that expects messages in the form of {length,data} pairs. You allocate an instance of h2 by requesting sizeof(h2)+how many payload chars you want, fill it up, and then you can transfer the whole thing in a single write (taking care about endianess and such of course). If you had used h1, you'd need two write calls (unless you want to send the memory address of the data, which usually doesn't make any sense).
So this feature exists because it's handy. And various (sometimes non-portable) tricks where used before that to simulate this feature. Adding it to the standard makes sense.
The main reason the Committee introduced flexible array members is to implement the famous struct hack. See the below quote from the C99 Rationale, especially the part I add the emphasis.
Rationale for International Standard — Programming Languages — C §6.7.2.1 Structure and union specifiers
There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
struct s
{
int n_items;
/* possibly other fields */
int items[1];
};
struct s *p;
size_t n, i;
/* code that sets n omitted */
p = malloc(sizeof(struct s) + (n - 1) * sizeof(int));
/* code to check for failure omitted */
p->n_items = n;
/* example usage */
for (i = 0; i < p->n_items; i++)
p->items[i] = i;
The validity of this construct has always been questionable. In the response to one Defect
Report, the Committee decided that it was undefined behavior because the array p->items
contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.
There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last. Similarly, structures containing flexible arrays can't occur in other structures or in arrays. Finally, sizeof applied to the structure ignores the array but counts any padding before it. This makes the malloc call as simple as possible.
I don't know if this is considered as an important point, but GCC docs points this out:
GCC allows static initialization of flexible array members. This is equivalent to defining a new structure containing the original structure followed by an array of sufficient size to contain the data. E.g. in the following, f1 is constructed as if it were declared like f2.
struct f1 {
int x; int y[];
} f1 = { 1, { 2, 3, 4 } };
struct f2 {
struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } };
(taken from http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html)