I have a struct that looks sort of like this:
typedef struct foo {
int this;
int that;
int length;
int info[]; // legal for last element of a struct
} Foo;
When I compile it, I get a warning like this:
C4200 nonstandard extension used: zero-sized array in struct/union
Do I just live with the warning, or is there some property I can set to tell Visual Studio to use C99?
Visual Studio 2015 [almost] fully implements C99, but still treats all C99 features as language extensions (e.g. disabling language extensions disables C99 support as well). Some of these features trigger spurious warnings like the one you observed.
As long as C99 support remains in this semi-official "extension" status, just ignore/disable such warnings.
Note that VS2015 Update 3 no longer issues this warning for such C code.
For those who are curious about the zero-length array or "flexible array" idioms, it's probably worth taking a minute to explain them. This idiom is as old as C itself.
Say you want to pass around a structure that consists of a header and a variable amount of data. It's not known until the structure is allocated how much data will need to be added to it.
The original idiom was to declare the structure like this:
/* Variation 1 */
struct mydata {
int type;
int datalen;
char data[1];
};
Then suppose we wanted to return one of these objects:
struct mydata *
get_some_data()
{
int len;
struct mydata *rval;
len = find_out_how_much_data();
/* Allocate the struct AND enough extra space to hold the data */
rval = malloc(sizeof(*rval) + len - 1);
rval->datalen = len;
read_data(&rval->data[0], len);
return rval;
}
And a caller would access it like this:
void caller()
{
struct mydata *foo = get_some_data();
/* Start accessing foo->datalen bytes of data starting at
* foo->data[0]
*/
free(foo); /* And free it all */
}
The point to this idiom is that the char data[1] declaration is a lie, since the data will surely be longer than that, but the C compiler doesn't do range checking, so everything is cool.
But notice the len - 1 expression in the malloc. This is necessary because declaring data to have a length 1 introduces an off-by-one error into everything, and invites coders to make mistakes.
So both GNU and Microsoft added an extension to the language that allows you to declare an array of length zero:
/* Variation 2 */
struct mydata {
int type;
int datalen;
char data[0];
};
While on the surface, this is nonsensical, it dovetails neatly with the idiom used here. Now we can simply do:
rval = malloc(sizeof(*rval) + len);
and the code is much cleaner.
C99 formalized this idiom by acknowledging that the length of the array is a lie, but the ability to have extra data at the end of the structure comes in very handy. So now you declare:
/* Variation C99 */
struct mydata {
int type;
int datalen;
char data[];
};
and everything is coded up exactly as with the GNU/Microsoft extension.
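To make the C99 variation concrete, here is a minimal sketch of allocating and filling such a structure. The helper name mydata_from_bytes is hypothetical (it is not part of the code above), and error handling is kept to the bare minimum.

```c
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

struct mydata {
    int type;
    int datalen;
    char data[];      /* C99 flexible array member */
};

/* Hypothetical helper: allocate the header plus len payload bytes in
 * one block and copy the payload in. Returns NULL on failure. */
static struct mydata *mydata_from_bytes(const char *src, int len)
{
    struct mydata *p;

    if (len < 0)
        return NULL;
    /* sizeof *p counts only the header; the payload is added on top. */
    p = malloc(sizeof *p + (size_t)len);
    if (p == NULL)
        return NULL;
    p->type = 0;
    p->datalen = len;
    if (len > 0)
        memcpy(p->data, src, (size_t)len);
    return p;
}
```

The caller still releases everything with a single free(), exactly as in the original idiom.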
Unfortunately, it seems that Microsoft hasn't adopted C99 standards into their compiler, so no matter what you do, variations 2 and C99 will generate a warning. It looks like my only choices are to either live with the warning message or add a pragma to suppress it.
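For the pragma route, a sketch along these lines should work. It assumes MSVC's warning(push), warning(disable) and warning(pop) pragmas, and guards them with _MSC_VER so other compilers never see them. The struct is the one from the question.

```c
#ifdef _MSC_VER
#pragma warning(push)
#pragma warning(disable : 4200)  /* zero-sized array in struct/union */
#endif

typedef struct foo {
    int this;
    int that;
    int length;
    int info[];   /* flexible array member; must stay last */
} Foo;

#ifdef _MSC_VER
#pragma warning(pop)             /* restore the previous warning state */
#endif
```

Scoping the suppression with push/pop keeps the warning active for the rest of the translation unit.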
Linux users may amuse themselves by executing grep -r '\[0\]' /usr/include and seeing how many places use zero-length arrays. This is a very commonly used idiom.
As to my own issue: the struct I'm working with is actually part of an ioctl. The driver has already been written and I can't change it. The very most I can do is redefine the array from zero-length to flexible. Unfortunately, neither option makes the MSVC compiler happy.
When you declare an array of unspecified length (in your case in a struct), the compiler does not know how much memory needs to be allocated for it. When using the variable 'info', the program therefore reads or writes memory past the end of the struct unless you allocated the extra space yourself. This can make the program crash. Normally, when you want to use an array whose length is not known in advance, a pointer is used. Memory can be allocated for this pointer once it is known how large the array should be.
The compiler warns you about the problem I described above but lets you execute the code (at least it does in Microsoft Visual Studio 2013).
Hope this helps you!
Related
Just curious, what actually happens if I define a zero-length array int array[0]; in code? GCC doesn't complain at all.
Sample Program
#include <stdio.h>
int main() {
int arr[0];
return 0;
}
Clarification
I'm actually trying to figure out if zero-length arrays initialised this way, instead of being pointed at like the variable length in Darhazer's comments, are optimised out or not.
This is because I have to release some code out into the wild, so I'm trying to figure out if I have to handle cases where the SIZE is defined as 0, which happens in some code with a statically defined int array[SIZE];
I was actually surprised that GCC does not complain, which led to my question. From the answers I've received, I believe the lack of a warning is largely due to supporting old code which has not been updated with the new [] syntax.
Because I was mainly wondering about the error, I am tagging Lundin's answer as correct (Nawaz's was first, but it wasn't as complete) -- the others were pointing out its actual use for tail-padded structures, while relevant, isn't exactly what I was looking for.
An array cannot have zero size.
ISO 9899:2011 6.7.6.2:
If the expression is a constant expression, it shall have a value greater than zero.
The above text applies to a plain array (paragraph 1). For a VLA (variable length array), the behavior is undefined if the expression's value is less than or equal to zero (paragraph 5). This is normative text in the C standard. A compiler is not allowed to implement it differently.
gcc -std=c99 -pedantic gives a warning for the non-VLA case.
As per the standard, it is not allowed.
However it's been current practice in C compilers to treat those declarations as a flexible array member (FAM) declaration:
C99 6.7.2.1, §16: As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member.
The standard syntax of a FAM is:
struct Array {
size_t size;
int content[];
};
The idea is that you would then allocate it so:
void foo(size_t x) {
struct Array *array = malloc(sizeof(struct Array) + x * sizeof(int));
array->size = x;
for (size_t i = 0; i != x; ++i) {
array->content[i] = 0;
}
}
You might also use it statically (gcc extension):
struct Array a = { 3, { 1, 2, 3 } };
This is also known as tail-padded structures (this term predates the publication of the C99 Standard) or struct hack (thanks to Joe Wreschnig for pointing it out).
However, this syntax was standardized (and the effects guaranteed) only in C99. Before that, a constant size was necessary.
1 was the portable way to go, though it was rather strange.
0 was better at indicating intent, but not legal as far as the Standard was concerned and supported as an extension by some compilers (including gcc).
The tail padding practice, however, relies on the fact that storage is available (careful malloc) so is not suited to stack usage in general.
In Standard C and C++, a zero-size array is not allowed.
If you're using GCC, compile with the -pedantic option. It will give a warning saying:
zero.c:3:6: warning: ISO C forbids zero-size array 'a' [-pedantic]
In the case of C++, it gives a similar warning.
It's totally illegal, and always has been, but a lot of compilers
neglect to signal the error. I'm not sure why you want to do this.
The one use I know of is to trigger a compile time error from a boolean:
char someCondition[ condition ];
If condition is false, then I get a compile time error. Because
compilers do allow this, however, I've taken to using:
char someCondition[ 2 * condition - 1 ];
This gives a size of either 1 or -1, and I've never found a compiler
which would accept a size of -1.
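As a sketch of how the negative-size trick can guard a configurable SIZE, the snippet below wraps it in a macro and also shows the C11 static_assert equivalent. The names COMPILE_TIME_CHECK, size_must_be_positive and array_count are made up for illustration, and SIZE is an assumed configuration value.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

#define SIZE 4        /* assumed configuration value */

/* Pre-C11 trick: the array size is 1 when cond is true and -1 when it
 * is false; no compiler accepts a negative array size. */
#define COMPILE_TIME_CHECK(name, cond) typedef char name[2 * !!(cond) - 1]

COMPILE_TIME_CHECK(size_must_be_positive, SIZE > 0);

/* C11 equivalent, with a readable diagnostic. */
static_assert(SIZE > 0, "SIZE must be greater than zero");

static int array[SIZE];

/* Number of elements in the statically sized array. */
static size_t array_count(void)
{
    return sizeof array / sizeof array[0];
}
```

With either check in place, building with SIZE defined as 0 fails at compile time instead of silently relying on a compiler extension.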
Another use of zero-length arrays is for making variable-length objects (pre-C99). Zero-length arrays are different from flexible arrays, which are written with [] and no 0.
Quoted from gcc doc:
Zero-length arrays are allowed in GNU C. They are very useful as the last element of a structure that is really a header for a variable-length object:
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied.
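A small sketch of what the incomplete type means in practice: sizeof works on the whole struct and counts only the header, while offsetof gives the position where the contents start. The two helper names are hypothetical.

```c
#include <stddef.h>   /* offsetof, size_t */

struct line {
    int length;
    char contents[];  /* flexible array member: incomplete type */
};

/* Size of the fixed part only; the flexible member contributes no
 * elements (at most some trailing padding). */
static size_t line_header_size(void)
{
    return sizeof(struct line);
}

/* Byte offset at which the variable-length contents begin. */
static size_t line_contents_offset(void)
{
    return offsetof(struct line, contents);
}

/* Note: sizeof(((struct line *)0)->contents) would not compile,
 * because the member has incomplete type. */
```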
A real-world example is zero-length arrays of struct kdbus_item in kdbus.h (a Linux kernel module).
I'll add that there is a whole page of the online gcc documentation on this topic.
Some quotes:
Zero-length arrays are allowed in GNU C.
In ISO C90, you would have to give contents a length of 1
and
GCC versions before 3.0 allowed zero-length arrays to be statically initialized, as if they were flexible arrays. In addition to those cases that were useful, it also allowed initializations in situations that would corrupt later data
so you could
int arr[0] = { 1 };
and boom :-)
Zero-size array declarations within structs would be useful if they were allowed, and if the semantics were such that (1) they would force alignment but otherwise not allocate any space, and (2) indexing the array would be considered defined behavior in the case where the resulting pointer would be within the same block of memory as the struct. Such behavior was never permitted by any C standard, but some older compilers allowed it before it became standard for compilers to allow incomplete array declarations with empty brackets.
The struct hack, as commonly implemented using an array of size 1, is dodgy and I don't think there's any requirement that compilers refrain from breaking it. For example, I would expect that if a compiler sees int a[1], it would be within its rights to regard a[i] as a[0]. If someone tries to work around the alignment issues of the struct hack via something like
typedef struct {
uint32_t size;
uint8_t data[4]; // Use four, to avoid having padding throw off the size of the struct
}
a compiler might get clever and assume the array size really is four:
; As written
foo = myStruct->data[i];
; As interpreted (assuming little-endian hardware)
foo = ((*(uint32_t*)myStruct->data) >> (i << 3)) & 0xFF;
Such an optimization might be reasonable, especially if myStruct->data could be loaded into a register in the same operation as myStruct->size. I know nothing in the standard that would forbid such optimization, though of course it would break any code which might expect to access stuff beyond the fourth element.
You definitely can't have zero-sized arrays according to the standard, but nearly every popular compiler lets you do it. So I will try to explain why that can be bad:
#include <cstdio>
int main() {
struct A {
A() {
printf("A()\n");
}
~A() {
printf("~A()\n");
}
int empty[0];
};
A vals[3];
}
As a human, I would expect this output:
A()
A()
A()
~A()
~A()
~A()
Clang prints this:
A()
~A()
GCC prints this:
A()
A()
A()
This is totally strange, so it is a good reason not to use empty arrays in C++ if you can avoid them.
There is also an extension in GNU C that lets you create zero-length arrays in C, but as I understand it, there should be at least one other member earlier in the structure, or you will get very strange results like the ones above if you use C++.
I thought I knew C pretty well, but I'm confused by the following code:
typedef struct {
int type;
} cmd_t;
typedef struct {
int size;
char data[];
} pkt_t;
int func(pkt_t *b)
{
int *typep;
char *ptr;
/* #1: Generates warning */
typep = &((cmd_t*)(&(b->data[0])))->type;
/* #2: Doesn't generate warning */
ptr = &b->data[0];
typep = &((cmd_t*)ptr)->type;
return *typep;
}
When I compile with GCC, I get the "dereferencing type-punned pointer will break strict-aliasing rules" warning.
Why am I getting this warning at all? I'm dereferencing a char array. Casting a char * to anything is legal. Is this one of those cases where an array is not exactly the same as a pointer?
Why aren't both assignments generating the warning? The 2nd assignment is the equivalent of the first, isn't it?
When strict aliasing is turned on, the compiler is allowed to assume that two pointers of different types (char* vs cmd_t* in this instance) will not point to the same memory location. This allows for a greater range of optimizations which you would otherwise not want to be applied if they do indeed point to the same memory location. Various examples/horror-stories can be found in this question.
This is why, under strict aliasing, you have to be careful how you do type punning. I believe that the standard doesn't allow for any type punning whatsoever (don't quote me on that) but most compilers have an exemption for unions (my google-fu is failing in turning up the relevant manual pages):
union float_to_int {
double d;
uint64_t i;
};
union float_to_int ftoi;
ftoi.d = 1.0;
... = ftoi.i;
Unfortunately, this doesn't quite work for your situation, as you would have to memcpy the content of the array into the union, which is less than ideal. A simpler approach would be to turn off strict aliasing via the -fno-strict-aliasing switch. This will ensure that your code is correct and it's unlikely to have a significant performance impact (do measure if performance matters).
As for why the warning doesn't show up when the line is broken up, I don't know. Chances are that the modifications to the source code manage to confuse the compiler's static analysis pass enough that it doesn't see the type punning. Note that the static analysis pass responsible for detecting type punning is unrelated to, and doesn't talk to, the various optimization passes that assume strict aliasing. You can think of any static analysis done by compilers (unless otherwise specified) as a best-effort type of thing. In other words, the absence of a warning doesn't mean that there are no errors, which means that simply breaking up the line doesn't magically make your type punning safe.
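For completeness, a commonly used alternative that stays within the aliasing rules is to copy the bytes out with memcpy rather than casting the char array to cmd_t *. The sketch below reuses the question's types; pkt_with_cmd is a hypothetical helper that exists only to build a packet for demonstration.

```c
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

typedef struct {
    int type;
} cmd_t;

typedef struct {
    int size;
    char data[];
} pkt_t;

/* Read the command type by copying bytes into a real cmd_t object,
 * so no cmd_t* ever points into the char array. */
static int func(const pkt_t *b)
{
    cmd_t cmd;

    memcpy(&cmd, b->data, sizeof cmd);
    return cmd.type;
}

/* Hypothetical helper: build a packet whose payload is one cmd_t. */
static pkt_t *pkt_with_cmd(int type)
{
    cmd_t cmd = { type };
    pkt_t *b = malloc(sizeof *b + sizeof cmd);

    if (b != NULL) {
        b->size = (int)sizeof cmd;
        memcpy(b->data, &cmd, sizeof cmd);
    }
    return b;
}
```

Optimizing compilers usually turn a small fixed-size memcpy like this into a plain load, but measure if it matters.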
I wrote a dynamic array like this:
#include <stdlib.h>
typedef struct {
size_t capacity;
size_t len;
} __dynarray_header;
void* dynarray_new() {
__dynarray_header* header = malloc(sizeof(__dynarray_header));
header->capacity = 0;
header->len = 0;
return header + 1;
}
The dynamic array can be accessed with a [] operation. When resizing, I can use ((__dynarray_header*)array - 1) to retrieve capacity and length information.
The idea works in small tests. However, GCC warns about breaking strict-aliasing.
I've also found some larger projects segfault without the -fno-strict-aliasing compiler option (with -O3 optimization).
I know what strict-aliasing is, and why my code breaks strict-aliasing.
My question is: Is there a better way to implement a dynamic array supporting both the [] operation and dynamic resizing than the one I showed above?
Extra:
A demo program using this dynamic array:
int* arr = dynarray_new();
arr = dynarray_resize(sizeof(int) * 2);
arr[0] = 1;
arr[1] = 2;
arr = dynarray_resize(sizeof(int) * 4);
arr[2] = 3;
arr[3] = 4;
dynarray_free(arr);
The technique that the C standard foresees for such a thing is the flexible array member, as was already mentioned:
typedef struct {
size_t capacity;
size_t len;
unsigned char data[];
} dynarray_header;
If you allocate (or re-allocate) such a struct with enough space you may access the data element like any unsigned char array. char types may alias any other data type, so you wouldn't have problems with that.
If your compiler doesn't support flexible arrays, just put a [1] in there for data.
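BTW, identifiers that begin with two underscores (or an underscore followed by an uppercase letter) are reserved for the implementation, and all identifiers beginning with an underscore are reserved at file scope, so you are not supposed to use names like __dynarray_header.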
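Here is a rough sketch of a growable array built on a flexible array member. To keep it short it is a typed variant (int elements) rather than the generic unsigned char version, and the names int_array, int_array_new and int_array_push are invented for this example. Elements are reached as a->data[i] instead of arr[i].

```c
#include <stdlib.h>   /* malloc, realloc, free */

typedef struct {
    size_t capacity;  /* number of elements allocated */
    size_t len;       /* number of elements in use */
    int data[];       /* flexible array member */
} int_array;

/* Create an empty array with no element storage yet. */
static int_array *int_array_new(void)
{
    int_array *a = malloc(sizeof *a);

    if (a != NULL) {
        a->capacity = 0;
        a->len = 0;
    }
    return a;
}

/* Append value, doubling the capacity when full. Returns the (possibly
 * moved) array, or NULL on allocation failure, in which case the
 * original array is left untouched. */
static int_array *int_array_push(int_array *a, int value)
{
    if (a->len == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        int_array *grown = realloc(a, sizeof *a + new_cap * sizeof a->data[0]);

        if (grown == NULL)
            return NULL;
        a = grown;
        a->capacity = new_cap;
    }
    a->data[a->len++] = value;
    return a;
}
```

Because header and elements are one object of one declared type, no pointer arithmetic on the header and no cast is needed, which is what triggered the aliasing warning in the first place.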
The main optimization afforded by -fstrict-aliasing is that references to foo * can be arbitrarily moved past references to bar * in most circumstances. The segfaults you see are likely due to a reference getting moved past a free type operation somewhere.
While this feels a little dirty, you may be able to make it work under C89 by adding a union of prospective array element types into your structure, such as:
typedef struct {
size_t capacity;
size_t len;
union {
int i;
double d;
my_type mt;
etc e;
/* add additional types here. */
} array_head;
} __dynarray_header;
Then, instead of returning header + 1, return (void *)&(header->array_head).
Now, even with strict aliasing, the compiler is more likely to consider a pointer to __dynarray_header to possibly alias a pointer to anything in that union, unless the pointers are also restrict-qualified. (I'm assuming for your use case, they are not, at least in the contexts that trigger seg-faults.)
Still... as Dennis Ritchie said, it seems like "unwarranted chumminess with the implementation." Or, in other words, a hack. Good luck!
(Edit: As Carl above reminded me, in C99 you can use flexible array members. I haven't used them, simply because C99 support doesn't seem to be the default in the C compilers I use. Here's IBM's reference: http://pic.dhe.ibm.com/infocenter/iseries/v7r1m0/index.jsp?topic=%2Frzarg%2Fflexible.htm )
I have two structures whose values are used to compute a weighted average, like this simplified version:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;
And then I use them to calculate:
average = v_move*qtt_move + v_read*qtt_read + v_suck*qtt_suck + v_flush*qtd_flush + v_nop*qtd_nop + v_call*qtt_call;
Every now and then I need to include another variable. Now, for instance, I need to include v_clean and qtt_clean. I can't change the structures to arrays:
typedef struct
{
int v[6];
} values;
typedef struct
{
int qtt[6];
} quantities;
That would simplify my work a lot, but they are part of an API that needs the variable names to be clear.
So, I'm looking for a way to access the members of those structures, maybe using sizeof(), so I can treat them as an array but still keep the API unchanged. It is guaranteed that all values are int, but I can't guarantee the size of an int.
While writing the question, it occurred to me: can a union do the job? Is there another clever way to automate the task of adding another member?
Thanks,
Beco
What you are trying to do is not possible to do in any elegant way. It is not possible to reliably access consecutive struct members as an array. The currently accepted answer is a hack, not a solution.
The proper solution would be to switch to an array, regardless of how much work it is going to require. If you use enum constants for array indexing (as #digEmAll suggested in his now-deleted answer), the names and the code will be as clear as what you have now.
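A sketch of what the enum-indexed version could look like. The enum constant names and the weighted_sum function are made up for illustration; only the field meanings come from the question.

```c
/* One enum constant per former struct member; F_COUNT is the number
 * of fields and grows automatically when a new constant is added. */
enum field { F_MOVE, F_READ, F_SUCK, F_FLUSH, F_NOP, F_CALL, F_COUNT };

typedef struct {
    int v[F_COUNT];
} values;

typedef struct {
    int qtt[F_COUNT];
} quantities;

/* Sum of value * quantity over all fields. */
static int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;

    for (int i = 0; i < F_COUNT; i++)
        sum += v->v[i] * q->qtt[i];
    return sum;
}
```

Adding a field such as F_CLEAN then means inserting one enum constant before F_COUNT, and callers keep readable names like v.v[F_READ].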
If you still don't want to or can't switch to an array, the only more-or-less acceptable way to do what you are trying to do is to create an "index-array" or "map-array" (see below). C++ has a dedicated language feature that helps one to implement it elegantly: pointers-to-members. In C you are forced to emulate that C++ feature using the offsetof macro:
static const size_t values_offsets[] = {
offsetof(values, v_move),
offsetof(values, v_read),
offsetof(values, v_suck),
/* and so on */
};
static const size_t quantities_offsets[] = {
offsetof(quantities, qtt_move),
offsetof(quantities, qtt_read),
offsetof(quantities, qtt_suck),
/* and so on */
};
And if now you are given
values v;
quantities q;
and index
int i;
you can generate the pointers to individual fields as
int *pvalue = (int *) ((char *) &v + values_offsets[i]);
int *pquantity = (int *) ((char *) &q + quantities_offsets[i]);
*pvalue += *pquantity;
Of course, you can now iterate over i in any way you want. This is also far from being elegant, but at least it bears some degree of reliability and validity, as opposed to any ugly hack. The whole thing can be made to look more elegant by wrapping the repetitive pieces into appropriately named functions/macros.
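Put together, the offset tables can drive the whole calculation. The following is a sketch using the question's struct definitions; FIELD_COUNT and weighted_sum are names chosen here for illustration.

```c
#include <stddef.h>   /* offsetof, size_t */

typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

typedef struct {
    int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} quantities;

static const size_t values_offsets[] = {
    offsetof(values, v_move),  offsetof(values, v_read),
    offsetof(values, v_suck),  offsetof(values, v_flush),
    offsetof(values, v_nop),   offsetof(values, v_call),
};

static const size_t quantities_offsets[] = {
    offsetof(quantities, qtt_move),  offsetof(quantities, qtt_read),
    offsetof(quantities, qtt_suck),  offsetof(quantities, qtd_flush),
    offsetof(quantities, qtd_nop),   offsetof(quantities, qtt_call),
};

#define FIELD_COUNT (sizeof values_offsets / sizeof values_offsets[0])

/* Sum of value * quantity, reaching each member through its offset. */
static int weighted_sum(const values *v, const quantities *q)
{
    int sum = 0;

    for (size_t i = 0; i < FIELD_COUNT; i++) {
        const int *pv = (const int *)((const char *)v + values_offsets[i]);
        const int *pq = (const int *)((const char *)q + quantities_offsets[i]);

        sum += *pv * *pq;
    }
    return sum;
}
```

Adding v_clean and qtt_clean then only requires one new entry in each table, which must be kept in the same order.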
If all members are guaranteed to be of type int you can use a pointer to int and increment it:
values v; /* filled in elsewhere */
quantities q; /* filled in elsewhere */
int *value = &v.v_move;
int *quantity = &q.qtt_move;
int i;
average = 0;
// although it should work, a good practice many times IMHO is to add a null as the last member in the struct and change the condition to quantity[i] != null.
for (i = 0; i < (int)(sizeof(quantities) / sizeof(*quantity)); i++)
average += value[i] * quantity[i];
(Since the order of members in a struct is guaranteed to be as declared)
While writing the question, it occurred to me: can a union do the job? Is there another clever way to automate the task of adding another member?
Yes, a union can certainly do the job:
union
{
values v; /* As defined by OP */
int array[6];
} u;
You can use a pointer to u.v in your API, and work with u.array in your code.
Personally, I think that all the other answers break the rule of least surprise. When I see a plain struct definition, I assume that the structure will be accessed using normal access methods. With a union, it's clear that the application will access it in special ways, which prompts me to pay extra attention to the code.
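One way to make the union approach a little safer is a compile-time check that the struct really has the same size as the array, so that adding a member without updating the count (or unexpected padding) breaks the build. This is a sketch: VALUES_COUNT, values_view and sum_values are hypothetical names, and it relies on C (unlike C++) permitting a read of a union member other than the one last written.

```c
#include <assert.h>   /* static_assert (C11) */

typedef struct {
    int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} values;

#define VALUES_COUNT 6

/* Fails to compile if values gains a member or contains padding. */
static_assert(sizeof(values) == VALUES_COUNT * sizeof(int),
              "values must be exactly VALUES_COUNT ints");

typedef union {
    values v;                 /* named access for the API */
    int array[VALUES_COUNT];  /* indexed access for internal loops */
} values_view;

/* Sum all fields through the array view. */
static int sum_values(const values_view *u)
{
    int sum = 0;

    for (int i = 0; i < VALUES_COUNT; i++)
        sum += u->array[i];
    return sum;
}
```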
It really sounds as if this should have been an array from the beginning, with accessor methods or macros enabling you to still use pretty names like move, read, etc. However, as you mentioned, this isn't feasible due to API breakage.
The two solutions that come to my mind are:
Use a compiler specific directive to ensure that your struct is packed (and thus, that casting it to an array is safe)
Evil macro black magic.
How about using __attribute__((packed)) if you are using gcc?
So you could declare your structures as:
typedef struct
{
int v_move, v_read, v_suck, v_flush, v_nop, v_call;
} __attribute__((packed)) values;
typedef struct
{
int qtt_move, qtt_read, qtt_suck, qtd_flush, qtd_nop, qtt_call;
} __attribute__((packed)) quantities;
According to the gcc manual, your structures will then use the minimum amount of memory possible for storing the structure, omitting any padding that might have normally been there. The only issue would then be to determine the sizeof(int) on your platform which could be done through either some compiler macros or using <stdint.h>.
One more thing is that there will be a performance penalty for unpacking and re-packing the structure when it needs to be accessed and then stored back into memory. But at least you can be assured then that the layout is consistent, and it could be accessed like an array using a cast to a pointer type like you were wanting (i.e., you won't have to worry about padding messing up the pointer offsets).
Thanks,
Jason
This problem is common and has been solved in many ways in the past. None of them is completely safe or clean. It depends on your particular application. Here's a list of possible solutions:
1) You can redefine your structures so fields become array elements, and use macros to map each particular element as if it was a structure field. E.g:
struct values { int varray[6]; };
#define v_read varray[1]
The disadvantage of this approach is that most debuggers don't understand macros. Another problem is that in theory a compiler could choose a different alignment for the original structure and the redefined one, so binary compatibility is not guaranteed.
2) Count on the compiler's behaviour and treat all the fields as if they were array fields (oops, while I was writing this, someone else wrote the same - +1 for him)
3) create a static array of element offsets (initialized at startup) and use them to "map" the elements. It's quite tricky, and not so fast, but has the advantage that it's independent of the actual disposition of the field in the structure. Example (incomplete, just for clarification):
int positions[10];
positions[0] = ((char *)(&((values*)NULL)->v_move)-(char *)NULL);
positions[1] = ((char *)(&((values*)NULL)->v_read)-(char *)NULL);
//...
values *v = ...;
int vread;
vread = *(int *)(((char *)v)+positions[1]);
Ok, not at all simple. Macros like "offsetof" may help in this case.
The C99 standard allows the creation of flexible array members such as
typedef struct pstring {
size_t length;
char string[];
} pstring;
This is then initialized with something like pstring* s = malloc(sizeof(pstring) + len). Is it permissible for len to be zero? It would seem consistent, and would be nice for saving space from time to time (probably not with the pstring example, of course). On the other hand, I have no idea what the following code would do:
pstring* s = malloc(sizeof(pstring));
s->string;
This also seems like the sort of thing which might work with one compiler and not another, or on one OS and not another, or on one day and not another, so what I really want to know is what the standard says about this. Is that malloc in the example code undefined behavior, or is it only the access to s->string which is invalid, or is it something else entirely?
What you do is valid, but accessing s->string[0] or feeding s->string to any function accessing the data is invalid.
The C99 Standard actually says (§6.7.2.1):
struct s { int n; double d[]; };
...
struct s t1 = { 0 }; // valid
...
The assignment to t1.d[0] is probably undefined behavior, but it is possible that
sizeof (struct s) >= offsetof(struct s, d) + sizeof (double)
in which case the assignment would be legitimate. Nevertheless, it cannot appear in strictly conforming
code.
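In code, the zero-length case can be handled by always checking the stored length before touching the array. The sketch below assumes two hypothetical helpers, pstring_new and pstring_at, which are not part of the question.

```c
#include <stdlib.h>   /* malloc, free */
#include <string.h>   /* memcpy */

typedef struct pstring {
    size_t length;
    char string[];
} pstring;

/* Hypothetical constructor: len may be zero, in which case only the
 * header is allocated and src is never read. */
static pstring *pstring_new(const char *src, size_t len)
{
    pstring *s = malloc(sizeof *s + len);

    if (s == NULL)
        return NULL;
    s->length = len;
    if (len > 0)
        memcpy(s->string, src, len);
    return s;
}

/* Return byte i as an unsigned value, or -1 when i is out of range,
 * so string[] is never accessed for an empty pstring. */
static int pstring_at(const pstring *s, size_t i)
{
    return i < s->length ? (unsigned char)s->string[i] : -1;
}
```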
At one time, Microsoft C allowed the following:
struct pstring {
size_t length;
char string[0];
};
And I used something like this to build a text editor. However, this was a long time ago and I know it was non-standard at the time. I'm not even 100% certain Microsoft still supports it.
The bottom line is that it depends on your compiler, and the current compile settings.