struct xyz a[0]; What does this mean? [duplicate] - c

I am working on refactoring some old code and have found few structs containing zero length arrays (below). Warnings depressed by pragma, of course, but I've failed to create by "new" structures containing such structures (error 2233). Array 'byData' used as pointer, but why not to use pointer instead? or array of length 1? And of course, no comments were added to make me enjoy the process...
Any causes to use such thing? Any advice in refactoring those?
struct someData
{
int nData;
BYTE byData[0];
}
NB It's C++, Windows XP, VS 2003

Yes this is a C-Hack.
To create an array of any length:
struct someData* mallocSomeData(int size)
{
struct someData* result = (struct someData*)malloc(sizeof(struct someData) + size * sizeof(BYTE));
if (result)
{ result->nData = size;
}
return result;
}
Now you have an object of someData with an array of a specified length.

There are, unfortunately, several reasons why you would declare a zero length array at the end of a structure. It essentially gives you the ability to have a variable length structure returned from an API.
Raymond Chen did an excellent blog post on the subject. I suggest you take a look at this post because it likely contains the answer you want.
Note in his post, it deals with arrays of size 1 instead of 0. This is the case because zero length arrays are a more recent entry into the standards. His post should still apply to your problem.
http://blogs.msdn.com/oldnewthing/archive/2004/08/26/220873.aspx
EDIT
Note: Even though Raymond's post says 0 length arrays are legal in C99 they are in fact still not legal in C99. Instead of a 0 length array here you should be using a length 1 array

This is an old C hack to allow a flexible sized arrays.
In C99 standard this is not neccessary as it supports the arr[] syntax.

Your intution about "why not use an array of size 1" is spot on.
The code is doing the "C struct hack" wrong, because declarations of zero length arrays are a constraint violation. This means that a compiler can reject your hack right off the bat at compile time with a diagnostic message that stops the translation.
If we want to perpetrate a hack, we must sneak it past the compiler.
The right way to do the "C struct hack" (which is compatible with C dialects going back to 1989 ANSI C, and probably much earlier) is to use a perfectly valid array of size 1:
struct someData
{
int nData;
unsigned char byData[1];
}
Moreover, instead of sizeof struct someData, the size of the part before byData is calculated using:
offsetof(struct someData, byData);
To allocate a struct someData with space for 42 bytes in byData, we would then use:
struct someData *psd = (struct someData *) malloc(offsetof(struct someData, byData) + 42);
Note that this offsetof calculation is in fact the correct calculation even in the case of the array size being zero. You see, sizeof the whole structure can include padding. For instance, if we have something like this:
struct hack {
unsigned long ul;
char c;
char foo[0]; /* assuming our compiler accepts this nonsense */
};
The size of struct hack is quite possibly padded for alignment because of the ul member. If unsigned long is four bytes wide, then quite possibly sizeof (struct hack) is 8, whereas offsetof(struct hack, foo) is almost certainly 5. The offsetof method is the way to get the accurate size of the preceding part of the struct just before the array.
So that would be the way to refactor the code: make it conform to the classic, highly portable struct hack.
Why not use a pointer? Because a pointer occupies extra space and has to be initialized.
There are other good reasons not to use a pointer, namely that a pointer requires an address space in order to be meaningful. The struct hack is externalizeable: that is to say, there are situations in which such a layout conforms to external storage such as areas of files, packets or shared memory, in which you do not want pointers because they are not meaningful.
Several years ago, I used the struct hack in a shared memory message passing interface between kernel and user space. I didn't want pointers there, because they would have been meaningful only to the original address space of the process generating a message. The kernel part of the software had a view to the memory using its own mapping at a different address, and so everything was based on offset calculations.

It's worth pointing out IMO the best way to do the size calculation, which is used in the Raymond Chen article linked above.
struct foo
{
size_t count;
int data[1];
}
size_t foo_size_from_count(size_t count)
{
return offsetof(foo, data[count]);
}
The offset of the first entry off the end of desired allocation, is also the size of the desired allocation. IMO it's an extremely elegant way of doing the size calculation. It does not matter what the element type of the variable size array is. The offsetof (or FIELD_OFFSET or UFIELD_OFFSET in Windows) is always written the same way. No sizeof() expressions to accidentally mess up.

Related

Why is the offsetof macro necessary?

I am new to the C language and just learned about structs and pointers.
My question is related to the offsetof macro I recently saw. I know how it works and the logic behind that.
In the <stddef.h> file the definition is as follows:
#define offsetof(type,member) ((unsigned long) &(((type*)0)->member))
My question is, if I have a struct as shown below:
struct test {
int field1:
int field2:
};
struct test var;
Why cannot I directly get the address of field2 as:
char * p = (char *)&var;
char *addressofField2 = p + sizeof(int);
Rather than writing something like this
field2Offset = offsetof (struct test, field2);
and then adding offset value to var's starting address?
Is there any difference? Is using offsetof more efficient?
The C compiler will often add extra padding bits or bytes between members of a struct in order to improve efficiency and keep integers word-aligned (which in some architectures is required to avoid bus errors and in some architectures is required to avoid efficiency problems). For example, in many compilers, if you have this struct:
struct ImLikelyPadded {
int x;
char y;
int z;
};
you might find that sizeof(struct ImLikelyPadded) is 12, not 9, because the compiler will insert three extra padding bytes between the end of the one-byte char y and the word-sized int z. This is why offsetof is so useful - it lets you determine where things really are even factoring in padding bytes and is highly portable.
Unlike arrays, memory layout of struct is not always contiguous. Compiler may add extra bytes, in order to align the memory. This is called padding.
Because of padding, it us difficult to find location of member manually. This is also why we always use sizeof to find struct size.
Offsetof , macro let you find out the distance,offset, of a member of struct from the strating position of the struct.
One intelligent use if offsetof is seen in Linux kernel's container_of macro. This macro let you find out starting position of node given the address of member in a generic inclusive doubly linked list
As already mentioned in the other answers, padding is one of the reasons. I won't repeat what was already said about it.
Another good reason to use the offsetof macro and not manually compute offsets is that you only have to write it once. Imagine what happens if you need to change the type of field1 or insert or remove one or more fields in front of field2. Using your hand-crafted calculation you have to find and change all its occurrences. Missing one of them will produce mysterious bugs that are difficult to find.
The code written using offsetof doesn't need any update in this situation. The compiler takes care of everything on the next compilation.
Even more, the code that uses offsetof is more clear. The macro is standard, its functionality is documented. A fellow programmer that reads the code understands it immediately. It's not that easy to understand what the hand-crafted code attempts.

Why does GCC allow zero length array only as last member?

According to this,
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
It is said that the benefit is
They are very useful as the last element of a structure that is really
a header for a variable-length object
What does it mean?
The zero-length array is a GCC extension (read as: not standard) which you should not use.
While recent versions of C allow for someting similar (flexible array member with empty brackets), C++ knows no such thing. As people often mix C and C++, this is a possible source of confusion.
Instead, an array of length 1 should be used, which is standards-compliant under both C and C++, and which just works with every compiler.
What is this useful for at all?
Sometimes you need to access "invalid" out-of-bounds data knowing that it is valid in reality. In the strictest sense, this is undefined behavior (since you are accessing out-of-bounds values which are indeterminate, and using indeterminate values is UB), but that is only for what the compiler knows, not for what it fact, so it nevertheless "works fine".
For example, you might receive framed data on the network consisting of a tag word, a length, and an amount of data corresponding to the length given. Or an operating system function might return a variable amount of results to you (a couple of Win32 API functions work that way, for example).
In either case, you have a unknown (unknown at compile time) number of elements at the end of this structure, so it is not possible to define a single legitimate structure to hold everything.
That is what flexible array members are for. And with this, it is explained why they must be the last member as well. It doesn't make sense for something that could have "any size" to be anywhere but at the end -- it's impossible for the compiler to lay out any members after it, not knowing its size.
(In case you wonder how the compiler can ever free the storage not knowing the objects's size... it cannot! There normally exists an explicit function for freeing such an object as part of the API, which takes care of this exact problem.)
It's probably best to demonstrate with a small example:
#include <stdio.h>
#include <stdlib.h>
#define BLOB_TYPE_FOO 0xBEEF
struct blob {
/* Part of your object header... perhaps describing the type of blob. */
int type;
/* This is actually the length of the "data" field below */
unsigned length;
/* The data */
unsigned char data[];
};
struct blob *
create_blob(int type, size_t size)
{
/* Allocate enough space for the "header" and "size" bytes of data. */
struct blob *x = calloc(1, sizeof(struct blob) + size);
x->type = type;
x->length = size;
return x;
}
int
main(void)
{
/* Note that sizeof(struct blob) doesn't include the data field. */
printf("sizeof(struct blob): %zu\n", sizeof(struct blob));
struct blob *x = create_blob(BLOB_TYPE_FOO, 1000);
/*
You can manipulate data here, but be careful not to exceed the
allocated size.
*/
size_t i;
for (i = 0; i < 1000; i++)
{
x->data[i] = 'A' + (i % 26);
}
/*
Since data was allocated with the rest of the header, everything is
freed.
*/
free(x);
return 0;
}
The nice part about this setup is that sizeof(struct blob) represents the size of the "object header" (on my machine, that's 8 bytes), and that since you allocate the whole object together, a single free() is all that is needed to release the memory.
Like others have stated here, this is a non-standard extension and you should really consider using it with care. Damon's answer is the better way to go, though the sizeof() operation is not quite the right size (it's a bit too large to represent the size of the actual header). It's not too hard to workaround that problem though.
You cannnot have the array of 0 length because if you try to make a zero length array then it would mean that you are trying to create a pointer to nothing which is not correct. The C standard says:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.

What are the real benefits of flexible array member?

After reading some posts related to flexible array member, I am still not fully understand why we need such a feature.
Possible Duplicate:
Flexible array members in C - bad?
Is this a Flexible Array Struct Members in C as well?
(Blame me if I didn't solve my problem from the possible duplicate questions above)
What is the real difference between the following two implementations:
struct h1 {
size_t len;
unsigned char *data;
};
struct h2 {
size_t len;
unsigned char data[];
};
I know the size of h2 is as if the flexible array member (data) were omitted, that is, sizeof(h2) == sizeof(size_t). And I also know that the flexible array member can only appear as the last element of a structure, so the original implementation can be more flexible in the position of data.
My real problem is that why C99 add this feature? Simply because sizeof(h2) doesn't contain the real size of data? I am sure that I must miss some more important points for this feature. Please point it out for me.
The two structs in your post don't have the same structure at all. h1 has a integer and a pointer to char. h2 has an integer, and an array of characters inline (number of elements determined at runtime, possibly none).
Said differently, in h2 the character data is inside the struct. In h1 it has to be somewhere outside.
This makes a lot of difference. For instance, if you use h1 you need to take care of allocating/freeing the payload (in addition to the struct itself). With h2, only one allocation/free is necessary, everything is packaged together.
One case where using h2 might make sense is if you're communicating with something that expects messages in the form of {length,data} pairs. You allocate an instance of h2 by requesting sizeof(h2)+how many payload chars you want, fill it up, and then you can transfer the whole thing in a single write (taking care about endianess and such of course). If you had used h1, you'd need two write calls (unless you want to send the memory address of the data, which usually doesn't make any sense).
So this feature exists because it's handy. And various (sometimes non-portable) tricks where used before that to simulate this feature. Adding it to the standard makes sense.
The main reason the Committee introduced flexible array members is to implement the famous struct hack. See the below quote from the C99 Rationale, especially the part I add the emphasis.
Rationale for International Standard — Programming Languages — C §6.7.2.1 Structure and union specifiers
There is a common idiom known as the “struct hack” for creating a structure containing a variable-size array:
struct s
{
int n_items;
/* possibly other fields */
int items[1];
};
struct s *p;
size_t n, i;
/* code that sets n omitted */
p = malloc(sizeof(struct s) + (n - 1) * sizeof(int));
/* code to check for failure omitted */
p->n_items = n;
/* example usage */
for (i = 0; i < p->n_items; i++)
p->items[i] = i;
The validity of this construct has always been questionable. In the response to one Defect
Report, the Committee decided that it was undefined behavior because the array p->items
contains only one item, irrespective of whether the space exists. An alternative construct was suggested: make the array size larger than the largest possible case (for example, using int items[INT_MAX];), but this approach is also undefined for other reasons.
The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced. Apart from the empty brackets, and the removal of the “-1” in the malloc call, this is used in the same way as the struct hack, but is now explicitly valid code.
There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last. Similarly, structures containing flexible arrays can't occur in other structures or in arrays. Finally, sizeof applied to the structure ignores the array but counts any padding before it. This makes the malloc call as simple as possible.
I don't know if this is considered as an important point, but GCC docs points this out:
GCC allows static initialization of flexible array members. This is equivalent to defining a new structure containing the original structure followed by an array of sufficient size to contain the data. E.g. in the following, f1 is constructed as if it were declared like f2.
struct f1 {
int x; int y[];
} f1 = { 1, { 2, 3, 4 } };
struct f2 {
struct f1 f1; int data[3];
} f2 = { { 1 }, { 2, 3, 4 } };
(taken from http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html)

Accessing array as a struct *

This is one of those I think this should work, but it's best to check questions. It compiles and works fine on my machine.
Is this guaranteed to do what I expect (i.e. allow me to access the first few elements of the array with a guarantee that the layout, alignment, padding etc of the struct is the same as the array)?
struct thingStruct
{
int a;
int b;
int c;
};
void f()
{
int thingsArray[5];
struct thingStruct *thingsStruct = (struct thingStruct *)&thingsArray[0];
thingsArray[0] = 100;
thingsArray[1] = 200;
thingsArray[2] = 300;
printf("%d", thingsStruct->a);
printf("%d", thingsStruct->b);
printf("%d", thingsStruct->c);
}
EDIT: Why on earth would I want to do something like this? I have an array which I'm mmapping to a file. I'm treating the first part of the array as a 'header', which stores various pieces of information about the array, and the rest of it I'm treating as a normal array. If I point the struct to the start of the array I can access the pieces of header data as struct members, which is more readable. All the members in the struct would be of the same type as the array.
While I have seen this done frequently, you cannot (meaning it is not legal, standard C) make assumptions about the binary layout of a structure, as it may have padding between fields.
This is explained in the comp.lang.c faq: http://c-faq.com/struct/padding.htmls
Although it's likely to work in most places, it's still a bit iffy. If you want to give symbolic names to parts of the header, why not just do:
enum { HEADER_A, HEADER_B, HEADER_C };
/* ... */.
printf("%d", thingsArray[HEADER_A]);
printf("%d", thingsArray[HEADER_B]);
printf("%d", thingsArray[HEADER_C]);
As Evan commented on the question, this will probably work in most cases (again, probably best if you use #pragma pack to ensure their is no padding) assuming all the types in your struct are the same type as your array. Given the rules of C, this is legal.
My question to you is "why?" This isn't a particularly safe thing to do. If a float gets thrown into the middle of the struct, this all falls apart. Why not just use the struct directly? This really ins't a technique that I'd recommend in most cases.
Another solution for representing a header and the rest of file data is using a structure like this:
struct header {
long headerData1;
int headerData2;
int headerData3;
int fileData[ 1 ]; // <- data begin here
};
Then you allocate the memory block with a file contents and cast it as struct header *myFileHeader (or map the memory block on a file) and access all your file data with
myFileHeader->fileData[ position ]
for arbitrary big position. The language imposes no restriction on the index value, so it's only your responsibility to keep your arbitrary big posistion within the actual size of the memory block you allocated (or the mapped file's size).
One more important note: apart from switching off the struct members padding, which has been already described by others, you should carefully choose data types for the header members, so that they fit the actual file data layout despite compiler you use (say, int won't change from 32 to 64 bits...)

Is a struct of pointers guaranteed to be represented without padding bits?

I have a linked list, which stores groups of settings for my application:
typedef struct settings {
struct settings* next;
char* name;
char* title;
char* desc;
char* bkfolder;
char* srclist;
char* arcall;
char* incfold;
} settings_row;
settings_row* first_profile = { 0 };
#define SETTINGS_PER_ROW 7
When I load values into this structure, I don't want to have to name all the elements. I would rather treat it like a named array -- the values are loaded in order from a file and placed incrementally into the struct. Then, when I need to use the values, I access them by name.
//putting values incrementally into the struct
void read_settings_file(settings_row* settings){
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
//accessing components by name
void settings_info(settings_row* settings){
printf("Settings 'profile': %s\n", settings.title);
printf("Description: %s\n", settings.desc);
printf("Folder to backup to: %s\n", settings.bkfolder);
}
But I wonder, since these are all pointers (and there will only ever be pointers in this struct), will the compiler add padding to any of these values? Are they guaranteed to be in this order, and have nothing between the values? Will my approach work sometimes, but fail intermittently?
edit for clarification
I realize that the compiler can pad any values of a struct--but given the nature of the struct (a struct of pointers) I thought this might not be a problem. Since the most efficient way for a 32 bit processor to address data is in 32 bit chunks, this is how the compiler pads values in a struct (ie. an int, short, int in a struct will add 2 bytes of padding after the short, to make it into a 32 bit chunk, and align the next int to the next 32 bit chunk). But since a 32 bit processor uses 32 bit addresses (and a 64 bit processor uses 64 bit addresses (I think)), would padding be totally unnecessary since all of the values of the struct (addresses, which are efficient by their very nature) are in ideal 32 bit chunks?
I am hoping some memory-representation / compiler-behavior guru can come shed some light on whether a compiler would ever have a reason to pad these values
Under POSIX rules, all pointers (both function pointers and data pointers) are all required to be the same size; under just ISO C, all data pointers are convertible to 'void *' and back without loss of information (but function pointers need not be convertible to 'void *' without loss of information, nor vice versa).
Therefore, if written correctly, your code would work. It isn't written quite correctly, though! Consider:
void read_settings_file(settings_row* settings)
{
char* field = settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
Let's assume you're using a 32-bit machine with 8-bit characters; the argument is not all that significantly different if you're using 64-bit machines. The assignment to 'field' is all wrong, because settings + 4 is a pointer to the 5th element (counting from 0) of an array of 'settings_row' structures. What you need to write is:
void read_settings_file(settings_row* settings)
{
char* field = (char *)settings + sizeof(void*);
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW)
;
}
The cast before addition is crucial!
C Standard (ISO/IEC 9899:1999):
6.3.2.3 Pointers
A pointer to void may be converted to or from a pointer to any incomplete or object
type. A pointer to any incomplete or object type may be converted to a pointer to void
and back again; the result shall compare equal to the original pointer.
[...]
A pointer to a function of one type may be converted to a pointer to a function of another
type and back again; the result shall compare equal to the original pointer. If a converted
pointer is used to call a function whose type is not compatible with the pointed-to type,
the behavior is undefined.
In many cases pointers are natural word sizes, so the compiler is unlikely to pad each member, but that doesn't make it a good idea. If you want to treat it like an array you should use an array.
I'm thinking out loud here so there's probably many mistakes but perhaps you could try this approach:
enum
{
kName = 0,
kTitle,
kDesc,
kBkFolder,
kSrcList,
kArcAll,
kIncFold,
kSettingsCount
};
typedef struct settings {
struct settings* next;
char *settingsdata[kSettingsCount];
} settings_row;
Set the data:
settings_row myRow;
myRow.settingsData[kName] = "Bob";
myRow.settingsData[kDescription] = "Hurrrrr";
...
Reading the data:
void read_settings_file(settings_row* settings){
char** field = settings->settingsData;
int i = 0;
while(read_value_into(field[i]) && i++ < SETTINGS_PER_ROW);
}
It's not guaranteed by the C standard. I've a sneaking suspicion, that I don't have time to check right now either way, that it guarantees no padding between the char* fields, i.e. that consecutive fields of the same type in a struct are guaranteed to be layout-compatible with an array of that type. But even if so, you're on your own between the settings* and the first char*, and also between the last char* and the end of the struct. But you could use offsetof to deal with the first issue, and I don't think the second affects your current code.
However, what you want is almost certainly guaranteed by your compiler, which somewhere in its documentation will set out its rules for struct layout, and will almost certainly say that all pointers to data are word sized, and that a struct can be the size of 8 words without additional padding. But if you want to write highly portable code, you have to use only the guarantees in the standard.
The order of fields is guaranteed. I also don't think you'll see intermittent failure - AFAIK the offset of each field in that struct will be consistent for a given implementation (meaning the combination of compiler and platform).
You could assert that sizeof(settings*) == sizeof(char*) and sizeof(settings_row) == sizeof(char*)*8. If both those hold, there is no room for any padding in the struct, since fields are not allowed to "overlap". If you ever hit a platform where they don't hold, you'll find out.
Even so, if you want an array, I'd be inclined to say use an array, with inline accessor functions or macros to get the individual fields. Whether your trick works or not, it's even easier not to think about it at all.
Although not a duplicate, this probably answers your question:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
It's not uncommon for applications to write an entire struct into a file and read it back out again. But this suffers from the possibility that one day the file will need to be read back on another platform, or by another version of the compiler that packs the struct differently. (Although this can be dealt with by specially-written code that understands the original packing format).
Technically, you can rely only on the order; the compiler could insert padding. If different pointers were of different size, or if the pointer size wasn't a natural word size, it might insert padding.
Practically speaking, you could get away with it. I wouldn't recommend it; it's a bad, dirty trick.
You could achieve your goal with another level of indirection (what doesn't that solve?), or by using a temporary array initialized to point to the various members of the structure.
It's not guaranteed, but it will work fine in most cases. It won't be intermittent, it will either work or not work on a particular platform with a particular build. Since you're using all pointers, most compilers won't mess with any padding.
Also, if you wanted to be safer, you could make it a union.
You can't do that the way you are trying. The compiler is allowed to pad any and all members of the struct. I do not believe it is allowed to reorder the fields.
Most compilers have an attribute that can be applied to the struct to pack it (ie to turn it into a collection of tightly packed storage with no padding), but the downside is that this generally affects performance. The packed flag will probably allow you to use the struct the way you want, but it may not be portable across various platforms.
Padding is designed to make field access as efficient as possible on the target architecture. It's best not to fight it unless you have to (ie, the struct goes to a disk or over a network.)
It seems to me that this approach creates more problems than it solves.
When you read this code six months from now, will you still be aware of all the subtleties of how the compiler pads a struct?
Would someone else, who didn't write the code?
If you must use the struct, use it in the canonical way and just write a function which
assigns values to each field separately.
You could also use an array and create macros to give field names to indices.
If you get too "clever" about optimizing your code, you will end up with slower code anyway, since the compiler won't be able to optimize it as well.

Resources