I was reading the C11 standard (ISO/IEC 9899:2011), in the part that explains flexible array members (6.7.2.1 p18). It says this:
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply. However, when a . (or
->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member,
it behaves as if that member were replaced with the longest array
(with the same element type) that would not make the structure larger
than the object being accessed; the offset of the array shall remain
that of the flexible array member, even if this would differ from that
of the replacement array. If this array would have no elements, it
behaves as if it had one element but the behavior is undefined if any
attempt is made to access that element or to generate a pointer one
past it.
And here are some of the examples given below (p20):
EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to
use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to
by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in
particular, the offsets of member d might not be the same).
And now my example (extending the one from the standard):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
struct s { int n; double d[]; };
int m = 7;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m])); //create our object
printf("%zu", sizeof(p->d)); //retrieve the size of the flexible array member
free(p); //free our object
}
Now the compiler complains that p->d has incomplete type double[], which is clearly not the case according to the standard. Is this a bug in the GCC compiler?
As a special case, the last element of a structure with more than one named member may have an incomplete array type; ... C11dr 6.7.2.1 18
In the following d is an incomplete type.
struct s { int n; double d[]; };
The sizeof operator shall not be applied to an expression that has function type or an incomplete type ... C11dr §6.5.3.4 1
// This does not change the type of member `d`.
// It (that is `d`) behaves like a `double d[m]`, but it is still an incomplete type.
struct s *p = foo();
// Constraint violation: sizeof applied to an expression of incomplete type
printf("%zu", sizeof(p->d));
This looks like a defect in the Standard. We can see from the paper where flexible array members were standardized, N791 "Solving the struct hack problem", that the struct definition replacement is intended to apply only in evaluated context (to borrow the C++ terminology); my emphasis:
When an lvalue whose type is a structure
with a flexible array member is used to access an object, it behaves as
if that member were replaced by the longest array that would not make
the structure larger than the object being accessed.
Compare the eventual standard language:
[W]hen a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed [...]
Some form of language like "When a . (or ->) operator whose left operand is (a pointer to) a structure with a flexible array member and whose right operand names that member is evaluated [...]" would seem to work to fix it.
(Note that sizeof does not evaluate its argument, except for variable length arrays, which are another kettle of fish.)
There is no corresponding defect report visible via the JTC1/SC22/WG14 website. You might consider submitting a defect report via your ISO national member body, or asking your vendor to do so.
Standard says:
C11-§6.5.3.4/2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
and it also says
C11-§6.5.3.4/1
The sizeof operator shall not be applied to an expression that has function type or an incomplete type, [...]
p->d has an incomplete type, so it cannot be an operand of the sizeof operator. The statement
it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed
doesn't hold for the sizeof operator, because sizeof determines the size of an object from the type of its operand, which must be a complete type.
First, what is happening is correct in terms of the standard: arrays that are declared with [] are incomplete, and you can't apply the sizeof operator to them.
But there is also a simple reason for it in your case. You never told your compiler that in that particular case the d member should be viewed as having a particular size. You only told malloc the total memory size to reserve, and made p point to that memory. The compiler has obtained no type information that could help it deduce the size of the array.
This is different from allocating a variable length array (VLA) or a pointer to VLA:
double (*q)[m] = malloc(sizeof(double[m]));
Here the compiler can know what type of array q is pointing to. But not because you told malloc the total size (that information is not returned from the malloc call) but because m is part of the type specification of q.
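A minimal sketch of that difference, assuming a compiler that supports VLAs (they are optional in C11). The helper names s_new, s_array_bytes and vla_bytes are made up for illustration. Since the struct type carries no length, the count is stored in n and the array size is computed from it; with the pointer to a VLA, the length is part of the type, so sizeof can report it.

```c
#include <assert.h>
#include <stdlib.h>

struct s { int n; double d[]; };

/* Hypothetical helper: allocate a struct s with room for n doubles and
   record n, since the type itself carries no length. */
static struct s *s_new(int n)
{
    assert(n >= 0);
    struct s *p = malloc(sizeof *p + (size_t)n * sizeof p->d[0]);
    if (p != NULL)
        p->n = n;
    return p;
}

/* Bytes occupied by the flexible array member, derived from the stored count.
   sizeof p->d[0] is fine (the element type is complete); sizeof p->d is not. */
static size_t s_array_bytes(const struct s *p)
{
    return (size_t)p->n * sizeof p->d[0];
}

/* For comparison: with a pointer to a VLA, m is part of the type of q. */
static size_t vla_bytes(int m)
{
    assert(m > 0);
    double (*q)[m] = malloc(sizeof(double[m]));
    size_t bytes = (q != NULL) ? sizeof *q : 0;  /* m * sizeof(double) */
    free(q);
    return bytes;
}
```

After struct s *p = s_new(7), s_array_bytes(p) gives the number the question was trying to get from sizeof(p->d); the caller releases p with free.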
The C Standard is a bit loosey-goosey when it comes to the definition of certain terms in certain contexts. Given something like:
struct foo {uint32_t x; uint16_t y[]; };
char *p = 1024+(char*)malloc(1024); // Point to end of region
struct foo *q1 = (struct foo *)(p -= 512); // Allocate some space from it
... some code which uses *q1
struct foo *q2 = (struct foo *)(p -= 512); // Allocate more space from it
there's no really clear indication of what storage is occupied by objects
*q1 or *q2, nor by q1->y or q2->y. If *q1 will never be accessed afterward,
then q2->y may be treated as a uint16_t[510], but writing to *q1 will trash
the contents of q2->y[254] and above, and writing q2->y[254] and above will
trash *q1. Since a compiler will generally have no way of knowing what will
happen to *q1 in the future, it will have no way of sensibly reporting a size
for q2->y.
The C standard states (emphasis mine):
21 EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. [...]
22 Following the above declaration:
struct s t1 = { 0 }; // valid
struct s t2 = { 1, { 4.2 }}; // invalid
t1.n = 4; // valid
t1.d[0] = 4.2; // might be undefined behavior
The initialization of t2 is invalid (and violates a constraint) because struct s is treated as if it did not contain member d.
Source: C18, §6.7.2.1/20 + /21
I do not understand the explanation of "because struct s is treated as if it did not contain member d"
If I use the initializer { 1, { 4.2 }}, the { 4.2 } part is there to initialize the flexible array member.
To be precise, it should make the flexible array member consist of one element and initialize that element to the value 4.2. So is struct s treated as having member d, or not?
This sentence makes no sense in my eyes.
Why does the standard say, that { 4.2 } wouldn't initialize/denote the flexible array member and thus the structure would be treated as if it has no member d?
If I use a fixed-size array, this notation works and initializes the member without complaint:
struct foo {
int x;
double y[1];
};
int main (void)
{
struct foo a = { 1, { 2.3 } };
}
Why is this initialization invalid when the structure has a flexible array member but valid when the structure has a fixed-size array member?
Could you elaborate that?
I've read:
Why does static initialization of flexible array member work?
and
How to initialize a structure with flexible array member
and
Flexible array members can lead to undefined behavior?
and others, but none of them tells me what this sentence wants to explain and why exactly this is invalid.
Related:
How does an array of structures with flexible array members behave?
What are the real benefits of flexible array member?
I guess this is a language defect. While it might make no sense to initialize a flexible array member, the standard needs to address that issue somewhere. I can't find such normative text anywhere.
The definition of a flexible array member is, C17 6.7.2.1/18:
As a special case, the last element of a structure with more than one named member may have an
incomplete array type; this is called a flexible array member. In most situations, the flexible array
member is ignored. In particular, the size of the structure is as if the flexible array member were
omitted except that it may have more trailing padding than the omission would imply.
From this we learn that a flexible array member is an incomplete array type. We do not however learn in what situations the flexible array member is ignored, save for when calculating the size of the struct. "In most situations" isn't helpful and is the defect - this needed to be expanded to an exhaustive list, including the behavior of flexible array members when part of an initializer list. Otherwise one may assume that it behaves just like any other array of incomplete type.
C17 6.2.5/22:
An array type of unknown size is an incomplete type.
And then the rules for initialization say, C17 6.7.9:
The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type.
So far there is no normative text saying that we are not allowed to provide an initializer for a flexible array member - on the contrary. The example in the question (C17 6.7.2.1 example 21) is not normative, since examples aren't normative in ISO standards. The example doesn't mention which constraint that is violated, nor does it mention where it says that the flexible array member must be ignored.
I suppose I'd probably file a DR about this.
I do not understand the explanation of "because struct s is treated as if it did not contain member d".
The C standard also says “In most situations, the flexible array member is ignored.” It is unclear why you would not understand what the meaning of this is. If struct s is declared struct s { int n; double d[]; };, then, in most situations, the C implementation behaves as if it were declared struct s { int n; };. Therefore, struct s t2 = { 1, { 4.2 }}; fails because the 4.2 is an initializer for something that, in effect, does not exist.
It is sensible to ask why this is the situation. For the most part, I expect a compiler could support a definition in which the array initializers were counted and used to set the structure size. Certainly compilers do this with array definitions such as int a[] = { 3, 4, 5 };. However, that is not the typical use case for flexible array members. Typically, a program receives information about how many elements it will need to manage with the structure, allocates space for the structure with space for those elements included, and then puts a structure in the allocated space. That is, the typical use case for structures with flexible array members is with dynamically allocated space. I expect the C committee saw little need to require compilers to support flexible array members in static or automatic objects, instead of dynamic objects.
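As a rough sketch of that typical use case: the helper below (s_from_values is a made-up name, not anything from the standard) allocates the structure together with its elements and copies the initial values in, which is the dynamic counterpart of the initializer the question wanted to write.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct s { int n; double d[]; };

/* Hypothetical helper: build a heap-allocated struct s whose flexible
   array holds a copy of the count values in init. Returns NULL if the
   allocation fails. */
static struct s *s_from_values(const double *init, int count)
{
    assert(count >= 0 && (count == 0 || init != NULL));
    struct s *p = malloc(sizeof *p + (size_t)count * sizeof p->d[0]);
    if (p == NULL)
        return NULL;
    p->n = count;
    if (count > 0)
        memcpy(p->d, init, (size_t)count * sizeof p->d[0]);
    return p;
}
```

For example, s_from_values((double[]){ 4.2 }, 1) yields an object with n equal to 1 and d[0] equal to 4.2; the caller releases it with free.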
You've omitted some important language in the example you quoted - here's the full text:
20 EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if
p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might
not be the same).
IOW, flexible array members only really come into play if you allocate the struct instance dynamically and allocate additional space for the array member.
A flexible array member has no size, so it doesn't contribute to the size of the struct type - that is, the result of sizeof (struct s) evaluates to the size of the type without the array.
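A small sketch of what that means for size calculations. The struct s_without_d and the helper names are assumptions added for illustration; the exact numbers depend on the implementation, so nothing here hard-codes them.

```c
#include <assert.h>
#include <stddef.h>

struct s { int n; double d[]; };

/* Same leading member, but no flexible array member. */
struct s_without_d { int n; };

/* Hypothetical helper: bytes to request for a struct s holding count doubles. */
static size_t s_alloc_size(size_t count)
{
    return sizeof(struct s) + count * sizeof(double);
}

/* Where the array starts inside the object. */
static size_t s_array_offset(void)
{
    return offsetof(struct s, d);
}

/* Extra trailing padding that the presence of d added to sizeof(struct s). */
static size_t s_extra_padding(void)
{
    assert(sizeof(struct s) >= sizeof(struct s_without_d));
    return sizeof(struct s) - sizeof(struct s_without_d);
}
```

On a typical 64-bit target one would expect s_extra_padding() to be nonzero, because d must be aligned for double, but that is an implementation detail rather than a guarantee.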
IMO it is because the size of a struct initialized this way cannot be determined in another translation unit when the struct is declared as extern.
The "struct inheritance technique" in C (as described in this question) is made possible by the fact that the C standard guarantees that the first member of a struct will never have any padding before it (?), and that the address of the first member will always be equal to the address of the struct itself.
This allows usage such as the following:
typedef struct {
// some fields
} A;
typedef struct {
A base;
// more fields
} B;
typedef struct {
B base;
// yet more fields
} C;
C* c = malloc(sizeof(C));
// ... init c or whatever ...
A* a = (A*) c;
// ... access stuff on a etc.
B* b = (B*) c;
// ... access stuff on b etc.
This question has two parts:
A. It seems to me this technique breaks the strict aliasing rule. Am I wrong, and if so, why?
B. Suppose that this technique is indeed legal. In that case, does it make a difference if A: we first store the object in an lvalue of its specific type, before down or up casting it to a different type, or B: if we cast it directly to the particular type desired at the moment, without first storing it in the lvalue of the specific type?
For example, are these three options all equally legal?
Option 1:
C* make_c(void) {
return malloc(sizeof(C));
}
int main(void) {
C* c = make_c(); // First store in a lvalue of the specific type
A* a = (A*) c;
// ... do stuff with a
C* c2 = (C*) a; // Cast back to C
// ... do stuff with c2
return 0;
}
Option 2:
C* make_c(void) {
return malloc(sizeof(C));
}
int main(void) {
A* a = (A*) make_c(); // Don't store in an lvalue of the specific type, cast right away
// ... do stuff with a
C* c2 = (C*) a; // Cast back to C
// ... do stuff with c2
return 0;
}
Option 3:
int main(void) {
A* a = (A*) malloc(sizeof(C)); // Don't store in an lvalue of the specific type, cast right away
// ... do stuff with a
C* c2 = (C*) a; // Cast to C - even though the object was never actually stored in a C* lvalue
// ... do stuff with c2
return 0;
}
A. It seems to me this technique breaks the strict aliasing rule. Am I wrong, and if so, why?
Yes, you are wrong. I'll consider two cases:
Case 1: The C is fully initialized
That would be this, for example:
C *c = malloc(sizeof(*c));
*c = (C){0}; // or equivalently, "*c = (C){{{0}}}" to satisfy overzealous compilers
In that case, all the bytes of the representation of a C are set, and the effective type of the object comprising those bytes is C. This comes from paragraph 6.5/6 of the standard:
If a value is stored into an object having no declared type through an
lvalue having a type that is not a character type, then the type of
the lvalue becomes the effective type of the object for that access
and for subsequent accesses that do not modify the stored value.
But structure and array types are aggregate types, which means that objects of such types contain other objects within them. In particular, each C contains a B identified as its member base. Because the allocated object is, at this point, effectively a C, it contains a sub-object that is effectively a B. One syntax for an lvalue referring to that B is c->base. The type of that expression is B, so it is consistent with the strict-aliasing rule to use it to access the B to which it refers. That has to be ok, else structures (and arrays) would not work at all, whether dynamically allocated or not.*
But, as discussed in my answer to your previous question, (B *)c is guaranteed to be equal (in value and type) to &c->base. Thus *(B *)c is another lvalue referring to the B that is the first member of *c. That the syntax of that expression is different from that of the previous lvalue we considered is of no account. It is an lvalue of type B, associated with an object of type B, so using it to access the object to which it refers is one of the cases allowed by the SAR.
None of this is any different from the statically and automatically allocated cases.
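A minimal sketch of Case 1, with made-up member names (id, b_value, c_value) and helper names standing in for the elided fields of the question. It stores a whole C first, then reaches the nested B and A through converted pointers.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { int id; } A;
typedef struct { A base; int b_value; } B;
typedef struct { B base; int c_value; } C;

/* Hypothetical helper: allocate a C and store a whole C into it, so the
   effective type of the allocated object becomes C (6.5/6). */
static C *c_new(int id, int b_value, int c_value)
{
    C *c = malloc(sizeof *c);
    if (c != NULL)
        *c = (C){ .base = { .base = { .id = id }, .b_value = b_value },
                  .c_value = c_value };
    return c;
}

/* "Upcasts": the converted pointer points to the initial member (6.7.2.1/15). */
static B *c_as_b(C *c)
{
    assert(c != NULL);
    return (B *)c;      /* same address and type as &c->base */
}

static A *c_as_a(C *c)
{
    assert(c != NULL);
    return (A *)c;      /* same address and type as &c->base.base */
}
```

Writing through c_as_a(c)->id and then reading c->base.base.id observes the same object, which is the behavior the answer argues is well-defined.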
Case 2: The C is not fully initialized
That could be something like this:
C *c = malloc(sizeof(*c));
*(B *)c = (B){0};
We have thereby assigned to the initial B-sized portion of the allocated object via an lvalue of type B, so the effective type of that initial portion is B. The allocated space does not at this point contain an object of (effective) type C. We can access the B and its members, read or write, via any acceptably-typed lvalues referring to them, as discussed above. But we have a strict aliasing violation if we
attempt to read *c as a whole (e.g. C c2 = *c;);
attempt to read C members other than base (e.g. X x = c->another;); or
attempt to read the allocated object via an lvalue of most unrelated types (e.g. Unrelated_but_not_char u = *(Unrelated_but_not_char *) c;).
The first two of those cases are of interest here, and they make sense in terms of the dynamically allocated object, when interpreted as a C, not being fully initialized. Similar incomplete-initialization cases can arise with automatically allocated objects, too; they also produce undefined behavior, but by different rules.
Note well, however, that there is no strict aliasing violation for any write to the allocated space, because any such write will (re)assign the effective type of (at least) the region that is written to.
And that brings us to the main tricksome bit. What if we do this:
C *c = malloc(sizeof(*c));
c->base = (B){0};
? Or this:
C *c = malloc(sizeof(*c));
c->another = 0;
The allocated object does not have any effective type before the first write to it (and in particular, it does not have effective type C), so do write-to-member expressions via *c even make sense? Are they well-defined? The letter of the standard might support an argument that they do not, but no implementation adopts such interpretation, and there is no reason to think that any ever would.
The interpretation most consistent with both the letter of the standard and universal practice is that writing through a member-access lvalue constitutes simultaneously writing to the member and to its host aggregate, thus setting the effective type of the whole region, even though only one member's value is written. Of course, that still does not make it ok to read members whose values have not been written -- because their values are indeterminate, not because of the SAR.
That leaves this case:
C *c = malloc(sizeof(*c));
*(B *)c = (B){0};
B b2 = c->base; // What about this?
That is, if the effective type of an initial region of the allocated space is B, can we use a member-access lvalue based on type C to read the stored value of that B region? Again, one might argue not, on the basis that there is no actual C, but in practice, no implementation makes that interpretation. The effective type of the object being read -- the initial region of the allocated space -- is the same as the type of the lvalue used for access, so in that sense there is no SAR violation. That the host C is wholly hypothetical is a question primarily of syntax, not semantics, because the same region can definitely be read as an object of the same type via an alternative expression.
* But the SAR nevertheless forestalls any debate on this point by providing that "an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union)" is among the types that may be accessed. This clears any ambiguity surrounding the position that accessing a member also constitutes accessing any objects containing it.
I believe this quote from C11 (ISO/IEC 9899:2011 §6.5 7) should answer some of your questions (my emphasis added):
An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
— a type compatible with the effective type of the object,
— a qualified version of a type compatible with the effective type of the object,
— a type that is the signed or unsigned type corresponding to the effective type of the object,
— a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
— an aggregate or union type that includes one of the aforementioned types among its
members (including, recursively, a member of a subaggregate or contained union), or
— a character type.
Then more can be answered by this (ISO/IEC 9899:2011 §6.7.2.1 15):
A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.
The rest can be answered by this snippet (ISO/IEC 9899:2011 §7.22.3 1):
The order and contiguity of storage allocated by successive calls to the
aligned_alloc, calloc, malloc, and realloc functions is unspecified. The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object with a fundamental alignment requirement and then used to access such an object or an array of such objects in the space allocated (until the space is explicitly deallocated).
In conclusion:
A. You're wrong. See the first and second quotes for the reasoning.
B. No, it doesn't make a difference. See the third quote (and maybe the first) for the reasoning.
Yes, the first element of the structure doesn't have any padding before it.
Second, when the type of an anonymous field is a typedef for a struct or union, code may refer to the field using the name of the typedef.
This is a GCC extension (enabled by -fplan9-extensions), taken from the GCC manual:
typedef struct {
// some fields
} A;
typedef struct {
A;
// more fields
} B;
typedef struct {
B;
// yet more fields
} C;
B get_B (C *c) { return c->B; } /* access B */
Please check Unnamed Structure and Union Fields
Let's say my code is:
typedef struct {
int x;
double y;
char z;
} Foo;
would x, y, and z, be right next to each other in memory? Could pointer arithmetic 'iterate' over them?
My C is rusty so I can not quite get the program right to test this.
Here is my code in full.
#include <stdlib.h>
#include <stdio.h>
typedef struct {
int x;
double y;
char z;
} Foo;
int main() {
Foo *f = malloc(sizeof(Foo));
f->x = 10;
f->y = 30.0;
f->z = 'c';
// Pointer to iterate.
for(int i = 0; i == sizeof(Foo); i++) {
if (i == 0) {
printf(*(f + i));
}
else if (i == (sizeof(int) + 1)) {
printf(*(f + i));
}
else if (i ==(sizeof(int) + sizeof(double) + 1)) {
printf(*(f + i));
}
else {
continue;
}
return 0;
}
No, it is not guaranteed for struct members to be contiguous in memory.
From §6.7.2.1 point 15 in the C standard (page 115 here):
There may be unnamed padding within a structure object, but not at its beginning.
Most of the time, something like:
struct mystruct {
int a;
char b;
int c;
};
is indeed aligned to sizeof(int), like this:
 0  1  2  3  4  5  6  7  8  9 10 11
[a         ][b][padding][c         ]
Yes and no.
Yes, the members of a struct are allocated within a contiguous block of memory. In your example, an object of type Foo occupies sizeof (Foo) contiguous bytes of memory, and all the members are within that sequence of bytes.
But no, there is no guarantee that the members themselves are adjacent to each other. There can be padding bytes between any two members, or after the last one. The standard does guarantee that the first defined member is at offset 0, and that all the members are allocated in the order in which they're defined (which means you can sometimes save space by reordering the members).
Normally compilers use just enough padding to satisfy the alignment requirements of the member types, but the standard doesn't require that.
So you can't (directly) iterate over the members of a structure. If you want to do that, and if all the members are of the same type, use an array.
You can use the offsetof macro, defined in <stddef.h>, to determine the byte offset of (non-bitfield) member, and it can sometimes be useful to use that to build a data structure that can be used to iterate over the members of a structure. But it's tedious, and rarely more useful than simply referring to the members by name -- particularly if they have different types.
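Here is one way such a data structure could look, as a sketch only. The descriptor type, the table foo_members and the function member_as_double are invented for this example; Foo is the struct from the question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int x;
    double y;
    char z;
} Foo;

enum member_kind { KIND_INT, KIND_DOUBLE, KIND_CHAR };

/* Hypothetical descriptor: where a member lives and how to read it. */
struct member_desc {
    const char *name;
    size_t offset;
    enum member_kind kind;
};

static const struct member_desc foo_members[] = {
    { "x", offsetof(Foo, x), KIND_INT },
    { "y", offsetof(Foo, y), KIND_DOUBLE },
    { "z", offsetof(Foo, z), KIND_CHAR },
};

enum { FOO_MEMBER_COUNT = sizeof foo_members / sizeof foo_members[0] };

/* Read the described member of *f and return its value converted to double. */
static double member_as_double(const Foo *f, const struct member_desc *m)
{
    assert(f != NULL && m != NULL);
    /* Byte-wise arithmetic from the start of the struct to the member. */
    const unsigned char *at = (const unsigned char *)f + m->offset;
    switch (m->kind) {
    case KIND_INT:    return *(const int *)(const void *)at;
    case KIND_DOUBLE: return *(const double *)(const void *)at;
    case KIND_CHAR:   return *(const char *)at;
    }
    return 0.0;
}
```

A loop from 0 to FOO_MEMBER_COUNT - 1 over foo_members then visits x, y and z in turn. Each member is still read through a pointer of its own type, so the table only replaces the names, not the type knowledge.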
would x, y, and z, be right next to each other in memory?
No. The struct memory allocation layout is implementation dependent - there is no guarantee struct members are right next to each other. One reason is memory padding.
Could pointer arithmetic 'iterate' over them?
No. You can only do pointer arithmetic for pointers to the same type.
would x, y, and z, be right next to each other in memory?
They could be, but don't have to be. The placement of elements in structures is not mandated by the ISO C standard.
In general, a compiler will place the elements at offsets that are "optimal" for the architecture it compiles for. So, on 32-bit CPUs, most compilers will, by default, place elements at offsets that are multiples of 4 (as that makes for the most efficient access). But most compilers also have ways to specify different placement (alignment).
So, if you have something like:
struct X {
uint8_t a;
uint32_t b;
};
Then offset of a would be 0, but offset of b would be 4 on most 32-bit compilers with default options.
Could pointer arithmetic 'iterate' over them?
Not like the code in your example. Pointer arithmetic on pointers to structures is defined to move the address by multiples of the size of the structure. So, if you have:
struct X a[2];
struct X *p = a;
then p+1 == a+1.
To "iterate" over elements you would need to cast the p to uint8_t* and then add the offset of the element to it (using offsetof standard macro), element by element.
It depends on the padding decided on by the compiler (which is influenced by the requirements and advantages of the target architecture). The C standard does guarantee that there is no padding before the first member of a struct, but after that, you cannot assume anything. However, if sizeof the struct equals the sum of the sizeof values of its members, then there is no padding.
You can enforce no padding with a compiler-specific directive. On MSVC, that's:
#pragma pack(push, 1)
// your struct...
#pragma pack(pop)
GCC has __attribute__((packed)) for the equivalent effect.
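A sketch of the size check described above, assuming a compiler that accepts #pragma pack (MSVC, GCC and Clang do). PackedFoo, FOO_MEMBER_BYTES and the two helper functions are names introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int x; double y; char z; } Foo;

#pragma pack(push, 1)
typedef struct { int x; double y; char z; } PackedFoo;
#pragma pack(pop)

/* Sum of the member sizes, i.e. the size Foo would have with no padding. */
#define FOO_MEMBER_BYTES (sizeof(int) + sizeof(double) + sizeof(char))

static_assert(sizeof(PackedFoo) <= sizeof(Foo),
              "packing should never make the struct larger");

/* Padding bytes the compiler added to each layout. */
static size_t foo_padding(void)
{
    return sizeof(Foo) - FOO_MEMBER_BYTES;
}

static size_t packed_foo_padding(void)
{
    return sizeof(PackedFoo) - FOO_MEMBER_BYTES;
}
```

With packing in effect packed_foo_padding() should be 0, while foo_padding() is whatever the default alignment rules require. Keep in mind that members of a packed struct may be misaligned, which can be slow or even fault on some architectures.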
There are multiple issues with trying to use pointer arithmetic in this manner.
The first issue, as has been mentioned in other answers, is that there could be padding throughout the struct throwing off your calculations.
C11 working draft 6.7.2.1 p15: (bold emphasis mine)
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared. A pointer to a
structure object, suitably converted, points to its initial member (or if that member is a
bit-field, then to the unit in which it resides), and vice versa. There may be unnamed
padding within a structure object, but not at its beginning.
The second issue is that pointer arithmetic is done in multiples of the size of the type being pointed to. In the case of a struct, if you add 1 to a pointer to a struct, the pointer will be pointing to an object after the struct. Using your example struct Foo:
Foo x[3];
Foo *y = x+1; // y points to the second Foo (x[1]), not the second byte of x[0]
6.5.6 p8:
When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original
array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P)+N (equivalently, N+(P)) and (P)-N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist.
A third issue is that performing pointer arithmetic such that the result points more than one past the end of the object causes undefined behavior, as does dereferencing a pointer to one element past the end of the object obtained through the pointer arithmetic. So even if you had a struct containing three ints with no padding in between and took a pointer to the first int and incremented it to point to the second int, dereferencing it would cause undefined behavior.
More from 6.5.6: (bold-italic emphasis mine)
Moreover, if the expression P points to the last element of an array object, the expression (P)+1 points one past the last element of the
array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.
A fourth issue is that dereferencing a pointer to one type as another type results in undefined behavior. This attempt at type-punning is often referred to as a strict-aliasing violation. The following is an example of undefined behavior through strict-aliasing violation even though the data types are the same size (assuming 4-byte int and float) and nicely aligned:
int x = 1;
float y = *(float *)&x;
6.5 p7:
An object shall have its stored value accessed only by an lvalue expression that has one of
the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type
of the object,
a type that is the signed or unsigned type corresponding
to the effective type of the object,
a type that is the signed or unsigned type corresponding to a
qualified version of the effective type of the object,
an aggregate or union type that includes one of the
aforementioned types among its members (including, recursively, a
member of a subaggregate or contained union), or
a character type.
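For completeness, a sketch of how the bytes of one type can be reinterpreted as another without violating that rule: copy them with memcpy instead of dereferencing a converted pointer. The function names are made up, and the code assumes float is 32 bits wide (checked at compile time).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 32-bit float");

/* Reinterpret the bytes of a float as an unsigned 32-bit integer. */
static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);   /* byte copy, no aliasing violation */
    return bits;
}

/* The reverse direction. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

On an implementation that uses IEEE 754 single precision for float, float_to_bits(1.0f) is 0x3F800000; on other representations the value differs, but the round trip through both functions still returns the original float.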
Summary:
No, a C struct does not necessarily hold its members in contiguous memory, and even if it did, you still couldn't do what you want with pointer arithmetic.
I have noticed that an empty array in the end of the structure is often used in open source projects:
typedef struct A
{
......
void *arr[];
} A;
I want to know: is this standard C, or is it only OK with the gcc compiler?
As of C99, it is standard C. Pre-C99 compilers may not support it. The old approach was to declare a 1-element array and to adjust the allocation size for that.
New way:
typedef struct A
{
......
void *arr[];
} A;
int slots = 3;
A* myA = malloc(sizeof(A) + slots*sizeof(void*));
myA->arr[2] = foo;
Old way:
typedef struct A
{
......
void *arr[1];
} A;
int slots = 3;
A* myA = malloc(sizeof(A) + (slots-1)*sizeof(void*));
myA->arr[2] = foo;
The standard (draft N1570), in paragraph 18 of 6.7.2.1, states:
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
I'm programming in C99 and use variable length arrays in one portion of my code. I know in C89 zero-length arrays are not allowed, but I'm unsure of C99 and variable length arrays.
In short, is the following well defined behavior?
int main()
{
int i = 0;
char array[i];
return 0;
}
No, zero-length arrays are explicitly prohibited by C language, even if they are created as VLA through a run-time size value (as in your code sample).
6.7.5.2 Array declarators
...
5 If the size is an expression that is not an integer constant
expression: if it occurs in a declaration at function prototype scope,
it is treated as if it were replaced by *; otherwise, each time it is
evaluated it shall have a value greater than zero.
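In practice this means the size has to be checked before the declaration is reached. A minimal sketch, where sum_of_squares and the limit MAX_VLA_ELEMS are invented for the example (the limit is only there to keep the stack usage bounded):

```c
#include <assert.h>

#define MAX_VLA_ELEMS 1024   /* assumed upper bound, chosen arbitrarily */

/* Hypothetical helper: 0*0 + 1*1 + ... + (n-1)*(n-1), computed through a
   VLA scratch buffer. The guard keeps the size expression strictly positive. */
static long sum_of_squares(int n)
{
    if (n <= 0)
        return 0;            /* never declare the VLA with a size of 0 or less */
    assert(n <= MAX_VLA_ELEMS);

    long squares[n];         /* n > 0 is guaranteed here */
    for (int i = 0; i < n; i++)
        squares[i] = (long)i * i;

    long total = 0;
    for (int i = 0; i < n; i++)
        total += squares[i];
    return total;
}
```

Calling sum_of_squares(0) returns 0 without ever evaluating the array declarator, so the zero-size case from the question never arises.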
Zero-length arrays are not allowed in C. Statically typed arrays must have a fixed, non-zero size that is a constant expression, and variable-length-arrays must have a size which evaluates non-zero; C11 6.7.6.2/5:
each time it [the size expression] is evaluated it shall have a value greater than zero
However, C99 and C11 have a notion of a flexible array member of a struct:
struct foo
{
int a;
int data[];
};
From C11, 6.7.2.1/18:
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed;
Zero-length arrays are not allowed in standard C (not even C99 or C11). But gcc does provide an extension to allow them. See http://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
struct line {
int length;
char contents[0];
};
struct line *thisline = (struct line *)
malloc (sizeof (struct line) + this_length);
thisline->length = this_length;