Unsized array declaration in a struct - c

Why does C permit this:
typedef struct s
{
int arr[];
} s;
where the array arr has no size specified?

This is a C99 feature called flexible array members. Its main purpose is to allow variable-length-array-like behavior inside a struct, and R.. in this answer to another question on flexible array members provides a list of benefits to using flexible arrays over pointers. The draft C99 standard in section 6.7.2.1 Structure and union specifiers paragraph 16 says:
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
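So if you had an s* you would allocate space for the array in addition to the space required for the struct. Usually you would also have other members in the structure; in fact the standard requires at least one other named member, so a struct whose only member is int arr[]; is not valid (gcc, for one, rejects it):
s *s1 = malloc(sizeof(struct s) + n * sizeof(int));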
The draft standard actually has an instructive example in paragraph 17:
EXAMPLE After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this
is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p
behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the
offsets of member d might not be the same).

You are probably looking for flexible array members in C99. Flexible array members are members of unknown size at the end of a struct.
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply.
You may also want to read about the struct hack, the older technique that flexible array members were introduced to replace.
It's not clear if it's legal or portable, but it is rather popular. An implementation of the technique might look something like this:
#include <stdlib.h>
#include <string.h>
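struct name {
int namelen;
char namestr[1]; /* the struct hack: actually holds a longer string */
};
struct name *makename(char *newname)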
{
struct name *ret =
malloc(sizeof(struct name)-1 + strlen(newname)+1);
/* -1 for initial [1]; +1 for \0 */
if(ret != NULL) {
ret->namelen = strlen(newname);
strcpy(ret->namestr, newname);
}
return ret;
}
This function allocates an instance of the name structure with the
size adjusted so that the namestr field can hold the requested name
(not just one character, as the structure declaration would suggest).
Despite its popularity, the technique is also somewhat notorious -
Dennis Ritchie has called it "unwarranted chumminess with the C implementation." An official interpretation has deemed that it is NOT
strictly conforming with the C Standard, although it does seem to work
under all known implementations. Compilers that check array bounds
carefully might issue warnings.

Related

char[] size not being counted

I have the following code:
#include <stdio.h>
#include <stdint.h>
#include <stdlib.h> /* for EXIT_SUCCESS */
typedef struct E_s {
uint32_t a;
uint32_t b;
uint32_t c;
} E_t;
typedef struct S_s {
uint32_t data_sz;
char data[];
} S_t;
typedef struct F_s {
E_t E;
S_t S;
char data[16];
//} __attribute__((packed)) full_msg_t;
} F_t;
int main(int argc, char* argv[])
{
F_t out;
printf("sizeof(out.data) = %zu\n", sizeof(out.data));
printf("sizeof(out.E) = %zu\n", sizeof(E_t));
printf("sizeof(out.S) = %zu\n", sizeof(S_t));
printf("sizeof(out) = %zu\n", sizeof(F_t));
return EXIT_SUCCESS;
}
When I run the code, I see the following output:
sizeof(out.data) = 16
sizeof(out.E) = 12
sizeof(out.S) = 4
sizeof(out) = 32
Question: Why is the size of S_t 4 (third line of output)? I was expecting it to be 8 (uint32_t + char[]). Why is the size of char[] not included?
Furthermore, both out.data and out.S.data point to the same memory location, which caused me to dive deep and find the above observation. Any clue here will also be very helpful. I was not expecting those 2 variables to overlap.
The standard specifies that the variable part of a structure with a flexible array member (FAM) is ignored when the size is calculated:
As a special case, the last element of a structure with more than one named member may have an incomplete array type; this is called a flexible array member. In most situations, the flexible array member is ignored. In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply. However, when a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed; the offset of the array shall remain that of the flexible array member, even if this would differ from that of the replacement array. If this array would have no elements, it behaves as if it had one element but the behavior is undefined if any attempt is made to access that element or to generate a pointer one past it.
Note that the struct F_s (aka F_t) should not be accepted; that violates the constraint in §6.7.2.1 ¶3:
A structure or union shall not contain a member with incomplete or function type (hence, a structure shall not contain an instance of itself, but may contain a pointer to an instance of itself), except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
The compiler should reject that (or, at least, emit a diagnostic) because constraint violations require a diagnostic. Even if the compiler doesn't reject it outright, you can't actually use the FAM of the embedded S_t, because the data member of F_t doesn't move: the offsets of the elements of a structure are fixed at compile time. It would, de facto, use the data element of F_t, but that isn't defined behaviour.
In this struct:
typedef struct S_s {
uint32_t data_sz;
char data[];
} S_t;
The data member is a flexible array member. Such a member does not contribute to the size of a struct as its size is not specified. This is spelled out in section 6.7.2.1p18 of the C standard:
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply.
So the size of S_t does not include the data member, which is why sizeof(S_t) is 4.
Such a member is normally only usable when memory for the struct is dynamically allocated. For example:
S_t *s = malloc(sizeof(S_t) + 10);
This allows you to access s->data[0] through s->data[9].
This also means that you can't put a struct with a flexible array member inside of another struct or in an array, because there's no way to know exactly where the flexible array member ends.
This is spelled out in section 6.7.2.1p3:
A structure or union shall not contain a member with incomplete or
function type (hence, a structure shall not contain an instance of
itself, but may contain a pointer to an instance of itself), except
that the last member of a structure with more than one named member
may have incomplete array type; such a structure (and any union
containing, possibly recursively, a member that is such a structure)
shall not be a member of a structure or an element of an array
char data[]; is a flexible array member, and it is explicitly guaranteed not to have its size counted, because it is mainly intended to be used as malloc(sizeof(S_t) + n), where n is the size of the data array.
As for S_t S; inside the other struct, that's invalid C: a struct containing a flexible array member must not be a member of another struct (or an element of an array) at all, and you placed one inside F_t. So your code is not valid standard C, and therefore it isn't possible to make assumptions that out.S.data and out.data are somehow the same memory, because all of that is beyond the scope of the C language. I suppose it might be possible that GNU C offers deterministic behavior in the form of non-standard extensions, but I'm not aware of any such guarantees.
Because char[] in structure S_t is a flexible array member, a feature introduced in the C99 standard of the C programming language.
This may be helpful: flexible-array-members-structure-c

Why is this initialization of a structure with a flexible array member invalid but valid with a fixed size array member?

The C standard states (emphasis mine):
21 EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. [...]
22 Following the above declaration:
struct s t1 = { 0 }; // valid
struct s t2 = { 1, { 4.2 }}; // invalid
t1.n = 4; // valid
t1.d[0] = 4.2; // might be undefined behavior
The initialization of t2 is invalid (and violates a constraint) because struct s is treated as if it did not contain member d.
Source: C18, §6.7.2.1/20 + /21
I do not understand the explanation of "because struct s is treated as if it did not contain member d"
If I use the initializer of { 1, { 4.2 }};, the { 4.2 } part is to initialize the flexible array member;
To be precise, it initializes the flexible array member to consist of one element and initializes that element to the value 4.2. So is struct s treated as having member d or not?
This sentence makes no sense in my eyes.
Why does the standard say, that { 4.2 } wouldn't initialize/denote the flexible array member and thus the structure would be treated as if it has no member d?
If I use a fixed size array, this notation works and initializes the member without complaint:
struct foo {
int x;
double y[1];
};
int main (void)
{
struct foo a = { 1, { 2.3 } };
}
Why is this initialization invalid when the structure has a flexible array member but valid when the structure has a fixed size array member?
Could you elaborate that?
I've read:
Why does static initialization of flexible array member work?
and
How to initialize a structure with flexible array member
and
Flexible array members can lead to undefined behavior?
and others, but none of them explains to me what this sentence means and why exactly this is invalid.
Related:
How does an array of structures with flexible array members behave?
What are the real benefits of flexible array member?
I guess this is a language defect. While it might make no sense to initialize a flexible array member, the standard needs to address that issue somewhere. I can't find such normative text anywhere.
The definition of a flexible array member is, C17 6.7.2.1/18:
As a special case, the last element of a structure with more than one named member may have an
incomplete array type; this is called a flexible array member. In most situations, the flexible array
member is ignored. In particular, the size of the structure is as if the flexible array member were
omitted except that it may have more trailing padding than the omission would imply.
From this we learn that a flexible array member is an incomplete array type. We do not however learn in what situations the flexible array member is ignored, save for when calculating the size of the struct. "In most situations" isn't helpful and is the defect - this needed to be expanded to an exhaustive list, including the behavior of flexible array members when part of an initializer list. Otherwise one may assume that it behaves just like any other array of incomplete type.
C17 6.2.5/22:
An array type of unknown size is an incomplete type.
And then the rules for initialization say, C17 6.7.9:
The type of the entity to be initialized shall be an array of unknown size or a complete object type that is not a variable length array type.
So far there is no normative text saying that we are not allowed to provide an initializer for a flexible array member; on the contrary. The example in the question (C17 6.7.2.1 example 21) is not normative, since examples aren't normative in ISO standards. The example doesn't mention which constraint is violated, nor does it mention where it says that the flexible array member must be ignored.
I suppose I'd probably file a DR about this.
I do not understand the explanation of "because struct s is treated as if it did not contain member d".
The C standard also says “In most situations, the flexible array member is ignored.” It is unclear why you would not understand what the meaning of this is. If struct s is declared struct s { int n; double d[]; };, then, in most situations, the C implementation behaves as if it were declared struct s { int n; };. Therefore, struct s t2 = { 1, { 4.2 }}; fails because the 4.2 is an initializer for something that, in effect, does not exist.
It is sensible to ask why this is the situation. For the most part, I expect a compiler could support a definition in which the array initializers were counted and used to set the structure size. Certainly compilers do this with array definitions such as int a[] = { 3, 4, 5 };. However, that is not the typical use case for flexible array members. Typically, a program receives information about how many elements it will need to manage with the structure, allocates space for the structure with space for those elements included, and then puts a structure in the allocated space. That is, the typical use case for structures with flexible array members is with dynamically allocated space. I expect the C committee saw little need to require compilers to support flexible array members in static or automatic objects, instead of dynamic objects.
You've omitted some important language in the example you quoted - here's the full text:
20 EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if
p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might
not be the same).
IOW, flexible array members only really come into play if you allocate the struct instance dynamically and allocate additional space for the array member.
A flexible array member has no size, so it doesn't contribute to the size of the struct type - that is, the result of sizeof (struct s) evaluates to the size of the type without the array.
IMO it is because the size of a struct initialized this way could not be determined in another translation unit where the struct is declared extern.

Why can't I retrieve my flexible array member size?

OK so I was reading the standard paper (ISO C11) in the part where it explains flexible array members (at 6.7.2.1 p18). It says this:
As a special case, the last element of a structure with more than one
named member may have an incomplete array type; this is called a
flexible array member. In most situations, the flexible array member
is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more
trailing padding than the omission would imply. However, when a . (or
->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member,
it behaves as if that member were replaced with the longest array
(with the same element type) that would not make the structure larger
than the object being accessed; the offset of the array shall remain
that of the flexible array member, even if this would differ from that
of the replacement array. If this array would have no elements, it
behaves as if it had one element but the behavior is undefined if any
attempt is made to access that element or to generate a pointer one
past it.
And here are some of the examples given below (p20):
EXAMPLE 2 After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to
use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to
by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in
particular, the offsets of member d might not be the same).
Note that examples inside the standard are not normative.
And now my example (extending the one from the standard):
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
struct s { int n; double d[]; };
int m = 7;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m])); //create our object
printf("%zu", sizeof(p->d)); //retrieve the size of the flexible array member
free(p); //free our object
}
Now the compiler is complaining that p->d has incomplete type double[], which is clearly not the case according to the standard. Is this a bug in the GCC compiler?
As a special case, the last element of a structure with more than one named member may have an incomplete array type; ... C11dr 6.7.2.1 18
In the following d is an incomplete type.
struct s { int n; double d[]; };
The sizeof operator shall not be applied to an expression that has function type or an incomplete type ... C11dr §6.5.3.4 1
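// This does not change the type of member `d`.
// `p->d` behaves like a `double d[m]` when accessed, but its type is still incomplete.
struct s *p = foo(); // foo(): some function returning an allocated struct s
// Constraint violation (sizeof applied to an incomplete type): a diagnostic is required
printf("%zu", sizeof(p->d));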
This looks like a defect in the Standard. We can see from the paper where flexible array members were standardized, N791 "Solving the struct hack problem", that the struct definition replacement is intended to apply only in evaluated context (to borrow the C++ terminology); my emphasis:
When an lvalue whose type is a structure
with a flexible array member is used to access an object, it behaves as
if that member were replaced by the longest array that would not make
the structure larger than the object being accessed.
Compare the eventual standard language:
[W]hen a . (or ->) operator has a left operand that is (a pointer to) a structure with a flexible array member and the right operand names that member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed [...]
Some form of language like "When a . (or ->) operator whose left operand is (a pointer to) a structure with a flexible array member and whose right operand names that member is evaluated [...]" would seem to work to fix it.
(Note that sizeof does not evaluate its argument, except for variable length arrays, which are another kettle of fish.)
There is no corresponding defect report visible via the JTC1/SC22/WG14 website. You might consider submitting a defect report via your ISO national member body, or asking your vendor to do so.
Standard says:
C11-§6.5.3.4/2
The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand.
and it also says
C11-§6.5.3.4/1
The sizeof operator shall not be applied to an expression that has function type or an incomplete type, [...]
p->d is of incomplete type and it can't be an operand of sizeof operator. The statement
it behaves as if that member were replaced with the longest array (with the same element type) that would not make the structure larger than the object being accessed
doesn't hold for the sizeof operator, as it determines the size of the object from its type, which must be a complete type.
First, what is happening is correct in terms of the standard: arrays that are declared with [] are incomplete, and you can't apply the sizeof operator to them.
But there is also a simple reason for it in your case. You never told your compiler that in this particular case the d member should be viewed as having a particular size. You only told malloc the total memory size to reserve and made p point to that. The compiler has obtained no type information that could help it deduce the size of the array.
This is different from allocating a variable length array (VLA) or a pointer to VLA:
double (*q)[m] = malloc(sizeof(double[m]));
Here the compiler can know what type of array q is pointing to. This is not because you told malloc the total size (that information is not returned from the malloc call), but because m is part of the type specification of q.
The C Standard is a bit loosey-goosey when it comes to the definition of certain terms in certain contexts. Given something like:
struct foo {uint32_t x; uint16_t y[]; };
char *p = 1024+(char*)malloc(1024); // Point to end of region
struct foo *q1 = (struct foo *)(p -= 512); // Allocate some space from it
... some code which uses *q1
struct foo *q2 = (struct foo *)(p -= 512); // Allocate more space from it
There's no really clear indication of what storage is occupied by objects *q1 or *q2, nor by q1->y or q2->y. If *q1 will never be accessed afterward, then q2->y may be treated as a uint16_t[510], but writing to *q1 will trash the contents of q2->y[254] and above, and writing q2->y[254] and above will trash *q1. Since a compiler will generally have no way of knowing what will happen to *q1 in the future, it will have no way of sensibly reporting a size for q2->y.

What are the differences between Variable Length Arrays and Flexible Array Member?

I've seen in the ISO C99 committee draft that structs can have an incomplete array with unspecified size at their end, known as a flexible array member.
On the other hand C99 also has Variable Length Arrays, which allow declaring arrays with size not constant at compile-time.
I thought that a FAM was a special kind of a VLA, but I've seen two SO users claiming otherwise. Also, reading the Wikipedia section on sizeof, it says that sizeof behaves differently for those two.
Why do both of them exist instead of just one? (Are their use-cases too different?)
Also, which other associated behaviors are different for each of them?
There are two different things that the C99 standard added and they are easy to mix up by mistake:
Flexible array members. This means that a struct can have a member of unknown size at the end. Example from the C standard:
struct s { int n; double d[]; };
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
This was used before C99 as well, but it was then undefined behavior, known as the "struct hack" referred to in another answer. Before C99, there could be unexpected padding bytes at the end of the struct, leading to bugs.
Variable length arrays (VLA). These are arrays whose size is set at run time. Compilers typically implement them by reserving space on the stack at run time rather than by heap allocation. Example:
void func (int n)
{
int array[n];
}
Referred from user29079: https://softwareengineering.stackexchange.com/questions/154089/c-flexible-arrays-when-did-they-become-part-of-the-standard

Is empty array in the end of the structure a C standard?

I have noticed that an empty array at the end of the structure is often used in open source projects:
typedef struct A
{
......
void *arr[];
} A;
I want to know: is this standard C, or is it only OK for the gcc compiler?
As of C99, it is now a C standard. Pre-C99 compilers may not support it. The old approach was to declare a 1-element array, and to adjust the allocation size for that.
New way:
typedef struct A
{
......
void *arr[];
} A;
int slots = 3;
A* myA = malloc(sizeof(A) + slots*sizeof(void*));
myA->arr[2] = foo;
Old way:
typedef struct A
{
......
void *arr[1];
} A;
int slots = 3;
A* myA = malloc(sizeof(A) + (slots-1)*sizeof(void*));
myA->arr[2] = foo;
The standard (draft N1570), 6.7.2.1 paragraph 18, states:
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. However, when a . (or ->) operator has a left operand that is
(a pointer to) a structure with a flexible array member and the right operand names that
member, it behaves as if that member were replaced with the longest array (with the same
element type) that would not make the structure larger than the object being accessed; the
offset of the array shall remain that of the flexible array member, even if this would differ
from that of the replacement array. If this array would have no elements, it behaves as if
it had one element but the behavior is undefined if any attempt is made to access that
element or to generate a pointer one past it.
