Internal mechanism of sizeof in C? - c

I use sizeof to get size of a struct in C, but the result I got is unexpected.
struct sdshdr {
int len;
int free;
char buf[];
};
int main(){
printf("struct len:%d\n",(sizeof(struct sdshdr)));
return 0;
} //struct len:8, with or without buf
my question is why does buf not occupy any space and why is the size of the int type still 4 on a 64-bit CPU?
here is the output from gcc -v:
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)
Target: x86_64-apple-darwin14.4.0
Thread model: posix

The [] is a flexible array member. They do not count towards the total size of the struct, because the C standard explicitly says so:
6.7.2.1/18
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply.
This is intentional by design, because the purpose of a flexible array member is to allow you to allocate trailing data dynamically after the struct. (When the struct is a file header, protocol header etc.)
Example including discussion about non-standard gcc extensions and the old pre-C99 "struct hack".

From C99 onwards the size of an array at the end of a struct may be omitted. For purposes of sizeof(struct) this array will appear to have zero size (although its presence may add some padding to the struct), but the intent is for its length to be flexible, i.e., when allocating space for the struct one must allocate the desired amount of extra space for the array at the end. (To avoid going out of bounds, the actual allocated length of the array should be stored somewhere.)
Before C99 it was a fairly common hack to have an array of size 1 (or 0 where allowed by the compiler) at the end of a struct and then allocate more space for it, so C99 made this practice explicitly allowed by introducing the flexible array member with no size given.

As a GNU c extension, you have zero-length arrays:
As a GNU extension, the number of elements can be as small as zero. Zero-length arrays are useful as the last element of a structure which is really a header for a variable-length object:
for example, consider this code from The gnu c manual
struct line
{
int length;
char contents[0];
};
{
struct line *this_line = (struct line *)
malloc (sizeof (struct line) + this_length);
this_line -> length = this_length;
}
In ISO C99, you would use a flexible array member, which is slightly different in syntax and semantics:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.)

buf here is a flexible array member
Flexible array members have incomplete type, and so the sizeof operator may not be applied whereas original implementation of zero-length arrays, sizeof evaluates to zero.

Related

Padding at the end of struct with variable size array seems wrong

Consider these structs on common 64bit system:
struct V1 { // size 1, alignment 1
uint8_t size; // offset 0, size 1, alignment 1
uint8_t data[]; // offset 1, size 0, alignment 1
};
struct V2 { // size 12, alignment 4
char c; // offset 0, size 1, alignment 1
int length; // offset 4, size 4, alignment 4
char b; // offset 8, size 1, alignment 1
short blob[]; // offset 10, size 0, alignment 2
};
In the first case the data member is right at the end of the struct taking up no space. This causes the following odd-ness:
struct V1 blobs[2];
&blobs[0].data == &blobs[1].size
Luckily the C standard §6.7.2.1, paragraph 3 says:
A structure or union shall not contain a member with incomplete or function type,… except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
So the above array is illegal and there is no problem with the addresses being the same.
What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?
Secondly I have a problem with V2. The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so the following structs remain properly aligned. But V2 must never be placed in an array so I fail to see why there should be any padding at the end of V2.
In fact I would go so far as to say it is wrong to add padding there. It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.
align = _Alignof(struct V2);
needed_size = offsetof(struct V2, blob) + length; // beware of overflow
needed_size = (needed_size + align - 1) & (~align); // beware of overflow
Is there something I'm missing why struct V2 must be padded?
What if I have code that, given a size, creates such structures in a
contiguous block of memory that was pre-allocated? Would it be illegal
for it to create instances with size == 0 because that would basically
be an array of the struct?
As #EricPostpischil explained in comments, the constraint in question is not about the layout of objects in memory, but rather about the declared element type of an actual array. An object that is not declared as an array is not an array in the relevant sense, no matter how array-like it may seem, or how we think about it or use it. So no, the language spec does not forbid what you describe.
The compiler adds extra padding at the end of V2 so the size is a
multiple of the alignment. This is necessary for structs in an array
so the following structs remain properly aligned. But V2 must never be
placed in an array so I fail to see why there should be any padding at
the end of V2.
The C language specification permits implementations to pad structure layouts after any member, including the last, at their own discretion. Among the primary purposes is to allow structure members to be properly aligned, including, but not limited to, within arrays of structures, but use of padding in structure layouts is not contingent on there being an alignment-based justification.
In fact I would go so far as to say it is wrong to add padding there.
"Wrong" a strong word. Especially in the context of a language-lawyer question, you should back it up with an argument based on the language specification. I don't think you can do that.
It obfuscates calculating the size of the struct for a given length of
blob because now the offset of blob has to be considered instead of
the size of the struct.
Not exactly true. If you want to compute the minimum possible size into which an instance of your structure can fit then yes, you need to take the offset of the FAM into account. However,
That's not a function of there being padding, but rather of the offset of the FAM differing from the size of the structure. That can't happen without padding, but it doesn't have to happen with padding.
If you are so space-constrained that you cannot accommodate the possibility of a few bytes of overallocation for the sake of clearer code, then dynamic allocation and FAMs probably are not a good idea in the first place. In particular, the allocator itself typically does not allocate with single-byte granularity.
Substituting an offsetof expression for a sizeof expression is hardly obfuscatory. It might even be clearer, since then the name of the FAM actually appears in the size computation. Your particular example code is somewhat overcomplicated, however, by the unnecessary measure employed to make the allocation size a multiple of the structure's alignment requirement.
Although the size of a structure type that has a FAM does not include the size of the FAM itself, it does include any padding between the penultimate member and the FAM, and possibly more:
In most situations, the flexible array member is ignored. In
particular, the size of the structure is as if the flexible array
member were omitted except that it may have more trailing padding than
the omission would imply.
(C17 6.7.2.1/18)
Thus, a pretty tight upper bound on the space needed for a structure of type struct S that has a flexible array member fam of type fam_t can be calculated as:
size_t bytes_needed = sizeof(struct S) + num_fam_elements * sizeof(fam_t);
That is in fact idiomatic, but if you prefer
size_t bytes_needed = offsetof(struct S, fam) + num_fam_elements * sizeof(fam_t);
if (bytes_needed < sizeof(struct S)) {
bytes_needed = sizeof(struct S);
}
for the absolute minimum then I see nothing objectionable about that form.
Is there something I'm missing why struct V2 must be padded?
Undoubtedly so, as you observe your implementation to pad it, but the implementation does not owe you an explanation.
Nevertheless, your implementation most likely applies a combination of rules such as these:
the alignment requirement for a structure type is the same as the strictest alignment requirement of any of its members, and
the size of a structure type is an an integer multiple of its alignment requirement.
Neither of those is a rule of the language itself, but they are fairly common in practice. In particular, they are part of the System V x86_64 ABI, and undoubtedly of other ABIs, too. Note that although those rules do serve the purpose of ensuring that structure members can be properly aligned inside an array of structures, they make no exception for structure types that are not allowed to be the element type of an array.
This answer addresses “Is there something I'm missing why struct V2 must be padded?”
If a compiler did not pad a structure type to be a multiple of its alignment requirement, then some structure types would violate this rule in C 2018 6.7.2.1 18:
… In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply…
To see this, consider this structure in an implementation where int is four bytes and has a four-byte alignment requirement:
struct s0
{
int i;
char c;
};
This structure requires five bytes for its members, so it must be padded to eight bytes to satisfy the alignment requirements when used in an array. Next, we add flexible array member:
struct s1
{
int i;
char c;
char a[];
};
This structure also requires five bytes for its inflexible members. None are required for the flexible array. If the compiler did not pad it to eight bytes, it would be shorter than struct s0, which violates the rule that its size must be either as if the flexible array member were omitted or that size plus more padding.
This tells us why a conforming compiler is constrained to include the padding. However, it does not tell us the reason for the rule. I see none except that it would be more complicated to write rules into the C standard to allow less padding.
Some Discussion About Object Size
Review of the C 2018 standard reveals nothing which explicitly says the size of an object must be a multiple of its alignment requirement. Obviously, the ability to put objects into an array depends on this, but the lack of a requirement that the size be a multiple of an alignment requirement would mean there might be some objects (besides a structure with flexible array member) that could not be used in arrays; the inability to put objects into an array would not cause the requirement to come into existence.
Thus, it might be conforming for a C implementation to define struct s0 to be five bytes with an alignment requirement of four bytes, and then it could make struct s1 also five bytes with an alignment requirement of four bytes.

Defined an empty array in c

What actually happened in following program, I have defined an empty array int arr[]; in code and with the GCC compiler, but the compiler doesn't give an error. It's successfully worked.
#include <stdio.h>
#include <stdlib.h>
typedef struct st
{
int i;
int arr[];
}ST,*ptr;
int main()
{
ST s1;
ptr p1= (ptr)malloc(sizeof(ST)+4*sizeof(int));
p1->i=10;
p1->arr[0]=1;
p1->arr[3] = 1;
printf("%d\n",p1->arr[3]);
printf("%ld\n", sizeof(s1));
}
C language not allowed undefined array length. but GCC compiler allowed. Why?
Just curious, what actually happens?
It's valid and allowed since C99. It's called flexible array member - useful to make the last member of the struct variable-length and it must be the last member of the struct if used.
You are looking at a C99 feature, Flexible array member. If the array is defined at the end of the struct, then you will define its size during allocation via malloc, by allocating a bigger chunk of memory.
for an array of size len:
struct st *mySt = malloc(sizeof(struct st) + len * sizeof(mySt->arr[0]));
C99, the structure of the last element allows the size of the array is unknown, which is called a flexible array member, but in front of the structure of a flexible array member must have at least one other member.
Flexible array members allowed to structure contains an array of variable size. This size of the structure sizeof returns do not include a flexible array of memory. Structure comprising a flexible array member using malloc () function allocates memory dynamically and allocates memory should be larger than the size of the structure to accommodate the expected size of the flexible array.
The basic model
typedef struct st
{
int i;
int arr[];
}ST,*ptr;
Tested under linux gcc about operating results for the structure, the last element of an array, and an array of undefined length, operating results consistent with theory, this is a bit like a c ++ class member functions, member functions is not occupied class space size. Well, this flexible array, there is the role of what is.

Array of size 0 at the end of struct [duplicate]

This question already has answers here:
What's the need of array with zero elements?
(5 answers)
Closed 5 years ago.
My professor of a systems programming course I'm taking told us today to define a struct with a zero-length array at the end:
struct array{
size_t size;
int data[0];
};
typedef struct array array;
This is a useful struct to define or initialize an array with a variable, i.e., something as follows:
array *array_new(size_t size){
array* a = malloc(sizeof(array) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
That is, using malloc(), we also allocate memory for the array of size zero. This is completely new for me, and it's seems odd, because, from my understanding, structs do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
From what he told us, it seems that an array defined as length zero at the end of a struct is ensured to come immediately after the last element of the struct, but this seems strange, because, again, from my understanding, structs could have padding.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Currently, there exists a standard feature, as mentioned in C11, chapter §6.7.2.1, called flexible array member.
Quoting the standard,
As a special case, the last element of a structure with more than one named member may
have an incomplete array type; this is called a flexible array member. In most situations,
the flexible array member is ignored. In particular, the size of the structure is as if the
flexible array member were omitted except that it may have more trailing padding than
the omission would imply. [...]
The syntax should be
struct s { int n; double d[]; };
where the last element is incomplete type, (no array dimensions, not even 0).
So, your code should better look like
struct array{
size_t size;
int data[ ];
};
to be standard-conforming.
Now, coming to your example, of a 0-sized array, this was a legacy way ("struct hack") of achieving the same. Before C99, GCC supported this as an extension to emulate flexible array member functionality.
Your professor is confused. They should go read what happens if I define a zero size array. This is a non-standard GCC extension; it is not valid C and not something they should teach students to use (*).
Instead, use standard C flexible array member. Unlike your zero-size array, it will actually work, portably:
struct array{
size_t size;
int data[];
};
Flexible array members are guaranteed to count as zero when you use sizeof on the struct, allowing you to do things like:
malloc(sizeof(array) + sizeof(int[size]));
(*) Back in the 90s people used an unsafe exploit to add data after structs, known as the "struct hack". To provide a safe way to extend a struct, GCC implemented the zero-size array feature as a non-standard extension. It became obsolete in 1999 when the C standard finally provided a better way to do this.
Other answers explains that zero-length arrays are GCC extension and C allows variable length array but no one addressed your other questions.
from my understanding, structs do not have their elements necessarily in continuous locations.
Yes. struct data type do not have their elements necessarily in continuous locations.
Why does the code in array_new allocate memory to data[0]? Why would it be legal to access then, say
array * a = array_new(3);
a->data[1] = 12;
?
You should note that one of the the restriction on zero-length array is that it must be the last member of a structure. By this, compiler knows that the struct can have variable length object and some more memory will be needed at runtime.
But, you shouldn't be confused with; "since zero-length array is the last member of the structure then the memory allocated for zero-length array must be added to the end of the structure and since structs do not have their elements necessarily in continuous locations then how could that allocated memory be accessed?"
No. That's not the case. Memory allocation for structure members not necessarily be contiguous, there may be padding between them, but that allocated memory must be accessed with variable data. And yes, padding will have no effect over here. The rule is:
§6.7.2.1/15
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
I've also seen around that this is just a feature of gcc and not defined by any standard. Is this true?
Yes. As other answers already mentioned that zero-length arrays are not supported by standard C, but an extension of GCC compilers. C99 introduced flexible array member. An example from C standard (6.7.2.1):
After the declaration:
struct s { int n; double d[]; };
the structure struct s has a flexible array member d. A typical way to use this is:
int m = /* some value */;
struct s *p = malloc(sizeof (struct s) + sizeof (double [m]));
and assuming that the call to malloc succeeds, the object pointed to by p behaves, for most purposes, as if p had been declared as:
struct { int n; double d[m]; } *p;
(there are circumstances in which this equivalence is broken; in particular, the offsets of member d might not be the same).
A more standard way would be to define your array with a data size of 1, as in:
struct array{
size_t size;
int data[1]; // <--- will work across compilers
};
Then use the offset of the data member (not the size of the array) in the calculation:
array *array_new(size_t size){
array* a = malloc(offsetof(array, data) + size * sizeof(int));
if(a){
a->size = size;
}
return a;
}
This is effectively using array.data as a marker for where the extra data might go (depending on size).
The way I used to do it is without a dummy member at the end of the structure: the size of the structure itself tells you the address just past it. Adding 1 to the typed pointer goes there:
header * p = malloc (sizeof (header) + buffersize);
char * buffer = (char*)(p+1);
As for structs in general, you can know that the fields are layed out in order. Being able to match some imposed structure needed by a file format binary image, operating system call, or hardware is one advantage of using C. You have to know how the padding for alignment works, but they are in order and in one contiguous block.

Why does GCC allow zero length array only as last member?

According to this,
https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html
It is said that the benefit is
They are very useful as the last element of a structure that is really
a header for a variable-length object
What does it mean?
The zero-length array is a GCC extension (read as: not standard) which you should not use.
While recent versions of C allow for someting similar (flexible array member with empty brackets), C++ knows no such thing. As people often mix C and C++, this is a possible source of confusion.
Instead, an array of length 1 should be used, which is standards-compliant under both C and C++, and which just works with every compiler.
What is this useful for at all?
Sometimes you need to access "invalid" out-of-bounds data knowing that it is valid in reality. In the strictest sense, this is undefined behavior (since you are accessing out-of-bounds values which are indeterminate, and using indeterminate values is UB), but that is only for what the compiler knows, not for what it fact, so it nevertheless "works fine".
For example, you might receive framed data on the network consisting of a tag word, a length, and an amount of data corresponding to the length given. Or an operating system function might return a variable amount of results to you (a couple of Win32 API functions work that way, for example).
In either case, you have a unknown (unknown at compile time) number of elements at the end of this structure, so it is not possible to define a single legitimate structure to hold everything.
That is what flexible array members are for. And with this, it is explained why they must be the last member as well. It doesn't make sense for something that could have "any size" to be anywhere but at the end -- it's impossible for the compiler to lay out any members after it, not knowing its size.
(In case you wonder how the compiler can ever free the storage not knowing the objects's size... it cannot! There normally exists an explicit function for freeing such an object as part of the API, which takes care of this exact problem.)
It's probably best to demonstrate with a small example:
#include <stdio.h>
#include <stdlib.h>
#define BLOB_TYPE_FOO 0xBEEF
struct blob {
/* Part of your object header... perhaps describing the type of blob. */
int type;
/* This is actually the length of the "data" field below */
unsigned length;
/* The data */
unsigned char data[];
};
struct blob *
create_blob(int type, size_t size)
{
/* Allocate enough space for the "header" and "size" bytes of data. */
struct blob *x = calloc(1, sizeof(struct blob) + size);
x->type = type;
x->length = size;
return x;
}
int
main(void)
{
/* Note that sizeof(struct blob) doesn't include the data field. */
printf("sizeof(struct blob): %zu\n", sizeof(struct blob));
struct blob *x = create_blob(BLOB_TYPE_FOO, 1000);
/*
You can manipulate data here, but be careful not to exceed the
allocated size.
*/
size_t i;
for (i = 0; i < 1000; i++)
{
x->data[i] = 'A' + (i % 26);
}
/*
Since data was allocated with the rest of the header, everything is
freed.
*/
free(x);
return 0;
}
The nice part about this setup is that sizeof(struct blob) represents the size of the "object header" (on my machine, that's 8 bytes), and that since you allocate the whole object together, a single free() is all that is needed to release the memory.
Like others have stated here, this is a non-standard extension and you should really consider using it with care. Damon's answer is the better way to go, though the sizeof() operation is not quite the right size (it's a bit too large to represent the size of the actual header). It's not too hard to workaround that problem though.
You cannnot have the array of 0 length because if you try to make a zero length array then it would mean that you are trying to create a pointer to nothing which is not correct. The C standard says:
Flexible array members are written as contents[] without the 0.
Flexible array members have incomplete type, and so the sizeof operator may not be applied. As a quirk of the original implementation of zero-length arrays, sizeof evaluates to zero.
Flexible array members may only appear as the last member of a struct that is otherwise non-empty.
A structure containing a flexible array member, or a union containing such a structure (possibly recursively), may not be a member of a structure or an element of an array. (However, these uses are permitted by GCC as extensions.

Strange variable-sized array declaration

Reading this Skip List implementation I came across this code fragment:
typedef struct nodeStructure{
keyType key;
valueType value;
node forward[1]; /* variable sized array of forward pointers */
};
To me it seems that forward[1] denotes a one element array. And the comment calls it a variable sized array.
Do I misunderstand something or this is just a mistake in the source I'm reading?
It is called the struct hack. It is the old form of the flexible array member introduced in C99.
This has been used in the past to mimic a variable array in the last member of a structure but it is not a strictly conformning construct in C.
This is a program paradigm in C that you will see sometimes. When allocating the structure, you will allocate sizeof(struct nodeStructure + numNodes * sizeof(node)).
This allows you to have multiple forward nodes for the struct, even though it is only declared to have one. It's a bit of an ugly hack, but it works.
Typically, when you do this, there will also be a filed called 'count' or something, so that you know how many extra entries are after the node.
This is a common trick for the older C compilers (before C99): compilers allowed you to dereference elements past the end of forward's declared length when it is the last element of the struct; you could then malloc enough memory for the additional node elements, like this:
nodeStructure *ptr = malloc(sizeof(nodeStructure)+4*sizeof(node));
for (int i = 0 ; i != 5 ; i++) { // The fifth element is part of the struct
ptr->forward[i] = ...
}
free(ptr);
The trick lets you embed arrays of variable size in a structure without a separate dynamic allocation. An alternative solution would be to declare node *forward, but then you'd need to malloc and free it separately from the nodeStructure, unnecessarily doubling the number of mallocs and potentially increasing memory fragmentation:
Here is how the above fragment would look without the hack:
typedef struct nodeStructure{
keyType key;
valueType value;
node *forward;
};
nodeStructure *ptr = malloc(sizeof(nodeStructure));
ptr->forward = malloc(5*sizeof(node));
for (int i = 0 ; i != 5 ; i++) {
ptr->forward[i] = ...
}
free(ptr->forward);
free(ptr);
EDIT (in response to comments by Adam Rosenfield): C99 lets you define arrays with no size, like this: node forward[]; This is called flexible array member, it is defined in the section 6.7.2.1.16 of the C99 standard.
The data structure implementation is most likely written against the C90 standard, which did not have flexible array members (added in C99). At that time, it was common to use a 1- or even 0-sized(*) array at the end of a struct to allow access to a dynamically variable number of elements there.
The comment should not be interpreted as meaning C99-style variable length arrays; besides, in C99, the idiomatic and standard-conformant definition for member forward would be node forward[];. A type such as struct nodeStructure with such a member is then called an incomplete type. You can define a pointer to it, but you cannot define a variable of this type or take its size, all operations that node forward[0] or node forward[1] allow, although these operations arguably mismatch the programmer's intent.
(*) 0-sized arrays are forbidden by the standard but GCC accepted these as an extension for precisely this use.

Resources