Can C arrays contain padding in between elements?

I heard a rumor that, in C, arrays that are contained inside structs may have padding added in between elements of the array. Now obviously, the amount of padding could not vary between any pair of elements or calculating the next element in an array is not possible with simple pointer arithmetic.
This rumor also stated that arrays which are not contained in structures are guaranteed to contain no padding. I know at least that part is true.
So, in code, the rumor is:
{
// Given this:
struct { int values[20]; } foo;
int values[20];
// This may be true:
sizeof(values) != sizeof(foo.values);
}
I'm pretty certain that sizeof(values) will always equal sizeof(foo.values). However, I have not been able to find anything in the C standard (specifically C99) that explicitly confirms or denies this.
Does anyone know if this rumor is addressed in any C standard?
edit: I understand that there may be padding between the end of the array foo.values and the end of the struct foo and that the standard states that there will be no padding between the start of foo and the start of foo.values. However, does anyone have a quote from or reference to the standard where it says there is no padding between the elements of foo.values?

No, there will never be padding in between elements of an array. That is specifically not allowed. The C99 standard defines array types this way: "An array type describes a contiguously allocated nonempty set of objects...". By contrast, a structure is "sequentially", not "contiguously", allocated.
There might be padding before or after an array within a structure; that is another animal entirely. The compiler might do that to aid alignment of the structure's members, and the C standard leaves the amount of such padding up to the implementation.

Careful here. Padding may be added at the end of the struct, but will not be added between the elements of the array as you state in your question. Arrays will always reference contiguous memory, though an array of structures may have padding added to each element as part of the struct itself.
In your example, the values and foo.values arrays will have the same size. Any padding will be part of the struct foo instead.
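The claim above can be checked mechanically. The following is a minimal sketch, assuming a C11 compiler; the tag foo_t and the helper element_stride are names made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

struct foo_t { int values[20]; };   /* same shape as foo in the question */

/* An array of 20 ints occupies exactly 20 * sizeof(int) bytes,
   whether or not it is a struct member. */
static_assert(sizeof(((struct foo_t *)0)->values) == sizeof(int[20]),
              "array member must be the same size as a standalone array");

/* Byte distance between element i and element i + 1 of the member array
   (hypothetical helper; i must be less than 20). */
static size_t element_stride(const struct foo_t *f, size_t i)
{
    return (size_t)((const char *)&f->values[i + 1] -
                    (const char *)&f->values[i]);
}
```

For any valid index, element_stride should return sizeof(int); any padding shows up only in sizeof(struct foo_t), never between elements.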

Here's the explanation as to why a structure may need padding between its members or even after its last member, and why an array doesn't:
Different types might have different alignment requirements. Some types need to be aligned on word boundaries, others on double or even quad word boundaries. To accomplish this, a structure may contain padding bytes between its members. Trailing padding bytes might be needed because the memory location directly after a structure must also conform to the structure's alignment requirements, i.e. if bar is of type struct foo *, then
(struct foo *)((char *)bar + sizeof(struct foo))
yields a valid pointer to struct foo (i.e. it doesn't fail due to misalignment).
As each 'member' of an array has the same alignment requirement, there's no reason to introduce padding. This holds true for arrays contained in structures as well: if an array's first element is correctly aligned, so are all following elements.
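A small sketch of the pointer expression above, assuming a C11 compiler. The members of struct foo here are invented for the example, and is_aligned relies on the common (but implementation-defined) behaviour that converting a pointer to uintptr_t yields its address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct foo { double d; char c; };   /* hypothetical example type */

/* Step one struct forward using byte arithmetic. */
static struct foo *next_by_bytes(struct foo *bar)
{
    return (struct foo *)((char *)bar + sizeof(struct foo));
}

/* Report whether p satisfies the alignment requirement of struct foo.
   Assumes the uintptr_t conversion exposes the numeric address. */
static int is_aligned(const struct foo *p)
{
    return (uintptr_t)p % _Alignof(struct foo) == 0;
}
```

Given struct foo arr[2], next_by_bytes(&arr[0]) compares equal to &arr[1], which is only possible because the trailing padding is counted in sizeof(struct foo).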

Yes, sort of. Variables are often aligned to some boundary, depending on the variable. Take the following, for instance:
typedef struct
{
double d;
char c;
} a_type_t;
double and char are 8 and 1 bytes, on my system, respectively. Total of 9. That structure, however, will be 16 bytes, so that the doubles will always be 8-byte aligned. If I had just used ints, chars, etc, then the alignment might be 1, 2, 4, or 8.
For some type T, sizeof(T) may or may not equal sizeof(T.a) + sizeof(T.b) + sizeof(T.c) ... etc.
Generally, this is entirely compiler and architecture dependent. In practice, it never matters.
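To see how much padding a given implementation adds, the sizes can be compared directly. This is a sketch only; a_type_padding is a helper name invented here, and the value it returns depends on the compiler and target.

```c
#include <assert.h>
#include <stddef.h>

typedef struct
{
    double d;
    char c;
} a_type_t;

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(a_type_t, d) == 0, "no padding before the first member");

/* Number of padding bytes the implementation added to a_type_t
   (hypothetical helper). */
static size_t a_type_padding(void)
{
    return sizeof(a_type_t) - (sizeof(double) + sizeof(char));
}
```

On a system like the one described above (8-byte double, 16-byte struct), the helper would report 7 bytes of padding, all of it after c.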

Consider:
struct {
short s;
int i;
} s;
Assuming shorts are 16 bits and you're on a 32-bit platform, the size will probably be 8 bytes, as each struct member tends to be aligned to a word (32-bit in this case) boundary. I say "probably" because it is implementation-specific behaviour that can be varied by compiler flags and the like.
It's worth stressing that this is implementation behaviour not necessarily defined by the C standard. Much like the size of shorts, ints and longs (the C standard simply says shorts won't be larger than ints and longs won't be smaller than ints, which can end up as 16/32/32, 16/32/64, 32/32/64 or a number of other configurations).
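Rather than guessing, the gap between the two members can be measured with offsetof. A minimal sketch, assuming C11; the tag si and the helper gap_before_i are made up for the example.

```c
#include <assert.h>
#include <stddef.h>

struct si {
    short s;
    int i;
};

/* Padding bytes inserted between s and i on this implementation
   (hypothetical helper). */
static size_t gap_before_i(void)
{
    return offsetof(struct si, i) - sizeof(short);
}
```

With 16-bit shorts and 32-bit ints aligned to 4 bytes, the gap would be 2 bytes; other targets or packing flags can give a different answer.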

Related

Padding at the end of struct with variable size array seems wrong

Consider these structs on common 64bit system:
struct V1 { // size 1, alignment 1
uint8_t size; // offset 0, size 1, alignment 1
uint8_t data[]; // offset 1, size 0, alignment 1
};
struct V2 { // size 12, alignment 4
char c; // offset 0, size 1, alignment 1
int length; // offset 4, size 4, alignment 4
char b; // offset 8, size 1, alignment 1
short blob[]; // offset 10, size 0, alignment 2
};
In the first case the data member is right at the end of the struct taking up no space. This causes the following odd-ness:
struct V1 blobs[2];
&blobs[0].data == &blobs[1].size
Luckily the C standard §6.7.2.1, paragraph 3 says:
A structure or union shall not contain a member with incomplete or function type,… except that the last member of a structure with more than one named member may have incomplete array type; such a structure (and any union containing, possibly recursively, a member that is such a structure) shall not be a member of a structure or an element of an array.
So the above array is illegal and there is no problem with the addresses being the same.
What if I have code that, given a size, creates such structures in a contiguous block of memory that was pre-allocated? Would it be illegal for it to create instances with size == 0 because that would basically be an array of the struct?
Secondly I have a problem with V2. The compiler adds extra padding at the end of V2 so the size is a multiple of the alignment. This is necessary for structs in an array so the following structs remain properly aligned. But V2 must never be placed in an array so I fail to see why there should be any padding at the end of V2.
In fact I would go so far as to say it is wrong to add padding there. It obfuscates calculating the size of the struct for a given length of blob because now the offset of blob has to be considered instead of the size of the struct.
align = _Alignof(struct V2);
needed_size = offsetof(struct V2, blob) + length; // beware of overflow
needed_size = (needed_size + align - 1) & ~(align - 1); // round up to a multiple of align; beware of overflow
Is there something I'm missing why struct V2 must be padded?

What if I have code that, given a size, creates such structures in a
contiguous block of memory that was pre-allocated? Would it be illegal
for it to create instances with size == 0 because that would basically
be an array of the struct?
As @EricPostpischil explained in comments, the constraint in question is not about the layout of objects in memory, but rather about the declared element type of an actual array. An object that is not declared as an array is not an array in the relevant sense, no matter how array-like it may seem, or how we think about it or use it. So no, the language spec does not forbid what you describe.
The compiler adds extra padding at the end of V2 so the size is a
multiple of the alignment. This is necessary for structs in an array
so the following structs remain properly aligned. But V2 must never be
placed in an array so I fail to see why there should be any padding at
the end of V2.
The C language specification permits implementations to pad structure layouts after any member, including the last, at their own discretion. Among the primary purposes is to allow structure members to be properly aligned, including, but not limited to, within arrays of structures, but use of padding in structure layouts is not contingent on there being an alignment-based justification.
In fact I would go so far as to say it is wrong to add padding there.
"Wrong" is a strong word. Especially in the context of a language-lawyer question, you should back it up with an argument based on the language specification. I don't think you can do that.
It obfuscates calculating the size of the struct for a given length of
blob because now the offset of blob has to be considered instead of
the size of the struct.
Not exactly true. If you want to compute the minimum possible size into which an instance of your structure can fit then yes, you need to take the offset of the FAM into account. However,
That's not a function of there being padding, but rather of the offset of the FAM differing from the size of the structure. That can't happen without padding, but it doesn't have to happen with padding.
If you are so space-constrained that you cannot accommodate the possibility of a few bytes of overallocation for the sake of clearer code, then dynamic allocation and FAMs probably are not a good idea in the first place. In particular, the allocator itself typically does not allocate with single-byte granularity.
Substituting an offsetof expression for a sizeof expression is hardly obfuscatory. It might even be clearer, since then the name of the FAM actually appears in the size computation. Your particular example code is somewhat overcomplicated, however, by the unnecessary measure employed to make the allocation size a multiple of the structure's alignment requirement.
Although the size of a structure type that has a FAM does not include the size of the FAM itself, it does include any padding between the penultimate member and the FAM, and possibly more:
In most situations, the flexible array member is ignored. In
particular, the size of the structure is as if the flexible array
member were omitted except that it may have more trailing padding than
the omission would imply.
(C17 6.7.2.1/18)
Thus, a pretty tight upper bound on the space needed for a structure of type struct S that has a flexible array member fam of type fam_t can be calculated as:
size_t bytes_needed = sizeof(struct S) + num_fam_elements * sizeof(fam_t);
That is in fact idiomatic, but if you prefer
size_t bytes_needed = offsetof(struct S, fam) + num_fam_elements * sizeof(fam_t);
if (bytes_needed < sizeof(struct S)) {
bytes_needed = sizeof(struct S);
}
for the absolute minimum then I see nothing objectionable about that form.
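Applied to struct V2 from the question, the minimum-size form might be wrapped up as follows. This is a sketch under assumptions: v2_bytes_needed and v2_alloc are names invented here, and returning 0 on overflow is just one possible convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Minimum bytes for a struct V2 holding n blob elements; 0 on overflow. */
static size_t v2_bytes_needed(size_t n)
{
    if (n > (SIZE_MAX - offsetof(struct V2, blob)) / sizeof(short))
        return 0;
    size_t bytes = offsetof(struct V2, blob) + n * sizeof(short);
    return bytes < sizeof(struct V2) ? sizeof(struct V2) : bytes;
}

/* Allocate a struct V2 with room for n blob elements, or return NULL.
   Assumes n fits in an int, since that is the type of length. */
static struct V2 *v2_alloc(size_t n)
{
    size_t bytes = v2_bytes_needed(n);
    struct V2 *p = bytes ? malloc(bytes) : NULL;
    if (p != NULL)
        p->length = (int)n;
    return p;
}
```

The caller releases the result with free(). Clamping to sizeof(struct V2) keeps the allocation large enough for whole-struct assignment even when n is 0.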
Is there something I'm missing why struct V2 must be padded?
Undoubtedly so, as you observe your implementation to pad it, but the implementation does not owe you an explanation.
Nevertheless, your implementation most likely applies a combination of rules such as these:
the alignment requirement for a structure type is the same as the strictest alignment requirement of any of its members, and
the size of a structure type is an integer multiple of its alignment requirement.
Neither of those is a rule of the language itself, but they are fairly common in practice. In particular, they are part of the System V x86_64 ABI, and undoubtedly of other ABIs, too. Note that although those rules do serve the purpose of ensuring that structure members can be properly aligned inside an array of structures, they make no exception for structure types that are not allowed to be the element type of an array.
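Whether a particular implementation follows those two rules can be tested for a given type. The sketch below does so for struct V2; max_member_align and v2_follows_common_rules are hypothetical helper names, and a result of 0 would not indicate a non-conforming compiler.

```c
#include <assert.h>
#include <stddef.h>

struct V2 {
    char c;
    int length;
    char b;
    short blob[];
};

/* Strictest alignment requirement among the member types of struct V2. */
static size_t max_member_align(void)
{
    size_t a = _Alignof(char);
    if (_Alignof(int) > a)
        a = _Alignof(int);
    if (_Alignof(short) > a)
        a = _Alignof(short);
    return a;
}

/* 1 if this implementation applies both common ABI rules to struct V2. */
static int v2_follows_common_rules(void)
{
    return _Alignof(struct V2) == max_member_align() &&
           sizeof(struct V2) % _Alignof(struct V2) == 0;
}
```

On an ABI such as System V x86_64 the function would be expected to return 1, which is consistent with the observed size of 12 and alignment of 4.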

This answer addresses “Is there something I'm missing why struct V2 must be padded?”
If a compiler did not pad a structure type to be a multiple of its alignment requirement, then some structure types would violate this rule in C 2018 6.7.2.1 18:
… In particular, the size of the structure is as if the flexible array member were omitted except that it may have more trailing padding than the omission would imply…
To see this, consider this structure in an implementation where int is four bytes and has a four-byte alignment requirement:
struct s0
{
int i;
char c;
};
This structure requires five bytes for its members, so it must be padded to eight bytes to satisfy the alignment requirements when used in an array. Next, we add a flexible array member:
struct s1
{
int i;
char c;
char a[];
};
This structure also requires five bytes for its inflexible members. None are required for the flexible array. If the compiler did not pad it to eight bytes, it would be shorter than struct s0, which violates the rule that its size must be either as if the flexible array member were omitted or that size plus more padding.
This tells us why a conforming compiler is constrained to include the padding. However, it does not tell us the reason for the rule. I see none except that it would be more complicated to write rules into the C standard to allow less padding.
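The relationship between the two structures can be stated as a compile-time check. A minimal sketch, assuming C11; s1_tail_slack is a helper name invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

struct s0
{
    int i;
    char c;
};

struct s1
{
    int i;
    char c;
    char a[];
};

/* The FAM struct may have more trailing padding than s0, never less. */
static_assert(sizeof(struct s1) >= sizeof(struct s0),
              "size is as if the flexible array member were omitted, or larger");

/* Bytes of a[] that already fit inside sizeof(struct s1)
   (hypothetical helper). */
static size_t s1_tail_slack(void)
{
    return sizeof(struct s1) - offsetof(struct s1, a);
}
```

With four-byte int, s1_tail_slack would typically be 3: the first three elements of a overlap what sizeof counts as trailing padding.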
Some Discussion About Object Size
Review of the C 2018 standard reveals nothing which explicitly says the size of an object must be a multiple of its alignment requirement. Obviously, the ability to put objects into an array depends on this, but the lack of a requirement that the size be a multiple of an alignment requirement would mean there might be some objects (besides a structure with flexible array member) that could not be used in arrays; the inability to put objects into an array would not cause the requirement to come into existence.
Thus, it might be conforming for a C implementation to define struct s0 to be five bytes with an alignment requirement of four bytes, and then it could make struct s1 also five bytes with an alignment requirement of four bytes.

How can size of a structure be a non-multiple of 4?

I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes into play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%zu", sizeof(book)); /* %zu is the conversion for size_t */
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!

Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So, the char elements which are 3 bytes in length, need no alignment because they are byte elements.
It is preferred, though not required, that the short element be aligned on a short boundary, which means an even address. By aligning it on a short boundary, the compiler can issue a single load-short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
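One way to find out where the padding sits is to print member offsets with offsetof instead of raw addresses. This is a sketch, assuming C11; dump_books_layout and books_trailing_padding are names made up here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct books {
    short int number;
    char name[3];
};

static_assert(offsetof(struct books, number) == 0,
              "the first member is always at offset 0");

/* Print the offset and size of every member, then the total size. */
static void dump_books_layout(FILE *out)
{
    fprintf(out, "number: offset %zu, size %zu\n",
            offsetof(struct books, number), sizeof(short int));
    fprintf(out, "name:   offset %zu, size %zu\n",
            offsetof(struct books, name), sizeof(char[3]));
    fprintf(out, "total:  %zu\n", sizeof(struct books));
}

/* Padding bytes after the last member (hypothetical helper). */
static size_t books_trailing_padding(void)
{
    return sizeof(struct books) -
           (offsetof(struct books, name) + sizeof(char[3]));
}
```

With the size of 6 reported in the question, name would start at offset 2 and one padding byte would follow it, so that the next struct in an array starts on an even address.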
EDIT: As @Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, a different version of the compiler could change it.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the structure. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.
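An alternative that avoids depending on padding at all is to copy each field into a byte buffer explicitly. The sketch below is purely illustrative: struct msg, its 5-byte big-endian wire format, and the function names are all assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

struct msg {                 /* hypothetical in-memory message */
    uint8_t kind;
    uint32_t value;
};

enum { MSG_WIRE_SIZE = 5 };  /* 1 byte kind + 4 bytes value, no padding */

/* Write m into out in a fixed big-endian layout. */
static void msg_encode(const struct msg *m, uint8_t out[MSG_WIRE_SIZE])
{
    out[0] = m->kind;
    out[1] = (uint8_t)(m->value >> 24);
    out[2] = (uint8_t)(m->value >> 16);
    out[3] = (uint8_t)(m->value >> 8);
    out[4] = (uint8_t)m->value;
}

/* Rebuild a message from the fixed layout written by msg_encode. */
static void msg_decode(const uint8_t in[MSG_WIRE_SIZE], struct msg *m)
{
    m->kind = in[0];
    m->value = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16) |
               ((uint32_t)in[3] << 8) | (uint32_t)in[4];
}
```

The in-memory struct may be padded however the compiler likes; only the byte buffer has a fixed layout, so neither the packed attribute nor unaligned member access is needed.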

#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with a char and a short, the alignment is 2, because the alignment of short is 2; if you have a struct made only of chars, it has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char buf[4]; // buf is a placeholder name; this storage can hold a 32-bit int
This is nonstandard, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
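static_assert(sizeof(struct A) == 4, "struct A is padded to 4 bytes");
static_assert(alignof(struct A) == 2, "struct A has the alignment of short");
static_assert(sizeof(struct B) == 3, "packed struct B has no padding");
static_assert(alignof(struct B) == 1, "packed struct B is byte-aligned");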

Usually compilers follow the ABI of the target architecture.
It defines the alignments of structures and primitive data types, which in turn affects the padding needed and the sizes of structures. Because the alignment is a multiple of 4 on many architectures, the sizes of structures are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example, gcc and clang offer: __attribute__ ((packed))

How can I allocate memory for nested zero length array?

I want to create a struct Ring with a nested zero-length array:
typedef struct Data_Block
{
size_t Data_Len;
char Buf[0];
}Block;
typedef struct Block_Ring
{
int head;
int tail;
int full;
int block_num;
Block blk[0];
}Ring;
How can I correctly allocate memory for a Ring which contains 32 Block, and one Block contains Buf of size 16? Because if I malloc with the right size, the number of Block will become just one.

Fundamental Problem
There is a fundamental problem you must deal with before addressing this task: Constructing an array requires having elements of a fixed and known size.
That is because array element i is located by adding i times the size of an element to the base address of the array. One can perform that calculation only if the size of an element exists (elements have a fixed size) and you know it (the size is known).
Although you define Block to contain a member of zero size (Buf is an array with zero elements), you intend to use it as if that member were 16 bytes (an array of 16 char). However, there is no way to tell the compiler that the Block objects you will allocate and use are actually Block objects with 16 extra bytes. You certainly can allocate space for them, and I will show you how, but then how do you intend to use them? If x is a Ring object, and you write x.blk[i], the compiler will generate code that multiplies i by what it thinks the size of a Block is, and that will be wrong because the compiler thinks a Block has zero bytes for Buf, but your Block objects are bigger.
Standard C Versus An Extension
Declaring a structure member as an array with zero elements is an extension (notably available in GCC). The 1999 C standard introduced a similar feature called flexible array members. With standard C, a flexible array member is declared with no dimension, rather than a zero dimension.
A flexible array member is an incomplete type (C 2018 6.7.2.1 18). In other words, the type is not fully specified. The number of members of the array is unknown, and so the total size of the array is unknown.
Then, in defining Ring, we cannot define the blk member to be a flexible array member that is an array of Block, because standard C requires that the element type of an array be a complete type (C 2018 6.7.6.2 1, “The element type shall not be an incomplete or function type”).
Therefore, this code cannot be made into standard C. This is actually an advantage: The C standard is preventing you from making the fundamental mistake above of creating an array that cannot work because the size of its elements is not known.
Oddly, GCC 8.1 for x86-64 fails to give a diagnostic for this. It should give a diagnostic for the constraint violation. Apple LLVM version 9.1.0 (clang-902.0.39.2) does issue a diagnostic.
However, we will proceed to consider the code as you have written it, using the language extension.
How Big Are The Elements?
When a C implementation lays out a structure, it must ensure that each member in the structure is correctly aligned. (What alignments are correct is defined by the implementation, so they vary. However, whatever they are, the compiler must lay out the structure accordingly.) Since structures can be used as elements of an array, the size of the laid-out structure must be such that when one structure follows another in the array, all the members in the following structure are also correctly aligned.
Satisfying this constraint requires that the size of the structure be a multiple of the alignment requirements of all members. For example, if there are members with alignment requirements of 4 bytes and 8 bytes, the size of the structure must be a multiple of 8 bytes, since that is the least common multiple of 4 bytes and 8 bytes. In fact, all alignment requirements are powers of two, so the least common multiple of all the alignment requirements is simply the largest (most restrictive) alignment requirement.
What this means is that, when allocating space for an array of your Block objects, you cannot simply use an arbitrary number of bytes for the extra Buf elements. You must ensure the total size of each Block object is a multiple of the alignment requirement of its members.
C provides a way to know the alignment requirement of the structure. The expression _Alignof(Block) is the alignment requirement. So, if you want each Block to have x elements in Buf, the size you need for each Block is the size of the base structure (sizeof(Block)) plus the size you need for the actual array elements (x * sizeof(char)) plus enough padding to round the total up to a multiple of the alignment requirement. You can calculate this with:
// Calculate desired space.
size_t S = sizeof(Block) + x * sizeof(char);
// Note the alignment requirement.
static const size_t A = _Alignof(Block);
// Round up to multiple of alignment requirement.
S = (S-1) / A * A + A;
(This is a well-known expression for rounding up to a multiple of A. You can tinker with some examples to see why it works.)
Once you have calculated the space needed for one Block using the above code (with 16 for x), you can allocate space for one Ring with an array of 32 of these Block using:
Ring *R = malloc(sizeof(Ring) + 32 * S);
Accessing Array Elements
Now that you have the space, how do you access members of blk? As discussed above, the compiler does not know how to do this. Unfortunately, C does not provide any assistance. You will have to calculate addresses manually. Since you know the size of each of your Block objects, S, you can calculate the address of the Block with index i with:
Block *B = (Block *) ((char *) R->blk + S*i);
Discussion
This is cumbersome and error-prone. The address calculation could be wrapped into a helper function to make it a little better. However, it is generally not a good idea to use complicated code like this. You ought to consider alternative solutions.
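As a rough sketch of what such helpers might look like in standard C: the version below keeps the flexible array member in Block but drops the zero-length blk member from Ring, placing the slots after a rounded-up header instead. The stride field and the functions round_up, ring_header_size, ring_create and ring_block are all hypothetical, and overflow of the size computation is not checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct Data_Block {
    size_t Data_Len;
    char Buf[];              /* standard flexible array member */
} Block;

typedef struct Block_Ring {
    int head, tail, full, block_num;
    size_t stride;           /* hypothetical field: bytes per Block slot */
} Ring;

/* Round n up to a multiple of a (a must be nonzero). */
static size_t round_up(size_t n, size_t a)
{
    return (n + a - 1) / a * a;
}

/* Bytes from the start of the Ring to the first Block slot. */
static size_t ring_header_size(void)
{
    return round_up(sizeof(Ring), _Alignof(Block));
}

/* Allocate a Ring followed by block_num slots with buf_len bytes of Buf each. */
static Ring *ring_create(int block_num, size_t buf_len)
{
    size_t stride = round_up(sizeof(Block) + buf_len, _Alignof(Block));
    Ring *r = calloc(1, ring_header_size() + (size_t)block_num * stride);
    if (r != NULL) {
        r->block_num = block_num;
        r->stride = stride;
    }
    return r;
}

/* Address of slot i; the caller must keep 0 <= i < r->block_num. */
static Block *ring_block(Ring *r, int i)
{
    return (Block *)((unsigned char *)r + ring_header_size() +
                     (size_t)i * r->stride);
}
```

A ring of 32 blocks with 16-byte buffers would then be created with ring_create(32, 16), accessed through ring_block(r, i), and released with a single free(r).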

are pad lengths different for each element in a struct? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
C struct sizes inconsistence
For the following program, I'd like to obtain the size of a struct. However, it turns out that its size is 12 rather than 4*4=16. Does that mean that each element can be aligned to a different pad size, like int to 4 and short to 2? But in that case char should be aligned to 1.
Thanks.
#include <stdio.h>
struct test{
int a;
char b;
short c;
int d;
};
struct test A={1,2,3,4};
int main()
{
printf("0X%08X\n",&A.a);
printf("0X%08X\n",&A.b);
printf("0X%08X\n",&A.c);
printf("0X%08X\n",&A.d);
printf("%d\n",sizeof(A));
}
And the result is:
0X00424A30
0X00424A34
0X00424A36
0X00424A38
12

Yes, not every type has the same alignment. Each of your variables must be aligned correctly, i.e. its address must be a multiple of a certain size. The usual rule (for Intel and AMD, among others) is that every data type is aligned to its own size. Assuming an x86 architecture, that seems to be the case here:
0X00424A30: first address of the structure.
0X00424A34: 4 bytes (maybe sizeof(int)) after the first member. char requires an alignment of 1, so it doesn't need padding here.
0X00424A36: 2 bytes after the second member. short requires an alignment of 2, so there is 1 byte of padding.
0X00424A38: 2 bytes after the third member. int requires an alignment of 4, but the address is already a multiple of 4, so there is no padding byte.
Anyway, this is not a portable assumption: the C standard doesn't force anything here. It just allows padding bytes between your members and at the end of the structure.
By the way, you should rather use the following formats:
%p and typecast for pointers;
%zu or %u with typecast for sizeof.
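With those formats, the printing code from the question could look like the sketch below. The function names print_layout and padding_after_b are invented for the example, and the addresses printed will differ from run to run.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct test {
    int a;
    char b;
    short c;
    int d;
};

/* Print member addresses with %p (cast to void *) and the size with %zu. */
static void print_layout(struct test *t)
{
    printf("%p\n", (void *)&t->a);
    printf("%p\n", (void *)&t->b);
    printf("%p\n", (void *)&t->c);
    printf("%p\n", (void *)&t->d);
    printf("%zu\n", sizeof *t);
}

/* Padding bytes between b and c on this implementation
   (hypothetical helper). */
static size_t padding_after_b(void)
{
    return offsetof(struct test, c) -
           (offsetof(struct test, b) + sizeof(char));
}
```

For the layout shown in the question (b at offset 4, c at offset 6), padding_after_b would report the single padding byte described above.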

Yes. Note that padding is up to the implementation, so it may end up differently on various platforms. C99 section 6.7.2.1 only states that there may be padding between members of the structure and at its end. To write portable programs, you should not make any assumptions about the length of the padding.

Yes, each type has its own alignment restrictions.
The alignment restrictions of type T can never be stricter than requiring alignment to addresses that are a multiple of sizeof(T), as the two elements of the array T arr[2] are required to follow each other immediately without additional padding to make arr[1] correctly aligned.
It is allowed for a compiler to use less strict alignment requirements.
For example,
a char object must be byte-aligned (as sizeof(char) == 1 by definition)
a short object will typically be two-byte aligned (with sizeof(short) == 2), but could also be byte-aligned on some architectures
an int object will typically be four-byte aligned (with sizeof(int) == 4), but could also be two or even one byte-aligned on some architectures
a struct type will typically require an alignment equal to the alignment requirements of the most strictly aligned type among its members (sometimes with a minimum alignment > 1).
When building a struct, the members must all be correctly aligned, relative to the start of the struct, with the first member being at offset 0. To achieve this, the compiler may have to insert padding after a member to get the next member correctly aligned.
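The relationship between size and alignment described above can be spot-checked at compile time. A minimal sketch, assuming C11; the macro SIZE_IS_MULTIPLE_OF_ALIGN and struct mixed are made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* True when consecutive array elements of type T all stay aligned. */
#define SIZE_IS_MULTIPLE_OF_ALIGN(T) (sizeof(T) % _Alignof(T) == 0)

struct mixed {               /* hypothetical example type */
    char c;
    double d;
    short s;
};

static_assert(_Alignof(char) == 1, "char is byte-aligned");
static_assert(SIZE_IS_MULTIPLE_OF_ALIGN(short), "short");
static_assert(SIZE_IS_MULTIPLE_OF_ALIGN(int), "int");
static_assert(SIZE_IS_MULTIPLE_OF_ALIGN(double), "double");
static_assert(SIZE_IS_MULTIPLE_OF_ALIGN(struct mixed), "struct mixed");

/* A struct is at least as strictly aligned as each of its members. */
static_assert(_Alignof(struct mixed) >= _Alignof(double),
              "struct alignment covers its strictest member");
```

If any of these assertions failed, an array of that type could not keep every element aligned without padding between elements, which arrays are not allowed to have.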

Yes, because of packing and byte alignment.
The general answer is that compilers are free to add padding between members for alignment purposes.

Structures and Unions in C, determining size and accessing members

All,
Here is an example on Unions which I find confusing.
struct s1
{
int a;
char b;
union
{
struct
{
char *c;
long d;
}
long e;
}var;
};
Assuming that char is 1 byte, int is 2 bytes and long is 4 bytes, what would be the size of the entire struct here? Will the union size be {size of char*} + {size of long}? I am confused because of the struct wrapped in the union.
Also, how can I access the variable d in the struct. var.d ?

The sizes are implementation-defined, because of padding. A union will be at least the size of its largest member, while a struct will be at least the sum of its members' sizes. The inner struct will be at least sizeof(char *) + sizeof(long), so the union will be at least that big. The outer struct will be at least sizeof(int) + 1 + sizeof(char *) + sizeof(long). All of the structs and unions can have padding.
You are using an extension to the standard, unnamed fields. In ISO C, there would be no way to access the inner struct. But in GCC (and I believe MSVC), you can do var.d.
Also, you're missing the semi-colon after the inner struct.
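One portable way around the unnamed field is to give the inner struct a member name. In the sketch below the name inner is an assumption added for the example, as is the helper s1_min_size; the missing semicolon is also supplied.

```c
#include <assert.h>
#include <stddef.h>

struct s1
{
    int a;
    char b;
    union
    {
        struct
        {
            char *c;
            long d;
        } inner;             /* hypothetical member name */
        long e;
    } var;
};

/* Lower bound on sizeof(struct s1): the sum of the members, no padding. */
static size_t s1_min_size(void)
{
    return sizeof(int) + sizeof(char) + sizeof(char *) + sizeof(long);
}
```

Given struct s1 x, the member d is then reached as x.var.inner.d, and sizeof(struct s1) will be at least s1_min_size(), usually more once padding is added.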

With no padding, and assuming sizeof(int)==sizeof(char *)==sizeof(long)==4, the size of the outer struct will be 13.
Breaking it down, the union var overlaps an anonymous struct with a single long. That inner struct is larger (a pointer and a long) so its size controls the size of the union, making the union consume 8 bytes. The other members are 4 bytes and 1 byte, so the total is 13.
In any sensible implementation with the size assumptions I made above, this struct will be padded to either 2 byte or 4 byte boundaries, adding at least 1 or 3 additional bytes to the size.
Edit: In general, since the sizes of all of the member types are themselves implementation defined, and the padding is implementation defined, you need to refer to the documentation for your implementation and the platform to know for sure.
The implementation is allowed to insert padding after essentially any element of a struct. Sensible implementations use as little padding as required to comply with platform requirements (e.g. RISC processors often require that a value is aligned to the size of that value) or for performance.
If using a struct to map fields to the layout of values assumed by file format specification, a coprocessor in shared memory, a hardware device, or any similar case where the packing and layout actually matter, then you might want to be concerned that you are testing at either compile time or run time that your assumptions of the member layout are true. This can be done by verifying the size of the whole structure, as well as the offsets of its members.
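A compile-time version of such a check might look like this, assuming a C11 compiler. The struct file_header and its expected 8-byte layout are invented for the illustration and do not come from any real file format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct file_header {         /* hypothetical on-disk header */
    uint32_t magic;
    uint16_t version;
    uint16_t flags;
};

/* Fail the build if the compiler's layout differs from the assumed format. */
static_assert(sizeof(struct file_header) == 8, "header must be 8 bytes");
static_assert(offsetof(struct file_header, magic) == 0, "magic at offset 0");
static_assert(offsetof(struct file_header, version) == 4, "version at offset 4");
static_assert(offsetof(struct file_header, flags) == 6, "flags at offset 6");
```

These checks say nothing about byte order, which has to be handled separately when the data crosses machine boundaries.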
See this question among others for a discussion of compile-time assertion tricks.

Unions are dangerous and risky to use without strict discipline. And the fact you put it in a struct is really dangerous because by default all struct members are public: that exposes the possibility of client code making changes to your union, without informing your program what type of data it stuffed in there. If you use a union you should put it in a class where at least you can hide it by making it private.
We had a dev years ago who drank the koolaid of unions, and put it in all his data structures. As a result, the features he wrote with it, are now one of the most despised parts of our entire application, since they are unmodifiable, unfixable and incomprehensible.
Also Unions throw away all the type safety that modern c/c++ compilers give you. Surely if you lie to the compiler it will get back at you someday. Well actually it will get back at your customer when your app crashes.
