Please help me with this concept:
Which would take up more memory after compiling if both are declared and initialized?
struct V
{
int a, b;
};
struct X
{
struct V v;
int N;
};
OR:
struct X
{
int a, b, c;
};
Rules for structure padding:
Padding is only inserted when a structure member is followed by a member with a larger alignment requirement or at the end of the structure.
The last member is padded with the number of bytes required so that the total size of the structure is a multiple of the largest alignment of any structure member.
This suggests that both will take the same amount of memory, 12 bytes each, assuming the size of int is 4 bytes. The reason is that there is no padding in either case.
Common sense tells me that in both cases the program will allocate the same amount of memory (3 integers).
In the end, C structs are no more than a way to help the developer organize variables and make code more readable. Once compiled, the program doesn't care about structs or code organization, only about variables (memory) and operations (instructions).
Both variants of struct X take up the same amount of memory, after all, they only contain three integers and the nested struct doesn't affect the memory layout of its member data when it comes to size.
The struct in this case is only syntactic sugar to arrange its members in a specific order and doesn't introduce additional overhead.
If there is no additional padding between the ints, both structs will consume 3 * sizeof(int) bytes.
I'm quite confused about the difference between flexible arrays and pointers as struct members. Someone suggested that a struct with pointers needs malloc to be called twice. However, consider the following code:
struct Vector {
size_t size;
double *data;
};
int len = 20;
struct Vector* newVector = malloc(sizeof *newVector + len * sizeof*newVector->data);
printf("%p\n",newVector->data);//print 0x0
newVector->data =(double*)((char*)newVector + sizeof*newVector);
// do sth
free(newVector);
One difference I found is that the address stored in the data member of Vector is not defined. The programmer needs to compute it by hand to "find" the exact address. However, if Vector is defined as:
struct Vector {
size_t size;
double data[];
};
Then the address of data is defined.
I am wondering whether it is safe and possible to malloc a struct with pointers like this, and what exactly the reason is that programmers call malloc twice when using a struct with pointers.
The difference is how the struct is stored. In the first example you over-allocate memory but that doesn't magically mean that the data pointer gets set to point at that memory. Its value after malloc is in fact indeterminate, so you can't reliably print it.
Sure, you can set that pointer to point beyond the part allocated by the struct itself, but that means potentially slower access since you need to go through the pointer each time. Also you allocate the pointer itself as extra space (and potentially extra padding because of it), whereas in a flexible array member sizeof doesn't count the flexible array member. Your first design is overall much more cumbersome than the flexible version, but other than that well-defined.
The reason why people malloc twice when using a struct with pointers could either be that they aren't aware of flexible array members or using C90, or alternatively that the code isn't performance-critical and they just don't care about the overhead caused by fragmented allocation.
If you use the pointer method and malloc only once, there is one extra thing you need to take care of in the calculation: alignment.
Let's add one extra field to the structure:
struct Vector {
size_t size;
uint32_t extra;
double *data;
};
Let's assume that we are on a system where each field is 4 bytes, there is no trailing padding in the struct, and its total size is 12 bytes. Let's also assume that double is 8 bytes and requires 8-byte alignment.
Now there is a problem: the expression (char*)newVector + sizeof*newVector no longer gives an address that is divisible by 8. Four bytes of manual padding are needed between the structure and the data. This complicates both the malloc size calculation and the data pointer offset calculation.
So the main reason you see the single-malloc pointer version less often is that it is harder to get right. With a pointer and two mallocs, or with a flexible array member, the compiler and allocator take care of the necessary alignment and padding so you don't have to.
I have a structure of the following type:
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
} myStructureType;
According to the alignment requirements of each basic data type and the alignment requirement of the whole structure, the allocation in memory will be like that:
My question is what is the importance of those trailing padding bytes as long as each structure member is naturally aligned to its size and could be accessed by our processor in one cycle (assuming that the bus size of processor is 32-bit) without alignment faults.
Also, if we declare an array of two of these structures, without taking the trailing bytes into consideration, the allocation in memory will be as follows:
Each member in the two structures is naturally aligned to its size and could be accessed in one cycle without alignment faults.
So, what is the importance of trailing bytes in this case ?!
The comments from Bryan Olivier and Hans Passant are both right.
Essentially, you have answered your own question: in the second drawing, the alignment of the members of both the first and second array items is correct. If the compiler could lay out structures like this, the trailing pad bytes would not matter. But it can't.
In C, a structure's layout and size must be the same for every instance of a structure. In your second example, the sizeof(array[0]) is 10 and sizeof(array[1]) is 8. The address of B2 is only two greater than A2, but &B1 is four greater than &A1.
It's more than just alignment - it's ensuring a constant layout and size while still ensuring alignment. Even then it would not require trailing pad bytes if the first byte was aligned, but as you have noticed if you add arrays, then you need them.
Alignment+Layout/Size+Arrays => trailing pad is required.
As Andres said, the compiler can't generate special layout to every member in the array of structures.
For example, assume that the programmer has defined a structure with the following type
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
} myStructureType;
And then the programmer has created an instance of this type:
myStructureType myStructure;
The compiler will allocate memory for myStructure as follows:
If the programmer decides to create an array of myStructureType, the compiler will repeat the previous pattern in memory once per array element, as follows:
If the compiler omitted the trailing padding bytes, the memory would become misaligned as follows:
A modern 32-bit processor would need two memory cycles and some masking operations to fetch B1 (member B of the element at index 1 of the array). An older processor, however, would raise an alignment fault.
That's why the trailing padding is important in this case.
I was trying to give a "logical" counter-example to this answer indicating that sorting the members of a struct based on their size would minimize padding, when I encountered what seems to me as illogical.
Imagine the following struct:
struct A
{
int32_t a;
int16_t b;
};
sizeof this struct would normally be padded to 8 bytes to make sure a is aligned for example in an array of struct A.
Now imagine these other structs:
struct B
{
struct A a, b;
int16_t c, d;
};
struct C
{
struct A a;
int16_t c;
struct A b;
int16_t d;
};
As expected struct B has size 20 due to padding. However, I would have expected struct C to have size 16 since padding could be avoided, but ideone and gcc (with or without optimization) give a size of 24 bytes, clearly padding 2 bytes after each of the members.
My reasoning is that struct A in reality has only 6 bytes and should be padded when necessary, for example in an array of struct A or its usage in struct B. However, in struct C padding of struct A is unnecessary and c could have been placed where the padding of a could have been and the same with d and b.
Why doesn't the compiler minimize the padding by putting c where the padding of a would be?
P.S. I understand that sizeof(struct A) must return 8. Otherwise something like memset(array_of_A, 0, N * sizeof *array_of_A) won't work properly since array_of_A would contain padding while N * sizeof *array_of_A would ignore that padding.
The only thing I can think of that could be a problem is that with the optimization above then sizeof(struct C) would be smaller than the sizeof of all its members. However, I can't think of a case where such a thing could become a problem (i.e. a usage that is not based on undefined behavior).
struct C someC;
struct A someA;
*(struct A*)&(someC.a) = someA;
The assignment above may fail (mistakenly write to someC.c) if the padding you describe is supported by compilers.
EDITED: The example above relies on the compiler's behavior when assigning structs. As far as I know (and I just checked), gcc copies the struct as a flat region of memory rather than member by member.
EDITED: Changed from "would fail" to "may fail" since it's not defined if the padding bits shall be copied, see item 6 of section 6.2.6.1 of ISO_IEC_9899_2011:
When a value is stored in an object of structure or union type, including in a member
object, the bytes of the object representation that correspond to any padding bytes take
unspecified values.51)
and footnote 51):
51) Thus, for example, structure assignment need not copy any padding bits.
memcpy(&someC.a, &someA, sizeof(someC.a)) would write over someC.c.
That's what I was trying to get at with my comment about sizeof()'s having
to be different. For that memcpy() to work, sizeof(someC.a) would have to
be different from sizeof(someA) which just seems to be asking for a lot of
trouble and hard to find bugs.
I can not understand why is it like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct
{
char b;
int a;
} A;
typedef struct
{
char b;
} B;
int main() {
A object;
printf("sizeof char is: %zu\n", sizeof(char));
printf("sizeof int is: %zu\n", sizeof(int));
printf("==> the sizeof both are: %zu\n", sizeof(int) + sizeof(char));
printf("and yet the sizeof struct A is: %zu\n", sizeof(object));
printf("why?\n");
B secondObject;
printf("pay attention that the sizeof struct B is: %zu which is equal to the "
"sizeof char\n", sizeof(secondObject));
return 0;
}
I think the code explains my question and no further explanation is needed. Besides that, I have another question:
I know memory can be allocated on the heap, the static heap, or the stack, but what does it mean for the allocation location to be unknown? How could that be?
I am talking about this example:
#include <stdlib.h>
#include <string.h>
typedef struct
{
char *_name;
int _id;
} Entry;
int main()
{
Entry ** vec = (Entry**) malloc(sizeof(Entry*)*2);
vec[0] = (Entry *) malloc(sizeof (Entry));
vec[0]->_name = (char*)malloc(6);
strcpy (vec[0]->_name, "name");
vec[0]->_id = 0;
return 0;
}
I know that:
vec is on the stack.
*vec is on the heap.
*vec[0] is on the heap.
vec[0]->_id is on the heap.
but :
vec[0]->_name is unknown
why ?
There is an unspecified amount of padding between the members of a structure and at the end of a structure. In C the size of a structure object is greater than or equal to the sum of the size of its members.
Take a look at this question as well as this one and many others if you search for CPU and memory alignment. In short, CPUs are happier if they access memory aligned to the size of the data they are reading. For example, if you are reading a uint16_t, then it would be more efficient (on most CPUs) to read at an address that is a multiple of 2. The details of why CPUs are designed this way are a whole other story.
This is why compilers come to the rescue and pad the fields of structures in the way that is most comfortable for the CPU to access them, at the cost of extra storage space. In your case, you are probably given 3 bytes of padding between your char and int, assuming int is 4 bytes.
If you look at the C standard (which I don't have nearby right now), or the man page of malloc, you will see such a phrase:
The malloc() and calloc() functions return a pointer to the allocated memory
that is suitably aligned for any kind of variable.
This behavior is exactly due to the same reason I mentioned above. So in short, memory alignment is something to care about, and that's what compilers do for you in struct layout and other places, such as layout of local variables etc.
You're running into structure padding here. The compiler is likely inserting three bytes' worth of padding after the b field in struct A, so that the a field is 4-byte aligned. You can control this padding to some degree using compiler-specific features, for example the pack pragma on MSVC or the packed and aligned attributes on GCC, but I would not recommend this. Structure padding is there to satisfy member alignment restrictions, and some architectures will fault on unaligned accesses. (Others might fix up the access transparently, but typically do so rather slowly.)
See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
As to your second question, I'm unsure what you mean by the name is "unknown". Care to elaborate?
The compiler is free to add padding in structures to ensure that datatypes are aligned properly. For example, an int will typically be aligned to sizeof(int) bytes. So I expect the output for the size of your A struct to be 8. The compiler does this because fetching an int from an unaligned address is at best inefficient, and at worst doesn't work at all, depending on the processor the computer uses. x86 will happily fetch from unaligned addresses for most data types, but the fetch can take noticeably longer.
In your second code-snippet, you haven't declared i.
So vec[0]->_name is not unknown - it is on the heap, just like anything else you get from "malloc" (and malloc's siblings).
I heard a rumor that, in C, arrays that are contained inside structs may have padding added in between elements of the array. Now obviously, the amount of padding could not vary between any pair of elements or calculating the next element in an array is not possible with simple pointer arithmetic.
This rumor also stated that arrays which are not contained in structures are guaranteed to contain no padding. I know at least that part is true.
So, in code, the rumor is:
{
// Given this:
struct { int values[20]; } foo;
int values[20];
// This may be true:
sizeof(values) != sizeof(foo.values);
}
I'm pretty certain that sizeof(values) will always equal sizeof(foo.values). However, I have not been able to find anything in the C standard (specifically C99) that explicitly confirms or denies this.
Does anyone know if this rumor is addressed in any C standard?
edit: I understand that there may be padding between the end of the array foo.values and the end of the struct foo and that the standard states that there will be no padding between the start of foo and the start of foo.values. However, does anyone have a quote from or reference to the standard where it says there is no padding between the elements of foo.values?
No, there will never be padding in between elements of an array. That is specifically not allowed. The C99 standard defines array types this way: "An array type describes a contiguously allocated nonempty set of objects...". For contrast, a structure is "sequentially", not "contiguously", allocated.
There might be padding after an array within a structure, or before it if it is not the first member; that is another animal entirely. The compiler might do that to aid alignment, and the C standard permits unnamed padding inside a structure and at its end, though not at its beginning.
Careful here. Padding may be added at the end of the struct, but will not be added between the elements of the array as you state in your question. Arrays will always reference contiguous memory, though an array of structures may have padding added to each element as part of the struct itself.
In your example, the values and foo.values arrays will have the same size. Any padding will be part of the struct foo instead.
Here's the explanation as to why a structure may need padding between its members or even after its last member, and why an array doesn't:
Different types might have different alignment requirements. Some types need to be aligned on word boundaries, others on double or even quad word boundaries. To accomplish this, a structure may contain padding bytes between its members. Trailing padding bytes might be needed because the memory location directly after a structure must also conform to the structure's alignment requirements, i.e. if bar is of type struct foo *, then
(struct foo *)((char *)bar + sizeof(struct foo))
yields a valid pointer to struct foo (ie doesn't fail due to mis-alignment).
As each 'member' of an array has the same alignment requirement, there's no reason to introduce padding. This holds true for arrays contained in structures as well: if an array's first element is correctly aligned, so are all following elements.
Yes, sort of. Variables are often aligned to some boundary, depending on the variable. Take the following, for instance:
typedef struct
{
double d;
char c;
} a_type_t;
double and char are 8 and 1 bytes, on my system, respectively. Total of 9. That structure, however, will be 16 bytes, so that the doubles will always be 8-byte aligned. If I had just used ints, chars, etc, then the alignment might be 1, 2, 4, or 8.
For some type T, sizeof(T) may or may not equal sizeof(T.a) + sizeof(T.b) + sizeof(T.c) ... etc.
Generally, this is entirely compiler and architecture dependent. In practice, it rarely matters unless code depends on an exact byte layout.
Consider:
struct {
short s;
int i;
} s;
Assuming shorts are 16 bits and you're on a 32-bit platform, the size will probably be 8 bytes, as each struct member tends to be aligned to a word (32-bit in this case) boundary. I say "probably" because it is implementation-specific behaviour that can be varied by compiler flags and the like.
It's worth stressing that this is implementation behaviour not necessarily defined by the C standard. Much like the size of shorts, ints and longs (the C standard simply says shorts won't be larger than ints and longs won't be smaller than ints, which can end up as 16/32/32, 16/32/64, 32/32/64 or a number of other configurations).