This question already has answers here:
Why isn't sizeof for a struct equal to the sum of sizeof of each member?
(13 answers)
Closed 8 years ago.
I've made a doubly linked list structure in C and need to know how to calculate the size of custom-made structures. I understand the sizes of the basic data types, and that pointers are 8 bytes on my machine.
However, when I create this data type:
struct doublylinked {
int data;
struct doublylinked *next;
struct doublylinked *prev;
};
I calculate that the members add up to 20 bytes in total:
size of data = 4
size of next = 8
size of prev = 8
However, when I print out the size of this data type, it equals 24.
size of doublylinked = 24
Where are these extra 4 bytes coming from?
Thanks
The extra space comes from padding added by the compiler, which makes access faster on some CPUs.
It might actually look like this in memory:
data [4 bytes]
padding [4 bytes] <- That way, next is aligned on a multiple of its own size
next [8 bytes]
prev [8 bytes]
The 8-byte next and prev variables need to have an 8-byte alignment.
In other words, they need to be located in memory addresses divisible by 8.
So the compiler adds 4 bytes of padding after the 4-byte data member, i.e. before next. (C does not allow padding before the first member of a struct.)
Please note that this is generally platform-dependent (i.e., some processors support unaligned load/store operations, in which case the compiler may choose to omit the padding).
Related
I have struct in C:
typedef struct Node {
int data; // 4 bytes int + 4 bytes for alignment
struct Node* prev; // 8 bytes pointer
struct Node* next; // 8 bytes pointer
} Node;
The size of this struct is 24 bytes (8 + 8 + 8). When I use the sizeof(Node), the compiler also shows 24 bytes.
However, when I create two or more structs on the heap (one after another) and look at their memory locations, there are 8-byte gaps between each Node struct.
For example:
11121344 (the 1st Node address)
11121376 (the 2nd Node address) // 376 - 344 = 32 bytes; 32 - 24 = 8 extra bytes
11121408 (the 3rd Node address) // 408 - 376 = 32 bytes; 32 - 24 = 8 extra bytes
Can you explain why the compiler separates Node structs by adding 8 bytes between Nodes?
There are 2 possible reasons for your observation:
The C standard requires that malloc return memory suitably aligned for any object type with a fundamental alignment, to prevent alignment issues no matter what you allocate it for.
malloc manages memory chunks using some internal data structure. Depending on the implementation, it may add extra information to each memory chunk for internal use. For instance, if malloc managed memory chunks in a linked list, each chunk would need to hold an additional pointer to the next chunk.
The maximum alignment depends on the architecture and on the compiler/malloc implementation used.
For your case, and assuming glibc, taken straight from the comments in glibc's malloc.c:
Alignment: 2 * sizeof(size_t) (default)
(i.e., 8 byte alignment with 4byte size_t). This suffices for
nearly all current machines and C compilers. However, you can
define MALLOC_ALIGNMENT to be wider than this if necessary.
Minimum overhead per allocated chunk: 4 or 8 bytes
Each malloced chunk has a hidden word of overhead holding size
and status information.
Minimum allocated size: 4-byte ptrs: 16 bytes (including 4 overhead)
8-byte ptrs: 24/32 bytes (including, 4/8 overhead)
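Thus malloc in your case will align to 2 * sizeof(size_t) = 16 bytes (with an 8-byte size_t). A 24-byte request plus 8 bytes of overhead gives a 32-byte chunk, which matches the 32-byte spacing you observed.
Also note the 'hidden overhead' mentioned. This overhead is used to store additional internal information for memory management...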
Can you explain why the compiler separates Node structs by adding 8 bytes between Nodes?
It's a coincidence. There is no rule about how to lay out memory for any sequence of malloc() calls.
The addresses can be ascending with a fixed interval, descending with varying intervals, (seemingly) random, and so on.
If you want fixed relative addresses, use an array:
struct Node arr[3];
ptrdiff_t delta10 = &arr[1] - &arr[0]; /* 1: counts elements, not bytes */
ptrdiff_t delta20 = &arr[2] - &arr[0]; /* 2 */
ptrdiff_t delta21 = &arr[2] - &arr[1]; /* 1 */
if (delta10 != delta21) /* cannot happen */;
or allocate a group of elements at the same time (possibly resizing the group later with realloc()):
struct Node *elements = malloc(3 * sizeof *elements);
ptrdiff_t delta10 = &elements[1] - &elements[0]; /* 1 */
ptrdiff_t delta20 = &elements[2] - &elements[0]; /* 2 */
ptrdiff_t delta21 = &elements[2] - &elements[1]; /* 1 */
if (delta10 != delta21) /* cannot happen */;
free(elements);
I was studying the book "Data Structure and algorithms made easy", but I got confused while learning "Comparing Linked Lists and Unrolled Linked list"...
What is overhead?
Why is he only stating 8 bytes of overhead for a 100-element array?
Overhead is all the stuff that is not part of the data you want to store, such as the pointers to the next and previous elements.
The block list is a list of arrays. Each array contains a number of elements. In principle, your entire list could consist of a single block node with an array of all your elements, and hence less overhead.
It's a bit confusing that head in LinkedBlock points to a ListNode - it should point to whatever the data is (without the prev and next pointers).
I think the book contains a serious error in the definition of struct LinkedBlock. Let's get back to that later and start with:
What is overhead?
The struct ListNode is designed to store one integer, but besides the integer, each node has two pointers. So for each node you'll need to allocate 1 integer + 2 pointers. Let's assume a 4-byte integer and 4-byte pointers. Then each node requires 4 + 2x4 = 12 bytes. So in order to store 1 item of your real data (1 integer), you need to allocate 12 bytes. You have spent 8 bytes on pointers. These 8 "wasted" bytes are called overhead. They are used for bookkeeping only, not for data.
But it gets worse than that... When allocating dynamic memory (which you normally do when using a linked list), there is some additional overhead. The allocator may need a little extra memory for every malloc to store information about the allocation. Another issue is that malloc'ed memory may be rounded up to some fixed block size (e.g. 16 or 32 bytes), so if you allocate 20 bytes, there is no way to use the remaining 12 bytes; they are wasted. This is what the book calls "allocation overhead". The allocation overhead is system-dependent, but the book assumes 8 extra bytes of overhead for each malloc.
So now each malloc 'ed struct ListNode takes up:
4 bytes for the integer
8 bytes for 2 pointers
8 bytes for allocation overhead
A total of 20 bytes, of which 4 bytes are your data and 16 bytes are overhead. So for each integer you need to store, you'll need 20 bytes. And if you want to store 1000 integers, you end up wasting 16 KB on overhead in order to store 4 KB of data.
Now back to the struct LinkedBlock. In the book it looks like this:
struct LinkedBlock {
struct LinkedBlock *next;
struct LinkedNode *head;
int nodeCount;
};
I'm pretty sure there is a mistake in the book and that it should look like this instead:
struct LinkedBlock {
struct LinkedBlock *next;
int *dataArray;
int nodeCount;
};
The way to use this is something like:
struct LinkedBlock *pNode = malloc(sizeof(struct LinkedBlock));
pNode->dataArray = malloc( 100 * sizeof(int) );
The first malloc requires 4 + 4 + 4 + 8 = 20 bytes. (pointer, pointer, int, allocation overhead)
The second malloc requires 4 * 100 + 8 = 408 bytes. (100 int, allocation overhead)
So a total of 428 bytes.
However, since the malloc'ed data can hold 100 integers (400 bytes), your overhead is only 28 bytes. In other words, on average you use 4.28 bytes for each integer. Compare that to the first method, which required 20 bytes for each integer.
Why is he only stating 8 bytes of overhead for a 100-element array?
That was because the array was allocated in a single call and each malloc call is assumed to have 8 bytes allocation overhead.
In a normal linked list, 1 node has 1 element and 2 pointers (8 bytes); the 2 pointers are overhead since they are not your data. In an unrolled linked list, 1 node has 100 elements and 2 pointers (8 bytes), hence 8 bytes of overhead for 100 elements.
typedef struct Node {
int data;
struct Node *next;
} node;
pointer->next = (node*)malloc(sizeof(node));
How many bytes of memory are dynamically given to pointer->next in the above code? For (int*)malloc(sizeof(int)), 2 bytes are given. Likewise, how many for node?
malloc will dynamically allocate sizeof(node) bytes.
node is a struct, and the size of a struct depends on the sizes of its members (plus any padding).
In this case, the size of node will be: size of int + size of struct Node*
(If the result is not a multiple of 2, it will be padded for architecture reasons.)
Your device appears to be a 16-bit architecture (a 2-byte int), so struct sizes there will typically be 2, 4, 6, 8, etc.
The size of int depends on the target you are working on. Since your architecture is 16 bits, the size of int is 2 bytes.
As for the size of struct Node *: on most platforms, all object pointers have the same size regardless of the type they point to (the C standard only guarantees this for certain groups, such as all pointers to structure types). That size also depends on the architecture. Again, your architecture is 16 bits, which is likely why the size of struct Node * is 2 bytes.
size of int = 2
size of struct Node * = 2
Total memory allocated by malloc = 2 + 2 = 4
First, a suggestion: rewrite
pointer->next=(node*)malloc(sizeof(node));
as
pointer->next = malloc( sizeof *pointer->next );
You don't need the cast (unless you're working on a pre-ANSI implementation, in which case God help you), and using the dereferenced target as the operand of sizeof means you don't have to specify the type, potentially saving you some maintenance heartburn.
Also, a little whitespace goes a long way (although you don't need to put whitespace around the function arguments - that's my style, some people don't like it, but it makes things easier for me to read).
How many bytes of memory are dynamically given to pointer->next
It will be at least as big as sizeof (int) plus sizeof (struct Node *), and may be bigger; depending on your platform, it could be as small as 4 bytes or as large as 16. C allows for "padding" bytes between struct members to satisfy alignment requirements for the underlying architecture. For example, a particular architecture may require that all multi-byte objects be aligned on addresses that are multiples of 4; if your data member is only 2 bytes wide, then there will be 2 unused bytes between it and the next member.
Without knowing a lot about your system, we just can't tell you. You can take the same code, try it on multiple compilers, and get different answers. You have to check for yourself, using sizeof(node) or sizeof(struct Node); both are equivalent here because node is a typedef for struct Node.
This question already has answers here:
Structure padding and packing
(11 answers)
Closed 8 years ago.
I am a bit lost on calculating the size of structures
So we have the structure:
struct AcronymNode{
struct AcronymNode* next;
char acronym[5];
double num_phrases;
struct The_Phrase* phrase_list;
} Dictionary;
I see it as:
next: 4 bytes
acronym: 5 bytes + 3
num_phrases: 8 bytes
phrase_list: 4 bytes
= 24 bytes
When I look at the notes, they say: 32 bytes = 4 + 5 + 3 (alignment to word) + 4 (to align for the double) + 8 + 4 + 4 (to align the next structure to a multiple of 8 for the double)
Why are we adding an extra 8 bytes for alignment (4 before the double and 4 after the second struct pointer) when nothing overflows?
In the more efficient version of the structure, the double comes first, followed by the other members, for 24 bytes.
Also I wanted to check if this is right
struct T {
int a;
char b[5];
float c;
char d[2];
};
Is the size 4 + 5+3 + 4 + 4 = 20?
If memory access latency is not something you are concerned with, you can instruct the compiler to lay out a structure with a different alignment (than the one that is most efficient for the machine). For example:
#pragma pack(1)
struct AcronymNode{
struct AcronymNode* next;
char acronym[5];
double num_phrases;
struct The_Phrase* phrase_list;
} Dictionary;
#pragma pack()
Although '#pragma pack' is not officially part of the C language, it is supported by most compilers. In the example above, '#pragma pack(1)' instructs the compiler to pack the structure on one-byte boundaries, which removes all padding. Assuming 4-byte pointers, as in your notes, the layout becomes:
next: 4 bytes
acronym: 5 bytes (no padding)
num_phrases: 8 bytes
phrase_list: 4 bytes
= 21 bytes
(The 24-byte layout you described, with 3 bytes of padding after acronym, corresponds to a 4-byte maximum alignment, e.g. '#pragma pack(4)'.)
Then, '#pragma pack()' returns alignment to its default.
'#pragma pack(1)' is often used to define structures where other alignments are not desirable; for example, when sending such structures "over the wire" to another system. Wire protocols are generally packed so that there is no filler between fields.
Packing structs to a 1-byte boundary optimizes for space; otherwise, the compiler pads members for access speed.
See: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding
This question already has answers here:
Closed 13 years ago.
Possible Duplicate:
Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
Consider the following C code:
#include <stdio.h>
struct employee
{
int id;
char name[30];
};
int main()
{
struct employee e1;
printf("%zu %zu %zu", sizeof(e1.id), sizeof(e1.name), sizeof(e1)); /* sizeof yields size_t, printed with %zu */
return(0);
}
The output is:
4 30 36
Why is the size of the structure not equal to the sum of the sizes of its individual component variables?
The compiler may add padding for alignment requirements. Note that this applies not only to padding between the fields of a struct, but also may apply to the end of the struct (so that arrays of the structure type will have each element properly aligned).
For example:
struct foo_t {
int x;
char c;
};
Even though c itself needs no alignment, the struct will generally have sizeof(struct foo_t) == 8 on a system with a 32-bit int type, because 3 bytes of padding are needed after the c field.
Note that the padding might not be required by the hardware (e.g., x86 or Cortex-M3), but compilers may still add it for performance reasons.
As mentioned, the C compiler will add padding for alignment requirements. These requirements often have to do with the memory subsystem. Some types of computers can only access memory lined up to some 'nice' value, like 4 bytes. This is often the same as the word length. Thus, the C compiler may align fields in your structure to this value to make them easier to access (e.g., 4-byte values should be 4-byte aligned). Further, it may pad the end of the structure to line up data that follows the structure. I believe there are other reasons as well. More info can be found on the Wikipedia page on data structure alignment.
Your default alignment is probably 4 bytes. Either the 30-byte element was padded to 32, or, equivalently, the structure as a whole was rounded up to the next multiple of 4.
The 2 extra bytes are not weird; they come from aligning to addresses that are multiples of 4.
So basically you have 34 bytes of members in your structure (4 + 30), and the next structure (for example, the next element of an array) should be placed at an address that is a multiple of 4. The closest such value after 34 is 36, and this padding area counts toward the size of the structure.