What is the boundary problem with respect to the allocated size of structures?
Any keywords for it that I can google would be helpful.
To calculate the sizes of user-defined types, the compiler takes into account any alignment space needed for complex user-defined data structures. This is why the size of a structure in C can be greater than the sum of the sizes of its members. For example, on many systems, the following code will print 8:
struct student{
char grade; /* char is 1 byte long */
int age; /* int is 4 bytes long */
};
printf("%zu", sizeof (struct student));
The reason for this is that most compilers, by default, align complex data-structures to a word alignment boundary. In addition, the individual members are also aligned to their respective alignment boundaries. By this logic, the structure student gets aligned on a word boundary and the variable age within the structure is aligned with the next word address. This is accomplished by way of the compiler inserting "padding" space between two members or to the end of the structure to satisfy alignment requirements. This padding is inserted to align age with a word boundary. (Most processors can fetch an aligned word faster than they can fetch a word value that straddles multiple words in memory, and some don't support the operation at all)
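One way to see where the padding ends up is to ask the compiler for member offsets with offsetof from <stddef.h>. The following is a minimal sketch reusing struct student from above; the helper names student_gap_bytes and print_student_layout are made up for illustration, and the exact numbers depend on the compiler and target.

```c
#include <stddef.h>
#include <stdio.h>

struct student {
    char grade;
    int age;
};

/* Bytes of padding the compiler placed between grade and age. */
size_t student_gap_bytes(void)
{
    return offsetof(struct student, age)
         - (offsetof(struct student, grade) + sizeof(char));
}

/* Prints the layout as the current compiler sees it. */
void print_student_layout(void)
{
    printf("grade at %zu, age at %zu, gap %zu, total %zu\n",
           offsetof(struct student, grade),
           offsetof(struct student, age),
           student_gap_bytes(),
           sizeof(struct student));
}
```

On a system where the original snippet prints 8, the gap reported here would be the 3 bytes between grade and age.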
Referenced article: data structure alignment and Structure padding
Related
I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes into play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%zu",sizeof(book));
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!
Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So the char array, which is 3 bytes long, needs no alignment because its elements are single bytes.
It is preferred, though not required, that the short element be aligned on a short boundary, which means an even address. By aligning it on a short boundary, the compiler can issue a single load-short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
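A sketch of such a dump for the struct from the question might look like the following. The helper names books_tail_padding and dump_books_layout are hypothetical, and the output is whatever the current compiler chose, so treat it as a diagnostic rather than a guarantee.

```c
#include <stddef.h>
#include <stdio.h>

struct books {
    short int number;
    char name[3];
};

/* Trailing padding: bytes between the end of the last member and sizeof. */
size_t books_tail_padding(void)
{
    size_t end_of_name = offsetof(struct books, name)
                       + sizeof(((struct books *)0)->name);
    return sizeof(struct books) - end_of_name;
}

/* Prints addresses and offsets so the padding location becomes visible. */
void dump_books_layout(void)
{
    struct books b;
    printf("struct at %p\n", (void *)&b);
    printf("number at %p (offset %zu)\n",
           (void *)&b.number, offsetof(struct books, number));
    printf("name   at %p (offset %zu)\n",
           (void *)b.name, offsetof(struct books, name));
    printf("sizeof %zu, trailing padding %zu\n",
           sizeof b, books_tail_padding());
}
```

If the reported size is 6, as in the question, the offsets would be expected to show name starting right after number, with a single padding byte at the end.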
EDIT: As #Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, a different version of the compiler could change it.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the structure. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.
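An alternative to packing, sketched here under assumptions, is to write each field into a byte buffer explicitly, so the wire format no longer depends on how the compiler pads the structure. The struct reading, its fields, the function names and the big-endian byte order are all invented for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

struct reading {            /* hypothetical message, padded as the compiler likes */
    uint8_t  kind;
    uint32_t value;
    uint16_t seq;
};

enum { READING_WIRE_SIZE = 1 + 4 + 2 };

/* Writes exactly READING_WIRE_SIZE bytes to out, big-endian, with no padding. */
size_t reading_encode(const struct reading *r, uint8_t *out)
{
    size_t i = 0;
    out[i++] = r->kind;
    out[i++] = (uint8_t)(r->value >> 24);
    out[i++] = (uint8_t)(r->value >> 16);
    out[i++] = (uint8_t)(r->value >> 8);
    out[i++] = (uint8_t)(r->value);
    out[i++] = (uint8_t)(r->seq >> 8);
    out[i++] = (uint8_t)(r->seq);
    return i;
}

/* Reads the same 7-byte format back into a struct. */
void reading_decode(const uint8_t *in, struct reading *r)
{
    r->kind  = in[0];
    r->value = ((uint32_t)in[1] << 24) | ((uint32_t)in[2] << 16)
             | ((uint32_t)in[3] << 8)  |  (uint32_t)in[4];
    r->seq   = (uint16_t)(((uint16_t)in[5] << 8) | in[6]);
}
```

The in-memory struct can then keep its natural alignment, while the transmitted form is always 7 bytes regardless of compiler or version.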
#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with a char and a short, its alignment is 2, because the alignment of short is 2; if you have a struct made only of chars, it has an alignment of 1.
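The divisibility rule can be checked directly with _Alignof (C11). This is only a small sketch; the struct names and the helper function are made up for the example.

```c
#include <stddef.h>

struct only_chars     { char a; char b; char c; };
struct char_and_short { char a; short b; };

/* Returns 1 if, for both structs above, sizeof is a whole multiple
   of the struct's own alignment. */
int sizes_are_multiples_of_alignment(void)
{
    return sizeof(struct only_chars) % _Alignof(struct only_chars) == 0
        && sizeof(struct char_and_short) % _Alignof(struct char_and_short) == 0;
}
```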
There are multiple ways to manipulate the alignment:
alignas(4) char buf[4]; // this can hold a 32-bit int
This is nonstandard, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
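static_assert(sizeof(struct A) == 4, "struct A is padded to 4 bytes");
static_assert(alignof(struct A) == 2, "struct A is aligned like short");
static_assert(sizeof(struct B) == 3, "packed struct B has no padding");
static_assert(alignof(struct B) == 1, "packed struct B is byte-aligned");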
Usually compilers follow ABI of the target architecture.
It defines the alignments of structures and primitive data types, which in turn determine the padding needed and the sizes of structures. Because the alignment is a multiple of 4 on many architectures, the sizes of structures often are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example, GCC and Clang offer: __attribute__ ((packed))
I'm working with vectors and matrices right now and it was suggested to me that I should use SSE instead of using float arrays. However while reading the definition for the C intrinsics and the Assembly instructions it looks like there is a different version of some of the function where the vector has to be "16 byte aligned" and a slower version where the vector isn't aligned. What does having the vector be 16 byte aligned mean? How can I ensure that my vectors are 16 byte aligned?
Alignment ensures that objects are aligned on an address that is a multiple of some power of two. 16-byte-aligned means that the numeric value of the address is a multiple of 16. Alignment is important because CPUs are often less efficient or downright incapable of loading memory that doesn't have the required alignment.
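The "multiple of 16" test can be written as a one-line check. This is a sketch; is_aligned is a hypothetical helper, and it assumes the alignment passed in is a power of two.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if the address in ptr is a multiple of alignment.
   alignment is assumed to be a power of two. */
int is_aligned(const void *ptr, size_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```

A call such as is_aligned(p, 16) could be used to decide whether the aligned or the unaligned variant of a load is appropriate for a given pointer.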
Your ABI determines the natural alignment of types. In general, integer types and floating-point types are aligned to either their own size, or the size of the largest object of that kind that your CPU can treat at once, whichever is smaller. For instance, on 64-bit Intel machines, 32-bit integers are aligned on 4 bytes and 64-bit integers are aligned on 8 bytes. This is a rule of thumb rather than a guarantee: under the x86-64 System V ABI, for example, the 128-bit __int128 type is aligned on 16 bytes.
The alignment of structures and unions is the same as their most aligned field. This means that if your struct contains a field that has a 2-byte alignment and another field that has an 8-byte alignment, the structure will be aligned to 8 bytes.
In C++, you can use the alignof operator, just like the sizeof operator, to get the alignment of a type. In C, the same construct becomes available when you include <stdalign.h>; alternatively, you can use _Alignof without including anything.
Since C11 and C++11 there is a standard way to request a specific alignment: alignas (spelled _Alignas in C, or alignas after including <stdalign.h>). Before that, you had to rely on compiler-specific extensions, which are still widely used. On Clang and GCC, you can use the __attribute__((aligned(N))) attribute:
struct s_Stuff {
int var1;
short var2;
char padding[10];
} __attribute__((aligned(16)));
(Example.)
(This attribute is not to be confused with __attribute__((align(N))), which sets the alignment of a variable.)
Off the top of my head, I'm not sure for Visual Studio, but according to SoronelHaetir, that would be __declspec(align(N)). Not sure where it goes on the struct declaration.
In the context of vector instructions, alignment is important because people tend to create arrays of floating-point values and operate on them, instead of using types that are known to be aligned. However, __m128, __m256 and __m512 (and all of their variants, like __m128i and such) from <emmintrin.h>, if your compiler environment has it, are guaranteed to be aligned on the proper boundaries for use with aligned intrinsics.
Depending on your platform, malloc may or may not return memory that is aligned on the correct boundary for vector objects. aligned_alloc was introduced in C11 to address these issues, but not all platforms support it.
Apple: does not support aligned_alloc; malloc returns objects on the most exigent alignment that the platform supports;
Windows: does not support aligned_alloc; malloc returns objects aligned on the largest alignment that VC++ will naturally put an object on without an alignment specification; use _aligned_malloc for vector types
Linux: malloc returns objects aligned on an 8- or 16-byte boundary; use aligned_alloc.
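Where aligned_alloc is available, a call might look like the sketch below. The wrapper alloc_floats_16 is a hypothetical name; the rounding step is there because C11 asks for the size to be a multiple of the alignment.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates room for n floats on a 16-byte boundary, or returns NULL.
   Release the result with free(). */
float *alloc_floats_16(size_t n)
{
    if (n == 0 || n > (SIZE_MAX - 15) / sizeof(float))
        return NULL;
    /* Round the byte count up to a multiple of the alignment. */
    size_t bytes = (n * sizeof(float) + 15) / 16 * 16;
    return aligned_alloc(16, bytes);
}
```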
In general, it's possible to request slightly more memory and perform alignment yourself with minimal penalties (aside that you're on your own to write a free-like function that will accept a pointer returned by this function):
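#include <stdint.h>
#include <stdlib.h>

/* alignment must be a power of two; the result must not be passed to free() */
void* aligned_malloc(size_t size, size_t alignment) {
    uintptr_t alignment_mask = alignment - 1;
    void* memory = malloc(size + alignment_mask);
    if (memory == NULL) return NULL;
    uintptr_t unaligned_ptr = (uintptr_t)memory;
    uintptr_t aligned_ptr = (unaligned_ptr + alignment_mask) & ~alignment_mask;
    return (void*)aligned_ptr;
}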
Purists might argue that treating pointers as integers is evil, but at the time of writing, they probably won't have a practical cross-platform solution to offer in exchange.
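One common way to get a matching free-like function, sketched here as an assumption rather than as the answer's own method, is to store the pointer that malloc returned just in front of the aligned block. The names aligned_malloc_hdr and aligned_free_hdr are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* alignment must be a power of two. */
void *aligned_malloc_hdr(size_t size, size_t alignment)
{
    /* Room for the payload, worst-case alignment slack, and one stored pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);
    ((void **)aligned)[-1] = raw;   /* remember what malloc returned */
    return (void *)aligned;
}

/* Frees a block obtained from aligned_malloc_hdr; NULL is ignored. */
void aligned_free_hdr(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

The cost is one extra pointer per allocation, in exchange for not having to track the original address separately.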
xx-byte alignment means that the variable's memory address modulo xx is 0.
Ensuring that is a compiler-specific operation. Visual C++, for example, has __declspec(align(...)), which works for variables that the compiler allocates (at file or function scope, for example). Alignment is somewhat harder for dynamic memory; you can use _aligned_malloc for that. Your library may already guarantee 16-byte alignment for malloc; it's generally larger alignments that require such a call.
New Edit to improve and focus my answer to the specific query
To ensure data alignment in memory, some platforms provide specific allocation functions to force this (assuming your data is compatible, that is, your data matches or discretely fits into your required alignment).
On Windows, the function to use is _aligned_malloc instead of vanilla malloc.
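// Using _aligned_malloc (declared in <malloc.h> on MSVC)
// Note: alignment must be a power of two (2^N).
size_t alignment = 16;
void *ptr = _aligned_malloc(required_size, alignment); // required_size: bytes you need
if (ptr == NULL)
{
    printf_s("Error allocating aligned memory.");
    return -1;
}
// ... use ptr, then release it with _aligned_free(ptr), not free()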
This will (if it succeeds) force your data to align on the 16 byte boundary and should satisfy the requirements for SSE.
Older answer where I waffle on about struct member alignment, which matters - but is not directly answering the query
To ensure struct member byte alignment, you can be careful how you arrange members in your structs (largest first), or you can set this (to some degree) in your compiler settings, member attributes or struct attributes.
Assuming 32 bit machine, 4 byte ints: This is still 4 byte aligned in memory (first largest member is 4 bytes), but padded to be 16 bytes in size.
struct s_Stuff {
int var1; /* 4 bytes */
short var2; /* 2 bytes */
char padding[10]; /* ensure totals struct size is 16 */
};
The compiler usually pads each member to assist with natural alignment, but the padding may be at the end of the struct too. This is struct member data alignment.
Older compiler struct member alignment settings could look similar to these 2 images below...But this is different to data alignment which relates to memory allocation and storage of the data.
It confuses me when Borland uses the phrase (from the images) Data Alignment, and MS uses Struct member alignment. (Although they both refer to specifically struct member alignment)
To maximise efficiency, you need to code for your hardware (or vector processing in this case), so let's assume 32 bit, 4 byte ints, etc. Then you want to use tight structs to save space, but padded structs may improve speed.
struct s_Stuff {
float f1; /* 4 bytes */
float f2; /* 4 bytes */
float f3; /* 4 bytes */
short var2; /* 2 bytes */
};
This struct may be padded so that its members align to 4-byte multiples. The compiler will do this unless you specify single-byte struct member alignment. So the size ON FILE could be 14 bytes, but IN MEMORY each element of an array of this struct would still be 16 bytes in size (with 2 bytes wasted), with an unknown data alignment (possibly 8 bytes by default from malloc, but not guaranteed). As mentioned above, you can force the data alignment in memory with _aligned_malloc on some platforms.
Also regarding member alignment in a struct, the compiler will use multiples of the largest member to set the alignment. Or more specifically:
A struct is always aligned to the largest type’s alignment requirements
...from here
If you are using a UNION, you are correct that it is forced to the largest possible struct see here
Check that your compiler settings do not contradict your desired struct member alignment / padding too, or else your structs may differ in size to what you expect.
Now, why is it faster? See here, which explains how alignment allows the hardware to transmit discrete chunks of data and maximises the use of the hardware that passes data around. That is, the data does not need to be split up or re-arranged at every stage of the hardware processing.
As a rule, it's best to set your compiler to match your hardware (and platform OS) so that your alignment (and padding) works best with your hardware's processing ability. 32 bit machines usually work best with 4 byte (32 bit) member alignment, but then data written to file with 4 byte member alignment can consume more space than wanted.
Specifically regarding SSE vectors, as this link states, 4 * 4 bytes is the best way to ensure 16 byte alignment, perhaps like this. (And they refer to data alignment here)
struct s_data {
float array[4];
};
or simply an array of floats, or doubles.
I have a structure of the following type:
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
} myStructureType;
According to the alignment requirements of each basic data type and the alignment requirement of the whole structure, the allocation in memory will be like that:
My question is what is the importance of those trailing padding bytes as long as each structure member is naturally aligned to its size and could be accessed by our processor in one cycle (assuming that the bus size of processor is 32-bit) without alignment faults.
Also, if we declared an array of "2" of this structure, without taking into consideration the trailing bytes, the allocation in memory will be as following:
Each member in the two structures is naturally aligned to its size and could be accessed in one cycle without alignment faults.
So, what is the importance of trailing bytes in this case ?!
The comments from Bryan Olivier and Hans Passant are both right.
Essentially, you have answered your own question: In the 2nd drawing, the alignment of the members of both the first and second array items is correct. If the compiler could lay out structures like this, there would be no importance to the trailing pad bytes. But it can't.
In C, a structure's layout and size must be the same for every instance of a structure. In your second example, the sizeof(array[0]) is 10 and sizeof(array[1]) is 8. The address of B2 is only two greater than A2, but &B1 is four greater than &A1.
It's more than just alignment - it's ensuring a constant layout and size while still ensuring alignment. Even then it would not require trailing pad bytes if the first byte was aligned, but as you have noticed if you add arrays, then you need them.
Alignment+Layout/Size+Arrays => trailing pad is required.
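The combination can be observed in code. The sketch below uses the structure from the question (given the name myStructureType here) and two hypothetical helpers: one measures the distance between consecutive array elements, the other checks that B is aligned in every element.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char A;
    unsigned long int B;
    unsigned short int C;
} myStructureType;

/* Distance in bytes between consecutive array elements. */
size_t element_stride(const myStructureType *arr)
{
    return (size_t)((const char *)&arr[1] - (const char *)&arr[0]);
}

/* Nonzero if B is correctly aligned in each of the first n elements. */
int all_B_aligned(const myStructureType *arr, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if ((uintptr_t)&arr[i].B % _Alignof(unsigned long int) != 0)
            return 0;
    return 1;
}
```

The stride always equals sizeof(myStructureType), trailing padding included, which is what keeps B aligned in element 1 and beyond.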
As Andres said, the compiler can't generate a special layout for each element of an array of structures.
For example, assume that the programmer has defined a structure with the following type
typedef struct
{
unsigned char A;
unsigned long int B;
unsigned short int C;
} myStructureType;
And then the programmer has created an instance of this type:
myStructureType myStructure;
The compiler will allocate memory to myStructure as follows:
If the programmer decides to create an array of myStructureType, the compiler will repeat the previous pattern in memory once per array element, as follows:
If the compiler omitted the trailing padding bytes, the memory would become misaligned, as follows:
A modern 32-bit processor would need two memory cycles and some masking operations to fetch B1 (the member B of the array element at index 1). An older processor, however, would raise an alignment fault.
That's why the trailing padding is important in this case.
struct queue_entry_s {
odp_buffer_hdr_t *head;
odp_buffer_hdr_t *tail;
int status;
enq_func_t enqueue ODP_ALIGNED_CACHE;
deq_func_t dequeue;
enq_multi_func_t enqueue_multi;
deq_multi_func_t dequeue_multi;
odp_queue_t handle;
odp_buffer_t sched_buf;
odp_queue_type_t type;
odp_queue_param_t param;
odp_pktio_t pktin;
odp_pktio_t pktout;
char name[ODP_QUEUE_NAME_LEN];
};
typedef union queue_entry_u {
struct queue_entry_s s;
uint8_t pad[ODP_CACHE_LINE_SIZE_ROUNDUP(sizeof(struct queue_entry_s))];
} queue_entry_t;
typedef struct queue_table_t {
queue_entry_t queue[ODP_CONFIG_QUEUES];
} queue_table_t;
static queue_table_t *queue_tbl;
#define ODP_CACHE_LINE_SIZE 64
#define ODP_ALIGN_ROUNDUP(x, align)\
((align) * (((x) + align - 1) / (align)))
#define ODP_CACHE_LINE_SIZE_ROUNDUP(x)\
ODP_ALIGN_ROUNDUP(x, ODP_CACHE_LINE_SIZE)
In the above code (typedef union queue_entry_u), what is the significance of the union? If we used a structure instead (typedef struct queue_entry_u), would there be any disadvantage?
unions have several usages:
union saves some memory. It makes it so that s and pad sit in the same place in memory. It is useful if you know that only one of them is needed then you can use a union.
It is also useful to be able to iterate over the fields in your struct. By saving the fields in a union you have both an array and a struct so if you iterate over pad you are in essence iterating over the bytes of s.
unions are also useful in general for casting. The syntax is a little prettier to serialize your entry into a byte array by just using the union.
In this case it looks like the union is used to pad the size of s so that it fills whole cache lines. This way, if the size of a queue_entry_s is an exact multiple of the length of a cache line, then s and pad will occupy exactly the same memory and no space is wasted. Otherwise pad will take more memory than s. Either way, the size of the union will always be an exact multiple of the length of a cache line.
This being said it is usually only a good idea to use unions if you are writing embedded code for devices very low on memory or with very stringent performance requirements. They are very dangerous and very easy to misuse by accidentally writing over memory that was meant to represent the other type in the union.
Let's start with the definition of a union from K&R 2nd edition:
A union is a variable that may hold (at different times) objects of
different types [...]. Unions provide a way to manipulate different
kinds of data in a single area of storage.
The union in the question contains two objects: a structure of type struct queue_entry_s and an array of uint8_t. It's important to note that those two objects overlap in memory. Specifically, the address where the structure starts is the same as the address where the array starts. If you write to the structure, the contents of the array will be changed, and if you write to the array, then the contents of the structure will be changed.
Then note that the ODP_CACHE_LINE_SIZE_ROUNDUP macro takes a size and computes the smallest multiple of 64 that is greater than or equal to that size.
The size of a union is determined by the size of its largest member. So, for example, if sizeof(struct queue_entry_s) is 80, then the size of the pad array will be 128, and the size of the union will be 128.
Which brings us finally to the answer. The purpose of the union is to increase the memory used by the structure, so that the structure always uses a multiple of 64 bytes of memory.
If you were to change typedef union queue_entry_u to typedef struct queue_entry_u, then the memory layout would be changed. Instead of having s and pad overlapping in memory, the pad array would follow the s structure in memory. So if s occupies 80 bytes and pad occupies 128 bytes, then the typedef struct queue_entry_u would define an object that occupies 208 bytes of memory. That would be a waste of memory, and wouldn't comply with the multiple-of-64 requirement.
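The difference can be reproduced with a scaled-down version of the code. Everything in this sketch is a stand-in: struct entry_s, its members, and the CACHE_LINE and ROUNDUP macros are hypothetical simplifications of the ODP definitions, chosen so that the inner struct is 80 bytes on a typical 64-bit target.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define CACHE_LINE 64
#define ROUNDUP(x, align) ((align) * (((x) + (align) - 1) / (align)))

struct entry_s {                 /* hypothetical stand-in for queue_entry_s */
    void *head;
    void *tail;
    int status;
    char name[60];
};

/* Union: s and pad overlap, so the size is that of the larger member. */
typedef union entry_u {
    struct entry_s s;
    uint8_t pad[ROUNDUP(sizeof(struct entry_s), CACHE_LINE)];
} entry_t;

/* Struct with the same members: pad follows s instead of overlapping it. */
struct entry_as_struct {
    struct entry_s s;
    uint8_t pad[ROUNDUP(sizeof(struct entry_s), CACHE_LINE)];
};

/* Prints the three sizes for the current compiler and target. */
void print_entry_sizes(void)
{
    printf("inner struct %zu, union %zu, struct variant %zu\n",
           sizeof(struct entry_s), sizeof(entry_t),
           sizeof(struct entry_as_struct));
}
```

Only the union variant is guaranteed to come out as a multiple of 64 here; the struct variant adds the two member sizes together.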
I have been reading data structure alignment articles but I'm getting nowhere. Perhaps things are just too complicated for me to understand. I also came across data structure padding which is also necessary to align data. How do I add a data structure padding to struct usb_ep? Also how do I make sure that whenever I perform kmalloc the data to be read should be at a memory offset which is some multiple of 4?
Regarding alignment, kmalloc will align the structures properly. If you have a 4-byte variable, it will be 4-byte aligned; if you have an 8-byte variable, it will be 8-byte aligned. Understanding alignment is the key to understanding why padding is needed.
What you don't want is garbage padding between the variables in your struct. You can avoid that with the pragma pack directive (probably easiest) or by adding the padding manually.
Example
struct usb_ep
{
short a; /* 2 bytes*/
int b; /* 4 bytes*/
short c; /* 2 bytes*/
};
The members add up to 8 bytes, but due to alignment requirements, the size will be 12 bytes. The memory layout would be like this:
short a - 2 bytes
char pad[2] - 2 bytes of padding
int b - 4 bytes
short c - 2 bytes
char pad[2] - 2 bytes of padding
To avoid padding, and to keep the struct from growing, you can rearrange the elements so that the alignment requirements are satisfied without gaps.
That is, the struct:
struct usb_ep
{
short a; /* 2 bytes*/
short c; /* 2 bytes*/
int b; /* 4 bytes*/
};
will have a size of 8 bytes, with no padding required.
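The two orderings can be compared side by side. In this sketch the structs are renamed ep_original and ep_reordered (hypothetical names, since both are called usb_ep above), and PADDING_OF is a helper macro invented for the example.

```c
#include <stddef.h>
#include <stdio.h>

struct ep_original  { short a; int b; short c; };
struct ep_reordered { short a; short c; int b; };

/* Padding = total size minus the bytes the three members themselves need. */
#define PADDING_OF(type) \
    (sizeof(type) - (sizeof(short) + sizeof(int) + sizeof(short)))

/* Prints size and padding of both orderings for the current compiler. */
void print_ep_sizes(void)
{
    printf("original:  size %zu, padding %zu\n",
           sizeof(struct ep_original), PADDING_OF(struct ep_original));
    printf("reordered: size %zu, padding %zu\n",
           sizeof(struct ep_reordered), PADDING_OF(struct ep_reordered));
}
```

With 2-byte short and 4-byte int, this would be expected to report the 12 and 8 bytes described above.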
This comes from http://minirighi.sourceforge.net/html/kmalloc_8c.html
void * kmemalign (size_t alignment, size_t size)
Allocate some memory aligned to a boundary.
Parameters:
alignment The boundary.
size The size you want to allocate.
Exceptions:
NULL Out-of-memory.
Returns:
A pointer to a memory area aligned to the boundary. The pointer is an aligned_mem_block_t pointer, so if you want to access the data area of this pointer you must use the p->start field.
Note:
Use kfree(void *ptr) to free the allocated block.
The best way to lay out fields in a structure so that little padding is needed is to declare your variables in descending size. So your largest ones first, then down to the smallest.
struct example {
double amount;
char *name;
int cnt;
char is_valid;
};
This doesn't always end up with logically connected items in the structure, but will typically give the most compact and easily accessible memory usage.
You can use padding bytes in your struct declarations, but they clutter up the code, and do not guarantee compact structures. A compiler may align every byte on a 4 byte boundary, so you might end up with
struct example2 {
char a;
char padding1[3];
char b;
char padding2[3];
};
taking 4 bytes for a, 4 bytes for padding1, 4 bytes for b, and 4 bytes for padding2. Some compilers allow you to specify packed structures which would yield the correct result in this case. Usually I just declare the fields from largest to smallest types and leave it at that. If you need to share memory between two languages/compilers, then you need to make sure the structs align identically in memory.