Memory - Natural address boundary - c

Definition
Structure padding is the process of aligning data members of the structure in accordance with the memory alignment rules specified by the processor.
what is the memory alignment rule for Intel x86 processor?
As per my understanding, natural address boundaries for Intel-x86 processor is 32 bits each(i.e.,addressOffset%4==0)
So, In x86 processor,
struct mystruct_A {
char a;
int b;
char c;
};
will be constructed as,
struct mystruct_A {
char a;
char gap_0[3]; /* inserted by compiler: for alignment of b using array */
int b;
char c;
char gap_1[3]; /* for alignment of the whole struct using array */
};
what is the memory alignment rule for Intel x86-64 processor?
As per my understanding, natural address boundaries for Intel x86-64 processor is 64 bits each(i.e.,addressOffset%8==0)
So, In x86-64 processor,
struct mystruct_A {
char a;
int b;
char c;
};
will be constructed as,
struct mystruct_A {
char a;
char gap_0[7]; /* inserted by compiler: for alignment of b using array */
int b;
char c;
char gap_1[7]; /* for alignment of the whole struct using array */
};
If the above understanding is correct, then I would like to know why use an array of int for bit operation?
Recommends to use int sized data, as mentioned here, that says, because the most cost efficient access to memory is accessing int sized data.
Question:
Is this memory alignment rule that forces to declare int sized data for bit operations?

Addendum: this is valid for x86/-64 bit processors, but also for others. I am blindly assuming you're using those. For others, you should check the respective manuals.
If fasm automatically added fillers into my structs i'd go insane. In general, performance is better when accesses to memory are on a boundary corresponding to the size of the element you want to retrieve. That being said, it's not a definite necessity!
This article here might be worth a look: https://software.intel.com/en-us/articles/coding-for-performance-data-alignment-and-structures
Intel's suggestion for optimal layout is to start with the biggest elements first and going smaller as the structure increases. That way you'll stay aligned properly, as long as the first element is aligned properly. There are no three-byte elements, thus misalignment is out of the question and all the compiler might do is adding bytes at the end, which is the best way to make sure it won't ruin things if you choose to do direct memory accesses instead of using variables.
The safest procedure is to not rely on your compiler, but instead aligning the data properly yourself.
Fun Fact: loops work the same way. Padding NOPs in your code, before the start of a loop, can make a difference.

Related

Should I worry about structure alignment in C language when I write applications or it is done automatically?

I'm trying to understand if structure alignment could affect the programs that I write in C language. It seems that it is something that is handled automatically for us. The question is:
Should I worry about it when writing applications ?
If yes, then when exactly should I worry about it ?
Yes you should care about it since ignoring alignment might produce needlessly memory consuming code.
It's a hard requirement by the standard that struct members are to be allocated so that the first item is at the lowest address. The compiler isn't allowed to re-order them for memory optimization purposes. So if you write a bad struct like this (assuming 32/64 bit CPU):
typedef struct
{
char a;
/* 3 bytes padding here */
int b;
char c;
/* 3 bytes padding here */
} bloat_t;
// total size: 12
Since the CPU in this example has to read b 32 bit aligned, then the compiler has no choice but to insert 3 padding bytes between a and b. In addition to the 3 padding bytes it has to insert at the end no matter. So when I compile this struct on x86 Linux, I get size 12, which is a waste of space.
This can only be fixed by the programmer, who needs to be aware of such alignment requirements:
typedef struct
{
int b;
char a;
char c;
/* 2 bytes padding here */
} good_t;
// total size: 8
or alternatively
typedef struct
{
char a;
char c;
/* 2 bytes padding here */
int b;
} good_t;
// total size: 8
As other comments/answers have implied, the answer is "it depends."
If you're writing an app for a modern system (meaning ample memory), and your lack of size optimization will only result in the waste of a few K of memory, then no, it probably doesn't matter (though it may not be "best practices").
If, on the other hand, your application may create a huge array of these structures, and/or your struct has enormous inefficiencies, then it might eventually pose a problem.
If your struct is defining a header or a low-level message sequence (commonly found in embedded applications), than it most assuredly will matter, not only because of wasted space, but it can cause operations like memcpy() into the struct not to behave as you might expect. (I realize that such operations also are not "best practices" but they occur all the time in embedded applications, and defensive programming would suggest that you prevent this.) The only remedy for this that I can think of is to define the struct as packed.

How can size of a structure be a non-multiple of 4?

I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes in to play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%lu",sizeof(book));
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!
Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So, the char elements which are 3 bytes in length, need no alignment because they are byte elements.
It is preferred, though not required, that the short element needs to be aligned on a short boundary -- which means an even address. By aligning it on a short boundary, the compiler can issue a single load short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
EDIT: . As #Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, the compiler could modify that in a different version of the compiler.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the strucuture. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.
#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with char and short the alignment is 2, because the alignment of short is two, if you have a struct, only out of chars it has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char[4]; // this can hold 32-bit ints
This is nonstandart, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
static_assert(sizeof(struct A) == 4);
static_assert(alignof(struct A) == 2);
static_assert(sizeof(struct B) == 3);
static_assert(alignof(struct B) == 1);
Usually compilers follow ABI of the target architecture.
It defines alignments of structures and primitive datatypes. And that affects to needed padding and sizes of structures. Because alignment is multiple of 4 in many architectures, size of structures are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example gcc and clang offers: __attribute__ ((packed))

C malloc offsets relative to struct definition locations (and padding)

C question:
Does malloc'ing a struct always result in linear placement from top to bottom of the data inside? As a second minor question: is there a standard on the padding size, or does it vary between 32 and 64 bit machines?
Using this test code:
#include <stdio.h>
#include <stdlib.h>
struct test
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
int main() {
int size;
char* testarray;
struct test* testarraystruct;
size = sizeof(struct test);
testarray = malloc(size * 4);
testarraystruct = (struct test *)testarray;
testarraystruct[1].a = 123;
printf("test.a = %d\n", testarray[size]); // Will this always be test.a?
free(testarray);
return 0;
}
On my machine, size is always 8. Therefore I check testarray[8] to see if it's the second struct's 'char a' field. In this example, my machine returns 123, but this obviously isn't proof it always is.
Does the C compiler have any guarantees that struct's are created in linear order?
I am not claiming the way this is done is safe or the proper way, this is a curiosity question.
Does this change if this is becomes C++?
Better yet, would my memory look like this if it malloc'd to 0x00001000?
0x00001000 char a // First test struct
0x00001001 char pad[0]
0x00001002 char pad[1]
0x00001003 char pad[2]
0x00001004 int b // Next four belong to byte b
0x00001005
0x00001006
0x00001007
0x00001008 char a // Second test struct
0x00001009 char pad[0]
0x0000100a char pad[1]
0x0000100b char pad[2]
0x0000100c int b // Next four belong to byte b
0x0000100d
0x0000100e
0x0000100f
NOTE: This question assumes int's are 32 bits
As far as I know, malloc for struct is not linear placement of data but it's a linear allocation of memory for the members with in the structure that too when you create an object of it.
This is also necessary for padding.
Padding also depends on the type of machine (i.e 32 bit or 64 bit).
The CPU fetches the memory based on whether it is 32 bit or 64 bit.
For 32 bit machine your structure will be:
struct test
{
char a; /* 3 bytes padding done to a */
int b;
};
Here your CPU fetch cycle is 32 bit i.e 4 bytes
So in this case (for this example) the CPU takes two fetch cycles.
To make it more clear in one fetch cycle CPU allocates 4 bytes of memory. So 3 bytes of padding will be done to "char a".
For 64 bit machine your structure will be:
struct test
{
char a;
int b; /* 3 bytes padding done to b */
}
Here the CPU fetch cycle is 8 bytes.
So in this case (for this example) the CPU takes one fetch cycles. So 3 bytes of padding here must be done to "int b".
However you can avoid the padding you can use #pragma pack 1
But this will not be efficient w.r.t time because here CPU fetch cycles will be more (for this example CPU fetch cycles will be 5).
This is tradeoff between CPU fetch cycles and padding.
For many CPU types, it is most efficient to read an N-byte quantity (where N is a power of 2 — 1, 2, 4, 8, sometimes 16) when it is aligned on an N-byte address boundary. Some CPU types will generate a SIGBUS error if you try to read an incorrectly aligned quantity; others will make extra memory accesses as necessary to retrieve an incorrectly aligned quantity. AFAICR, the DEC Alpha (subsequently Compaq and HP) had a mechanism that effectively used a system call to fix up a misaligned memory access, which was fiendishly expensive. You could control whether that was allowed with a program (uac — unaligned access control) which would stop the kernel from aborting the process and would do the necessary double reads.
C compilers are aware of the benefits and costs of accessing misaligned data, and go to lengths to avoid doing so. In particular, they ensure that data within a structure, or an array of structures, is appropriately aligned for fast access unless you hold them to ransom with quasi-standard #pragma directives like #pragma pack(1).
For your sample data structure, for a machine where sizeof(int) == 4 (most 32-bit and 64-bit systems), then there will indeed be 3 bytes of padding after an initial 1 byte char field and before a 4-byte int. If you use short s; after the single character, there would be just 1 byte of padding. Indeed, the following 3 structures are all the same size on many machines:
struct test_1
{
char a;
/* char pad[3]; // I'm assuming this happens from the compiler */
int b;
};
struct test_2
{
char a;
short s;
int b;
};
struct test_3
{
char a;
char c;
short s;
int b;
};
The C standard mandates that the elements of a structure are laid out in the sequence in which they are defined. That is, in struct test_3, the element a comes first, then c, then s, then b. That is, a is at the lowest address (and the standard mandates that there is no padding before the first element), then c is at an address higher than a (but the standard does not mandate that it will be one byte higher), then s is at an address higher than c, and that b is at an address higher than s. Further, the elements cannot overlap. There may be padding after the last element of a structure. For example in struct test_4, on many computers, there will be 7 bytes of padding between a and d, and there will be 7 bytes of padding after b:
struct test_4
{
char a;
double d;
char b;
};
This ensures that every element of an array of struct test_4 will have the d member properly aligned on an 8-byte boundary for optimal access (but at the cost of space; the size of the structure is often 24 bytes).
As noted in the first comment to the question, the layout and alignment of the structure is independent of whether the space is allocated by malloc() or on the stack or in global variables. Note that malloc() does not know what the pointer it returns will be used for. Its job is simply to ensure that no matter what the pointer is used for, there will be no misaligned access. That often means the pointer returned by malloc() will fall on an 8-byte boundary; on some 64-bit systems, the address is always a multiple of 16 bytes. That means that consecutive malloc() calls each allocating 1 byte will seldom produce addresses 1 byte apart.
For your sample code, I believe that standard does require that testdata[size] does equal 123 after the assignment. At the very least, you would be hard-pressed to find a compiler where it is not the case.
For simple structures containing plain old data (POD — simple C data types), C++ provides the same layout as C. If the structure is a class with virtual functions, etc, then the layout rules depend on the compiler. Virtual bases and the dreaded 'diamond of death' multiple inheritance, etc, also make changes to the layout of structures.

Structure alignment issue

My Compiler require me to have a memory aligned structure declaration, to ensure proper data access.
I have a top structure, which comprised of some other structures. Is it sufficient to ensure that the top structure to be aligned to 32 byte boundary or I need to ensure that each structure should be aligned to 32 byte boundary.
Code snippet is below:-
typedef struct {
int p;
int q;
char n;
} L;
typedef struct {
int c;
int d;
char e;
L X2[13];
} B;
typedef struct {
int a;
int b;
B X1[10];
} M;
To ensure correct data access, Do I need to ensure that all structures are memory aligned properly, or padding the top most structure will ensure memory alignment.
Sometimes your application may require specific layout, but if as you say in this case it is a requirement of your compiler (or probably more accurately the target architecture of your compiler), then it is the compiler's responsibility to ensure those requirements are met.
If you require alignment other than that which the compiler will enforce naturally as required by the target, you will need compiler specific directives for packing and alignment; however applying such directives and getting it wrong is far more likley to cause an alignment fault than letting the compiler handle it. If you attempt to align by adding your own padding members, it may work, but is unnecessary and the compiler may insert additional padding of its own too.
The point is the compiler will not generate a structure with members it cannot safely and efficiently address. It will insert any necessary padding between members to ensure that subsequent members are addressable.
If you don't believe it will work, get your linker to output a map file (if it does not do so already) and check the address of these symbols to verify correct alignment. Also look at the generated size of the structures; you may find that some of them are larger than the sum of their parts - that is the compiler forcing alignment by inserting padding.
If you can use it, C11 has alignment statements if your application (or architecture, or performance) needs it : http://en.wikipedia.org/wiki/C11_%28C_standard_revision%29
Alignment specification (_Alignas specifier, alignof operator, aligned_alloc function, header file)
GCC surely has extensions also.
If you want to use natural alignment you have to order the
fields and pad manually to the natural word size. Portability
is not guaranteed but for a specific processor this can be done
(not uncommon in the embedded world). If sizeof(int) is 4
you have to add padding to your sub-structs to ensure alignment
in the arrays (I assume that your goal is to avoid "secret" padding
added by the compiler?). For example :
typedef struct {
int p;
int q;
char n;
char pad[3];
} L;
typedef struct {
int c;
int d;
char e;
char pad[3];
L X2[13];
} B;
typedef struct {
int a;
int b;
B X1[10];
} M;
would normally cause no "hidden" alignment in the structures.

How many bytes does char defined in a structure take?

When I define a character type in a structure, it seems to take more than 1 byte; in fact it seems to take 4 bytes.
Below is my program:
#include <stdio.h>
int main(void)
{
struct book{
char name;
float price;
int pages;
};
struct book b1={'B',130.00,550};
printf("\nAddress of structure:%u",&b1);
printf("\nAddress of character name:%u",&b1.name);
printf("\nAddress of float price:%u",&b1.price);
printf("\nAddress of integer pages:%u",&b1.pages);
printf("\n\n");
return 0;
}
When I run the above program, I get the output below:
Address of structure:557762432
Address of character name:557762432
Address of float price:557762436
Address of integer pages:557762440
Why is it that I see the difference of 4 bytes between address of variable "name" and variable "price"?
The system on which this program is being run is x86_64 bit arch running Fedora-14.
The C standard allows implementations to add additional padding bits to structures so as to access the structure faster by making it aligned to byte boundaries as required by that implementation.
This is known as Structure Padding.
Given the above the size of a structure may not be the same as the sum of the sizes of the individual members. You should always use sizeof to determine the size of structure.
Also, the above mentioned is the reason that you do not see the structures members placed at memory addresses you expect them to be at.
You are running into alignment rules. This is a very compiler- and system-specific thing, but by default (that is, unless you specify otherwise using compiler flags or special alignment requests in the code) GCC on x64 Linux aligns each field of a struct on a multiple of its size. So, a one-character byte has nothing in particular to worry about for alignment. However, an int or float is always placed on a 4-byte boundary, and a double is always placed on an 8-byte boundary. That's what you are seeing here.
As Ethan notes above in the comments, some processors won't even access memory objects that aren't aligned a certain way, and Intel processors will access memory much more slowly if it isn't aligned.
Strictly, the char always occupies one byte, but it may be followed by 0 or more bytes padding. How much padding depends on the compiler, and on the CPU, and on the alignment of the type following the char in a structure, and any packing pragmas that are in effect. If an N-byte type must be N-byte aligned, as on SPARC, for example, then:
struct s16 { char c; double d; };
struct s8 { char c; long l; };
struct s4 { char c; short s; };
struct s2 { char c; char b; };
The solitary char is followed by 7, 3, 1 and 0 bytes in the structures above on a SPARC machine, and also on an x86_64 machine.
A "char" in C typically takes exactly 8 bits (one byte).
However, a char in a struct is "aligned" on a word, dword or even quadword boundary:
http://en.wikipedia.org/wiki/Data_structure_alignment
There are compiler directives available to force alignment to 4, 2 or 1 byte (which circumvents this behavior).
For example, in Visual C++, you can say "#pragma pack(1)".

Resources