structure padding - what is the purpose of natural alignment? [duplicate] - c

I was learning about structure padding and data alignment. I came across the point that all the members of a structure should be naturally aligned in memory. For example, suppose I declare the following structure:
struct align{
char c;
double d;
int s;
};
If I take a 32-bit architecture, it fetches 4 bytes at a time. Keeping this in mind, if I work out the padding I get (my assumption):
1byte(char) + 3bytes(padding) + 8bytes(double) + 4bytes(int) ---------> 1
all these shall be fetched with minimum machine cycles.
But what actually happens is the following:
1byte(char) + 7bytes(padding) + 8bytes(double) + 4bytes(int) ----------> 2
Why do we need this natural alignment for double when we could save 4 bytes by going with layout 1 (while fetching each element in the same number of machine cycles in both cases)?

Natural alignment refers to the size of the variable, not the size of the processor register and/or data path. A floating point double is 8 bytes, and so its natural alignment is 8 bytes. To be more precise, the natural alignment is the smallest power of 2 that is large enough to hold the variable. That definition covers the case of "long double" or x86 extended precision, which is a 10-byte variable whose natural alignment is 16 bytes. For x86 processors, see the optimization manual and search for alignment. You will find this is a subject rich in detail, and specifics vary by micro-architecture, even within the same processor family. In particular, section 3.6.4 Alignment says
For best performance, align data as follows:
Align 8-bit data at any address.
Align 16-bit data to be contained within an aligned 4-byte word.
Align 32-bit data so that its base address is a multiple of four.
Align 64-bit data so that its base address is a multiple of eight.
Align 80-bit data so that its base address is a multiple of sixteen.
Align 128-bit data so that its base address is a multiple of sixteen.
The Pentium 4 is a 32-bit processor, part of the IA-32 family, yet it has a 64-bit data path (Front Side Bus). There are 32-bit processors that have only 16-bit buses; see 32-bit computing historical perspective. Accessing a variable at an alignment other than its natural alignment may result in a performance penalty or an alignment fault, depending on the processor and, in some cases, on the setting of a control bit, the type of variable, the instruction used, etc.
The actual alignment is up to the compiler and the calling conventions. For structures the requirement is that the first member variable must be at offset 0 (zero) and variables must be allocated in the order they are declared, padding may be inserted between variables for alignment and after the last variable to pad the size of the structure. In 32-bit Windows the stack is only required to be 4-byte aligned, so the compiler would have to generate extra code to ensure 8-byte alignment of a double allocated on the stack.
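The layout rules above can be checked directly with offsetof and sizeof. The following is a minimal sketch using the struct from the question; the helper names padding_before_d and tail_padding are made up for illustration, and the actual numbers depend on the compiler and ABI.

```c
#include <assert.h>
#include <stddef.h>

/* The struct from the question. */
struct align {
    char c;
    double d;
    int s;
};

/* Padding bytes the compiler inserted between c and d. */
static size_t padding_before_d(void)
{
    return offsetof(struct align, d) - sizeof(char);
}

/* Padding bytes after the last member s. */
static size_t tail_padding(void)
{
    return sizeof(struct align) - (offsetof(struct align, s) + sizeof(int));
}

/* These checks hold for any ordinary (non-packed) struct. */
static_assert(offsetof(struct align, c) == 0,
              "the first member is at offset 0");
static_assert(offsetof(struct align, d) % _Alignof(double) == 0,
              "d sits at a multiple of its alignment");
static_assert(sizeof(struct align) % _Alignof(struct align) == 0,
              "the size is a multiple of the struct's alignment");
```

On a typical x86-64 build I would expect padding_before_d() to return 7 and tail_padding() to return 4 (total size 24), but a compiler that aligns double to 4 inside structures would report 3 and 0.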
In Agner Fog's Calling Conventions document you will find details on the alignment used in different operating systems and by different compilers. The stack has a 4-byte alignment in 32-bit Windows, which explains why you may have observed a floating point double aligned at a 4-byte but not 8-byte boundary when allocated on the stack - the compiler doesn't have a clue when a function gets called whether the stack will be 8-byte aligned or not. In table-2 of that document it shows the alignment of various data types allocated in static storage as implemented by various compilers, you will notice that in 32-bit Windows the only compiler that allows 4-byte alignment for double is the Borland compiler.
When allocating in a structure according to that document the Borland compiler allows double to be at any byte offset (which I find surprising).
Here's the text description in the document, copied here for reference
Table 3 shows the alignment in bytes of data members of structures
and classes. The compiler will insert unused bytes, as required,
between members to obtain this alignment. The compiler will also
insert unused bytes at the end of the structure so that the total size
of the structure is a multiple of the alignment of the element that
requires the highest alignment. Many compilers have options to change
the default alignments. Differences in structure member alignment will
cause incompatibility between different programs or modules accessing
the same data and when data are stored in binary files. The programmer
can avoid such compatibility problems by ordering the structure
members so that no unused bytes need to be inserted. Likewise, the
padding at the end of the structure may be specified explicitly by
inserting dummy members of the required size. The size of the virtual
table pointer, if any, must be taken into account (see chapter 11).
5 Stack alignment
The stack pointer must be aligned by the stack word
size at all times. Some systems require a higher alignment. The Gnu
compiler version 3.x and later for 32-bit Linux and Mac OS X makes the
stack pointer aligned by 16 at every function call instruction.
Consequently it can rely on ESP = 12 modulo 16 at every function
entry. This alignment is not consistently implemented. It is
specified in the Mac OS ABI, but nowhere else. The stack is not
aligned when compiling with option -Os or
-mpreferred-stack-boundary=2, but apparently the Gnu compiler erroneously relies on the stack being aligned by 16 despite these
options. The Intel compiler (v. 9.1.038) for 32 bit Linux does not
have the same alignment. (I have submitted bug reports to Gnu and
Intel about this in 2006. In 2009 Intel added a -falign-stack=
assume-16-byte option to ICC version 11.0 to fix the problem). The
stack is aligned by 4 in 32-bit Windows. The 64 bit systems keep the
stack aligned by 16. The stack word size is 8 bytes, but the stack
must be aligned by 16 before any call instruction. Consequently, the
value of the stack pointer is always 8 modulo 16 at the entry of a
procedure. A procedure must subtract an odd multiple of 8 from the
stack pointer before any call instruction. A procedure can rely on
these rules when storing XMM data that require 16-byte alignment. This
applies to all 64 bit systems (Windows, Linux, BSD). Where at least
one function parameter of type __m256 is transferred on the stack,
Unix systems (32 and 64 bit) align the parameter by 32 and the called
function can rely on the stack being aligned by 32 before the call
(i.e. the stack pointer is 32 minus the word size modulo 32 at the
function entry). This does not apply if the parameter is transferred
in a register. Various methods for aligning the stack are described
in Intel's application note AP 589 "Software Conventions for
Streaming SIMD Extensions", "Data Alignment and Programming Issues
for the Streaming SIMD Extensions with the Intel® C/C++ Compiler", and
"IA-32 Intel ® Architecture Optimization Reference Manual".

Your comment is valid, and you'll probably get the result you are looking for if, instead of using a struct, you simply lay down the variables as part of the local stack inside a function. Something along these lines :
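#include <stdio.h>

void alignTest(void)
{
    char c;
    double d;
    int s;
    /* %p with a void * cast is the portable way to print an address */
    printf("%p %p %p\n", (void *)&c, (void *)&d, (void *)&s);
}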
In this example, the compiler is free to make whatever choices are best for performance and memory use. Heck, it can even reorder the variables if it wishes. With this setup, I have already seen a double placed on a 4-byte boundary (not 8) by 32-bit compilers.
On the other hand, using a struct, you need to keep in mind that it is part of an interface contract. It's not just a matter of the compiler selecting whatever choice it feels better : if part of an API, this struct will be used by other programs, potentially using another compiler, or another version of the same compiler. It happens all the time : think DLL, wrapper from other languages (calling a C function from a Delphi or Python program) etc.
You can't have an interface element in a "random state", with different choices depending on compiler. In this case, the allocation rules regarding variables inside a struct are set in stone by the specification.
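In this specification, member order is always respected, and on most ABIs a double is aligned on 8 bytes (the 32-bit System V ABI used by x86 Linux, which aligns double members on 4 bytes, is a well-known exception).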


Padding logic of 'double' struct members on 32-bits machines [duplicate]

As per this link https://www.geeksforgeeks.org/structure-member-alignment-padding-and-data-packing/ , on a 32-bit machine where size of data bus = 4 bytes, 'double' type struct members start from addresses which are multiple of 8.
But even if they started from addresses which are multiples of 4, we'd need 2 loads to bring them from memory. So I don't get the reason for the stricter constraint for starting address being a multiple of 8.
I am absolutely no expert, so if I'm wrong I'd love to know more too, but one reason I've seen to force double alignment on 8 bytes, is because of the cpu cache. If doubles were put on 4 byte alignments, the cache may only get half of the double and force more reads. By forcing alignment of 8 bytes, it makes sure that a single cache line is used to read the whole double.
This question is similar, why is data structure alignment important for performance? and some of the answers given may explain this better than I can for you.
In the model the linked page presents, there is no reason to restrict the address of a double to a multiple of eight bytes. It gives the number of four-byte memory transfers as a reason for alignment, and eight bytes can be loaded in two transfers as long as they start on a four-byte-aligned address. There is no need for an eight-byte-aligned address. (It should come as no surprise that some web page on the Internet is not of high quality.)
However, there is no single definition of a “32-bit machine” or a “64-bit machine”. Processors and systems vary in several regards, including bus width (and hence basic memory transfer size), processor register width, virtual memory mapping features, and instruction set. No single one of these makes a machine “32 bit” or “64 bit.”
A processor might require eight-byte-aligned addresses for a double simply because its instruction set encoding is designed not to have low bits for the address of a double. The “load double” instruction that loads a double into a floating-point register might not have any way of specifying the low three bits of an address in certain addressing forms; they are always taken to be zero.
Another issue could be the processor is largely a 32-bit processor, with 32-bit general registers, but has a 64-bit bus. Loads of 32-bit items to general registers only need to be four-byte aligned because the processor always loads some eight-byte-aligned 64 bits and then takes the high or low 32 bits. (Likely it also coalesces consecutive 32-bit load instructions when possible, so the full 64 bits are used.)
As another answer states, requiring eight-byte alignment for eight-byte objects prevents them from straddling cache lines or memory pages.

what is aligned attribute and what are the uses of it

I have following lines in the code
# define __align_(x) __attribute__((aligned(x)))
I can use it as int i __align_; what difference does it make?
Does it matter whether I use the aligned attribute as above or just declare my variable as int i;? Does it change how the variable is created in memory?
I can use it as int i __align_; what difference does it make?
This will not work because the macro is defined to have a parameter, __align_(x). When it is used without a parameter, it will not be replaced, and the compiler will report a syntax error. Also, identifiers starting with __ are reserved for the C implementation (for the use of the compiler, the standard library, and any other parts forming the C implementation), so a regular program should not use such a name.
When you use the macro correctly, it changes the normal alignment requirement for the type.
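As a sketch of what correct use could look like, the snippet below defines a macro without the reserved double-underscore prefix and passes it an argument. The names ALIGN_TO, aligned_value and is_multiple_of are hypothetical, and __attribute__((aligned(x))) is a GCC/Clang extension, so this assumes one of those compilers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro name; it avoids the reserved "__" prefix.
   __attribute__((aligned(x))) is a GCC/Clang extension. */
#define ALIGN_TO(x) __attribute__((aligned(x)))

/* An int that must start at an address that is a multiple of 16. */
static int aligned_value ALIGN_TO(16);

/* Returns 1 if the address p is a multiple of n, 0 otherwise. */
static int is_multiple_of(const void *p, uintptr_t n)
{
    assert(n != 0);
    return ((uintptr_t)p % n) == 0;
}
```

With this in place, is_multiple_of(&aligned_value, 16) should report 1, whereas a plain int i; is only promised the default alignment of int.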
Generally, objects of various types have alignment requirements: They should be located in memory at addresses that are multiples of their requirement. The reason is that computer hardware is usually designed to work with groups of bytes, so it may fetch data from memory in groups of, for example, four bytes: Bytes from 0 to 3, bytes from 4 to 7, bytes from 8 to 11, and so on.
If a four-byte object with four-byte alignment requirement is located at a multiple of four bytes, then it can be read from memory easily, by loading the group of bytes it is in. It can also be written to memory easily.
If the object were not at a multiple of four bytes, it cannot be loaded as one group of bytes. It can be loaded by loading the two groups of bytes it straddles, extracting the desired bytes, and combining the desired bytes in one processor register. However, that takes more work, so we want to avoid it. The compiler is written to automatically align things as desired for the C implementation, and it writes load and store instructions that expect the desired alignment.1
Different object types can have different alignment requirements even though they are bound by the same hardware behavior. For example, with a two-byte short, the alignment requirement may be two bytes. This is because, whether it starts at byte 0 or byte 2 within a group (say at address 100, 102, 104, or 106), we can load the short by loading a single group of four bytes and taking just the two bytes we want. However, if it started at byte 3 (say at address 103), we would have to load two groups of bytes (100 to 103 and 104 to 107) to get the bytes we needed for the short (103 and 104). So two-byte alignment suffices for this short even though the hardware is designed with four-byte groups.
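The arithmetic in this example can be written down as a small function. This is only a model of the simplified "groups of bytes" picture used here, not of any particular memory system, and the name groups_touched is invented for the sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of fixed-size byte groups that the range [addr, addr + size)
   overlaps, for hardware that fetches memory in groups of `group` bytes. */
static size_t groups_touched(uintptr_t addr, size_t size, size_t group)
{
    assert(size > 0 && group > 0);
    uintptr_t first = addr / group;              /* group holding the first byte */
    uintptr_t last = (addr + size - 1) / group;  /* group holding the last byte */
    return (size_t)(last - first + 1);
}
```

For the two-byte short, groups_touched(100, 2, 4) and groups_touched(102, 2, 4) are both 1, while groups_touched(103, 2, 4) is 2, which matches the reasoning that two-byte alignment is enough.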
As mentioned, the compiler handles alignment automatically. When you define a structure with multiple members of different types, the compiler inserts padding so that each member is aligned correctly, and it inserts padding at the end of the structure so that an array of them keeps the alignment from element to element in the array.
There are times when we want to override the compiler’s automatic behavior. When we are preparing to send data over a network connection, the communication protocol might require the different fields of a message to be packed together in consecutive bytes, with no padding. In this case, we can define a structure with an alignment requirement of 1 byte for it and all its members. When we are ready to send a message, we could copy data into this structure’s members and then write the structure to the network device.
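A sketch of such a message structure follows. The field layout is a made-up wire format, not taken from any real protocol, and __attribute__((packed)) is a GCC/Clang extension (other compilers typically offer #pragma pack instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 1-byte type, 4-byte length, 2-byte checksum. */
struct message_natural {
    uint8_t  type;
    uint32_t length;     /* the compiler normally pads before this member */
    uint16_t checksum;
};

/* Same members, but with 1-byte alignment for the struct and its members. */
struct __attribute__((packed)) message_packed {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};

static_assert(sizeof(struct message_packed) == 7,
              "the packed layout has no padding");
static_assert(offsetof(struct message_packed, length) == 1,
              "length directly follows type");
```

Packing only removes the padding. Byte order is a separate concern, so multi-byte fields would still need converting (for example with htonl) before being written to the network.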
When you tell the compiler an object is not aligned normally, the compiler will generate instructions for that. Instead of the normal load or store instructions, it will use special unaligned load or store instructions if the computer architecture has them. If it does not, the compiler will use instructions to shift and store individual bytes or to shift and merge bytes and store them as aligned words, depending on what instructions are available in the computer architecture. This is generally inefficient; it will slow down your program. So it should not be used in normal programming. Decreasing the alignment requirements should be used only when there is a need for controlling the layout of data in memory.
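When the data is just bytes in a buffer rather than a declared under-aligned object, a common portable alternative is to copy the bytes out with memcpy and let the compiler pick the instructions. This is a different technique from the attribute discussed here, shown only as a sketch; load_u32_unaligned is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a 32-bit value (in host byte order) from an address that may not be
   a multiple of four. memcpy has no alignment requirement, and compilers
   usually turn it into a single load where the hardware allows that. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    assert(p != NULL);
    memcpy(&v, p, sizeof v);
    return v;
}
```

Casting the byte pointer to uint32_t * and dereferencing it instead would tell the compiler the address is aligned, which is exactly the situation the footnote warns about.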
Sometimes increasing the alignment requirements is used for performance. For example, an array of four-byte float elements generally only needs four-byte alignment. However, some computers have special instructions to process four float elements (16 bytes) at a time, and they benefit from having that data aligned to a multiple of 16 bytes. (And some computers have instructions for even more data at one time.) In this case, we might increase the alignment requirement for our float array (but not its individual elements) so that it is aligned to suit these instructions.
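In standard C11 the same effect can be had with alignas from <stdalign.h>, without a compiler-specific attribute. A minimal sketch, with samples and is_16_byte_aligned as illustrative names:

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>

/* The array as a whole starts on a 16-byte boundary; the elements inside
   it keep their normal spacing. */
static alignas(16) float samples[8];

static_assert(sizeof samples == 8 * sizeof(float),
              "over-aligning the array adds no padding between elements");

/* Returns 1 if p is a multiple of 16, 0 otherwise. */
static int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}
```

Only the start of the array is guaranteed to be 16-byte aligned; samples[1], for instance, is just one float further along.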
Footnote
1 What happens if you force an object to be located at an undesired alignment without telling the compiler varies. In some computers, when a load instruction is executed with an unaligned address, the processor will “trap,” meaning it stops normal program execution and transfers control to the operating system, reporting an error in your program. In some computers, the processor will ignore the low bits of the address and load the wrong data. In some computers, the processor will load the two groups of bytes, extract the desired bytes, and merge them. On computers that trap, the operating system might do the manual fix-up of loading the bytes, or it might terminate your program or report the error to your program.
The attribute tells the compiler that the variable in question must be placed in memory at an address that is aligned to a certain number of bytes (addr % alignment == 0).
This is important because some CPUs can only work on certain integer values if they are aligned: for example, an int32 must be 4-byte aligned and an int64 must be 8-byte aligned, and pointers need to be 4- or 8-byte aligned too (32- or 64-bit CPU).
The attribute is mostly used for structures, where certain fields within the structure must be memory aligned in order to allow the CPU to do integer operations on them (like mov.l) without hitting a BUS ERROR from the memory controller.
If structures aren't properly aligned, the compiler will have to add extra instructions to first load the unaligned value into a register with several memory operations which is more expensive in performance.
It can also be used to bump performance in more performance sensitive systems by creating buffers that are page aligned (4k usually) so that paging will have less of an impact, or if you want to create DMA-able buffer zones - but that's a bit more advanced...

What is overalignment of execution regions and input sections?

I came across code similar to the following today and I am curious as to what is actually happening:
#pragma pack(1)
__align(2) static unsigned char multi_array[7][24] = { 0 };
__align(2) static unsigned char another_multi_array[7][24] = { 0 };
#pragma pack()
When searching for a reference to the __align keyword in the Keil compiler, I came across this:
Overalignment of execution regions and input sections There are situations when you want to overalign code and data sections... If you have access to the original source code, you can do this at compile time with the __align(n) keyword...
I do not understand what is meant by "overaligning code and data sections". Can someone help to clarify how this overalignment occurs?
The compiler will naturally "align" data based on the needs of the system. For example, on a typical 32-bit system, a 32-bit integer should always be a single 4-byte word (as opposed to being partly in one word and partly on the next), so it will always start on a 4-byte-word boundary. (This mostly has to do with the instructions available on the processor. A system is very likely to have an instruction to load a single word from memory into a register, and much less likely to have a single instruction to load an arbitrary sequence of four adjacent bytes into a register.)
The compiler normally does this by introducing gaps in the data; for example, a struct with a char followed by a 32-bit int, on such a system, would require eight bytes: one byte for the char, three bytes of filler so the int is aligned right, and four bytes for the int itself.
To "overalign" the data is to request greater alignment than the compiler would naturally provide. For example, you might request that a 32-bit integer start on an 8-byte boundary, even on a system that uses 4-byte words. (One major reason to do this would be if you're aiming for byte-level interoperability with a system that uses 8-byte words: if you pass structs from one system to the other, you want the same gaps in both systems.)
Overalignment is when the data is aligned to more than its default alignment. For example, a 4-byte int usually has a default alignment of 4 bytes. (meaning the address will be divisible by 4)
The default alignment of a datatype is quite-often (but not always) the size of the datatype.
Overalignment allows you to increase this alignment to something greater than the default.
As for why you would want to do this:
One reason for this is to be able access the data with a larger datatype (that has a larger alignment).
For example:
char buffer[16];
int *ptr = (int*)&buffer;
ptr[0] = 1;
ptr[1] = 2;
By default, buffer will only be aligned to 1 byte. However, int requires a 4-byte alignment. If buffer isn't aligned to 4 bytes, you may get a misalignment exception. (AFAIK, older ARM cores don't allow misaligned memory access; x86/x64 usually does, but with a performance penalty.)
__align() will let you force the alignment higher to make it work:
__align(4) char buffer[16];
A similar situation appears when using SIMD instructions. You will be accessing smaller datatype with a large SIMD datatype - which will likely require a larger alignment.
By overalign, Keil means nothing more complex than aligning an object to a larger alignment boundary than the data type requires.
See the documentation for __align: "You can only overalign. That is, you can make a two-byte object four-byte aligned but you cannot align a four-byte object at 2 bytes."
In the case of the linker, you can force an extra alignment onto sections within other binary modules using the ALIGNALL or OVERALIGN directives. This may be useful for performance reasons, but isn't a common scenario.

CPU and Data alignment

Pardon me if you feel this has been answered numerous times, but I need answers to the following queries!
Why data has to be aligned (on 2-byte / 4-byte / 8-byte boundaries)? Here my doubt is when the CPU has address lines Ax Ax-1 Ax-2 ... A2 A1 A0 then it is quite possible to address the memory locations sequentially. So why there is the need to align the data at specific boundaries?
How to find the alignment requirements when I am compiling my code and generating the executable?
If for e.g the data alignment is 4-byte boundary, does that mean each consecutive byte is located at modulo 4 offsets? My doubt is if data is 4-byte aligned does that mean that if a byte is at 1004 then the next byte is at 1008 (or at 1005)?
CPUs are word oriented, not byte oriented. In a simple CPU, memory is generally configured to return one word (32bits, 64bits, etc) per address strobe, where the bottom two (or more) address lines are generally don't-care bits.
Intel CPUs can perform accesses on non-word boundaries for many instructions; however, there is a performance penalty, as internally the CPU performs two memory accesses and a math operation to load one word. If you are doing byte reads, no alignment applies.
Some CPUs (ARM, or Intel SSE instructions) require aligned memory and have undefined operation when doing unaligned accesses (or throw an exception). They save significant silicon space by not implementing the much more complicated load/store subsystem.
Alignment depends on the CPU word size (16, 32, 64bit) or in the case of SSE the SSE register size (128 bits).
For your last question, if you are loading a single data byte at a time there is no alignment restriction on most CPUs (some DSPs don't have byte level instructions, but its likely you won't run into one).
Very little data "has" to be aligned. It's more that certain types of data may perform better or certain cpu operations require a certain data alignment.
First of all, let's say you're reading 4 bytes of data at a time. Let's also say that your CPU has a 32 bit data bus. Let's also say your data is stored at byte 2 in the system memory.
Now since you can load 4 bytes of data at once, it doesn't make too much sense to have your Address register to point to a single byte. By making your address register point to every 4 bytes you can manipulate 4 times the data. So in other words your CPU may only be able to read data starting at bytes 0, 4, 8, 12, 16, etc.
So here's the issue. If you want the data starting at byte 2 and you're reading 4 bytes, then half your data will be in address position 0 and the other half in position 1.
So basically you'd end up hitting the memory twice to read your one 4 byte data element. Some CPUs don't support this sort of operation (or force you to load and combine the two results manually).
Go here for more details: http://en.wikipedia.org/wiki/Data_structure_alignment
1.) Some architectures do not have this requirement at all, some encourage alignment (there is a speed penalty when accessing non-aligned data items), and some may enforce it strictly (misalignment causes a processor exception).
Many of today's popular architectures fall in the speed penalty category. The CPU designers had to make a trade-off between flexibility/performance and cost (silicon area/number of control signals required for bus cycles).
2.) What language, which architecture? Consult your compiler's manual and/or the CPU architecture documentation.
3.) Again, this is totally architecture dependent (some architectures may not permit access to byte-sized items at all, or have bus widths which are not even a multiple of 8 bits). So unless you are asking about a specific architecture, you won't get any useful answers.
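For C specifically, one way to see what a given compiler and target use is to ask with alignof (C11, from <stdalign.h>). The sketch below just prints a few values; print_alignments is a made-up name and the output differs between platforms, so none is shown.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdio.h>

/* Print the size and alignment of a few types as seen by this compiler
   and target. */
static void print_alignments(void)
{
    printf("char:        size %zu, align %zu\n", sizeof(char), alignof(char));
    printf("short:       size %zu, align %zu\n", sizeof(short), alignof(short));
    printf("int:         size %zu, align %zu\n", sizeof(int), alignof(int));
    printf("double:      size %zu, align %zu\n", sizeof(double), alignof(double));
    printf("void *:      size %zu, align %zu\n", sizeof(void *), alignof(void *));
    printf("max_align_t: size %zu, align %zu\n",
           sizeof(max_align_t), alignof(max_align_t));
}

static_assert(alignof(char) == 1, "char has the weakest alignment");
```

The same operator works on struct types, which is a quick way to check what the compiler decided for a particular layout.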
In general, the one answer to all three of those questions is "it depends on your system". Some more details:
Your memory system might not be byte-addressable. Besides that, you might incur a performance penalty to have your processor access unaligned data. Some processors (like older ARM chips, for example) just can't do it at all.
Read the manual for your processor and whatever ABI specification your code is being generated for.
Usually when people refer to data being at a certain alignment, it refers only to the first byte. So if the ABI spec said "data structure X must be 4-byte aligned", it means that X should be placed in memory at an address that's divisible by 4. Nothing is implied by that statement about the size or internal layout of structure X.
As far as your particular example goes, if the data is 4-byte aligned starting at address 1004, the next byte will be at 1005.
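Put as code, the alignment test applies to the starting address only. A small sketch, assuming power-of-two alignments (is_aligned is an illustrative name):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if addr is a multiple of alignment. The alignment must be a power
   of two, which lets the test be a simple bit mask. */
static bool is_aligned(uintptr_t addr, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}
```

So is_aligned(1004, 4) is true for the object as a whole, and its remaining bytes simply follow at 1005, 1006 and 1007, none of which is itself a multiple of four.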
It completely depends on the CPU you are using!
Some architectures deal only in 32 (or 36!) bit words, and you need special instructions to load single characters or half words.
Some CPUs (notably PowerPC and other IBM RISC chips) don't care about alignment and will load integers from odd addresses.
For most modern architectures you need to align integers to word boundaries and long integers to double word boundaries. This simplifies the circuitry for loading registers and speeds things up ever so slightly.
Data alignment is required by the CPU for performance reasons. Intel's website gives the details on how to align data in memory:
Data Alignment when Migrating to 64-Bit Intel® Architecture
One of these is the alignment of data items – their location in memory in relation to addresses that are multiples of four, eight or 16 bytes. Under the 16-bit Intel architecture, data alignment had little effect on performance, and its use was entirely optional. Under IA-32, aligning data correctly can be an important optimization, although its use is still optional with a very few exceptions, where correct alignment is mandatory. The 64-bit environment, however, imposes more-stringent requirements on data items. Misaligned objects cause program exceptions. For an item to be aligned properly, it must fulfill the requirements imposed by 64-bit Intel architecture (discussed shortly), plus those of the linker used to build the application.
The fundamental rule of data alignment is that the safest (and most widely supported) approach relies on what Intel terms "the natural boundaries." Those are the ones that occur when you round up the size of a data item to the next largest size of two, four, eight or 16 bytes. For example, a 10-byte float should be aligned on a 16-byte address, whereas 64-bit integers should be aligned to an eight-byte address. Because this is a 64-bit architecture, pointer sizes are all eight bytes wide, and so they too should align on eight-byte boundaries.
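The rounding rule quoted above is easy to express as a function. This sketch covers only item sizes up to 16 bytes, which is the range the text talks about; natural_alignment is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Round an item size (1 to 16 bytes) up to the next power of two,
   giving the "natural boundary" described in the quoted text. */
static size_t natural_alignment(size_t size)
{
    size_t boundary = 1;
    assert(size > 0 && size <= 16);
    while (boundary < size)
        boundary *= 2;
    return boundary;
}
```

This gives 16 for the 10-byte float and 8 for a 64-bit integer or pointer, in line with the examples in the paragraph.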
It is recommended that all structures larger than 16 bytes align on 16-byte boundaries. In general, for the best performance, align data as follows:
Align 8-bit data at any address
Align 16-bit data to be contained within an aligned four-byte word
Align 32-bit data so that its base address is a multiple of four
Align 64-bit data so that its base address is a multiple of eight
Align 80-bit data so that its base address is a multiple of sixteen
Align 128-bit data so that its base address is a multiple of sixteen
A 64-byte or greater data structure or array should be aligned so that its base address is a multiple of 64. Sorting data in decreasing size order is one heuristic for assisting with natural alignment. As long as 16-byte boundaries (and cache lines) are never crossed, natural alignment is not strictly necessary, although it is an easy way to enforce adherence to general alignment recommendations.
Aligning data correctly within structures can cause data bloat (due to the padding necessary to place fields correctly), so where necessary and possible, it is useful to reorganize structures so that fields that require the widest alignment are first in the structure. More on solving this problem appears in the article "Preparing Code for the IA-64 Architecture (Code Clean)."
For Intel Architecture, Chapter 4 DATA TYPES of Intel 64 and IA-32 Architectures Software Developer’s Manual answers your question 1.

Alignment restrictions for malloc()/free()

Older K&R (2nd ed.) and other C-language texts I have read that discuss the implementation of a dynamic memory allocator in the style of malloc() and free() usually also mention, in passing, something about data type alignment restrictions. Apparently certain computer hardware architectures (CPU, registers, and memory access) restrict how you can store and address certain value types. For example, there may be a requirement that a 4 byte (long) integer must be stored beginning at addresses that are multiples of four.
What restrictions, if any, do major platforms (Intel & AMD, SPARC, Alpha) impose for memory allocation and memory access, or can I safely ignore aligning memory allocations on specific address boundaries?
Sparc, MIPS, Alpha, and most other "classical RISC" architectures only allow aligned accesses to memory, even today. An unaligned access will cause an exception, but some operating systems will handle the exception by copying from the desired address in software using smaller loads and stores. The application code won't know there was a problem, except that the performance will be very bad.
MIPS has special instructions (lwl and lwr) which can be used to access 32 bit quantities from unaligned addresses. Whenever the compiler can tell that the address is likely unaligned it will use this two instruction sequence instead of a normal lw instruction.
x86 can handle unaligned memory accesses in hardware without an exception, but there is still a performance hit of up to 3X compared to aligned accesses.
Ulrich Drepper wrote a comprehensive paper on this and other memory-related topics, What Every Programmer Should Know About Memory. It is a very long writeup, but filled with chewy goodness.
Alignment is still quite important today. Some processors (the 68k family jumps to mind) would throw an exception if you tried to access a word value on an odd boundary. Today, most processors will run two memory cycles to fetch an unaligned word, but this will definitely be slower than an aligned fetch. Some other processors won't even throw an exception, but will fetch an incorrect value from memory!
If for no other reason than performance, it is wise to try to follow your processor's alignment preferences. Usually, your compiler will take care of all the details, but if you're doing anything where you lay out the memory structure yourself, then it's worth considering.
You still need to be aware of alignment issues when laying out a class or struct in C(++). In these cases the compiler will do the right thing for you, but the overall size of the struct/class may be more wasteful than necessary.
For example:
struct
{
char A;
int B;
char C;
int D;
};
Would have a size of 4 * 4 = 16 bytes (assume Windows on x86) whereas
struct
{
char A;
char C;
int B;
int D;
};
Would have a size of 4*3 = 12 bytes.
This is because the compiler enforces a 4 byte alignment for integers, but only 1 byte for chars.
In general pack member variables of the same size (type) together to minimize wasted space.
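The effect of the reordering can be measured rather than assumed. Below is a sketch with the two anonymous structs given names (interleaved and grouped, chosen here for illustration); the size comparison holds on mainstream ABIs but is not something the C standard promises.

```c
#include <assert.h>
#include <stddef.h>

/* Members in the original order: each char is followed by padding. */
struct interleaved {
    char A;
    int  B;
    char C;
    int  D;
};

/* Same members with the two chars next to each other. */
struct grouped {
    char A;
    char C;
    int  B;
    int  D;
};

/* Holds on mainstream ABIs; not guaranteed by the C standard. */
static_assert(sizeof(struct grouped) <= sizeof(struct interleaved),
              "grouping small members does not make the struct larger");

/* Bytes saved by the reordering on this compiler and target. */
static size_t bytes_saved(void)
{
    return sizeof(struct interleaved) - sizeof(struct grouped);
}
```

With a 4-byte int aligned to 4 bytes, bytes_saved() should come out as 4, matching the 16 versus 12 figures above.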
As Greg mentioned it is still important today (perhaps more so in some ways) and compilers usually take care of the alignment based on the target of the architecture. In managed environments, the JIT compiler can optimize the alignment based on the runtime architecture.
You may see pragma directives (in C/C++) that change the alignment. This should only be used when very specific alignment is required.
// For example, this changes the pack to 2 byte alignment.
#pragma pack(2)
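To illustrate what the pragma changes, here is a sketch comparing the same two-member struct with and without it. The struct names are invented, and it uses the push/pop form (accepted by MSVC, GCC and Clang) so the setting does not leak into later declarations.

```c
#include <assert.h>
#include <stddef.h>

/* Default layout: value is placed at a multiple of the alignment of int. */
struct default_layout {
    char tag;
    int  value;
};

#pragma pack(push, 2)   /* members are aligned to at most 2 bytes from here */
struct pack2_layout {
    char tag;
    int  value;
};
#pragma pack(pop)       /* restore the previous packing */

/* Assumes int normally needs at least 2-byte alignment. */
static_assert(offsetof(struct pack2_layout, value) == 2,
              "with pack(2), value moves up to offset 2");
```

The int in pack2_layout may now sit at an address that is not a multiple of four, so access to it can be slower on some targets.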
Note that even on IA-32 and AMD64, some of the SSE instructions/intrinsics require aligned data. These instructions will throw an exception if the data is unaligned, so at least you won't have to debug "wrong data" bugs. There are equivalent unaligned instructions as well, but like Denton says, they are slower.
If you're using VC++, then besides the #pragma pack directives, you also have the __declspec(align(#)) directive for precise alignment. The VC++ documentation also mentions an _aligned_malloc function for specific alignment requirements.
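Outside VC++, standard C11 offers aligned_alloc for the same purpose. The sketch below uses it instead of _aligned_malloc (VC++ does not provide aligned_alloc, and memory from _aligned_malloc must be released with _aligned_free instead of free); alloc_floats_aligned64 is a hypothetical helper and the 64-byte boundary is just an example value.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate room for `count` floats starting on a 64-byte boundary.
   The caller releases the memory with free(). */
static float *alloc_floats_aligned64(size_t count)
{
    assert(count > 0);
    size_t bytes = count * sizeof(float);
    /* aligned_alloc expects the size to be a multiple of the alignment. */
    size_t rounded = (bytes + 63) / 64 * 64;
    return aligned_alloc(64, rounded);
}
```

A buffer obtained this way can be handed to the aligned SSE load and store instructions mentioned above, since 64 is a multiple of 16.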
As a rule of thumb, unless you are moving data across compilers/languages or are using the SSE instructions, you can probably ignore alignment issues.
