Why does Clang-Tidy suggest a larger alignment? - c

Given the following c language struct definition:
typedef struct PackTest {
long long a;
int b;
int c;
} PackTest;
Clang-Tidy gives the following message:
Accessing fields in struct 'PackTest' is inefficient due to poor alignment; currently aligned to 8 bytes, but recommended alignment is 16 bytes
I know why the struct is aligned to 8 bytes, but I don't know if the suggestion is valid and why.

Some specialized assembly instructions have alignment requirements (for example, x86 non-scalar SSE instructions strictly require alignment to 16-byte boundaries). Other instructions might have lower throughput when used on data that is not aligned to 16-byte boundaries (for example, x86 SSE2).
These kinds of instructions are usually used to perform aggressive optimizations based on the hardware features of the processor. Overall, the message you get is only useful in those scenarios (i.e. if you are actually planning to take advantage of such instructions).
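As a rough illustration of what the check is asking for, the sketch below puts the struct from the question next to an over-aligned variant. `PackTest16` and the two helper functions are names invented here, and `_Alignas`/`_Alignof` assume a C11 (or later) compiler.

```c
#include <stddef.h>

/* The struct from the question: its alignment follows its widest member. */
typedef struct PackTest {
    long long a;
    int b;
    int c;
} PackTest;

/* Hypothetical over-aligned variant: aligning the first member to 16
 * raises the alignment of the whole struct to 16. */
typedef struct PackTest16 {
    _Alignas(16) long long a;
    int b;
    int c;
} PackTest16;

/* Report the alignment the compiler actually chose for each type. */
size_t packtest_alignment(void)   { return _Alignof(PackTest); }
size_t packtest16_alignment(void) { return _Alignof(PackTest16); }
```

Only the second struct is guaranteed to start on a 16-byte boundary, which matters only if the code that touches it really uses instructions that benefit from that.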
See also:
What does alignment to 16-byte boundary mean in x86
Why and where align 16 is used for SSE alignment for instructions?
Finally I'll just quote Rich from the above comment since they make a really good point:
There is nothing "untidy" about having standard structs that are not ridiculously over-aligned. For very specialized purposes you might want an over-aligned object, but if it's flagging this then most things it's flagging are just wrong, and encouraging you to write code that's inefficient and gratuitously nonstandard.

You can add -altera-struct-pack-align to Clang-Tidy's check list (for example, -checks=-altera-struct-pack-align) to disable this warning.
Source: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg171275.html

Related

Where can I find what the alignment requirement for any arbitrary compiler? [closed]

I came across this page, The Lost Art of C Structure Packing, and while I have never had to actually pad any structs, I'd like to learn a bit more so that when/if I need to, I can.
It says:
Storage for the basic C datatypes on an x86 or ARM processor doesn’t normally start at arbitrary byte addresses in memory. Rather, each type except char has an alignment requirement; chars can start on any byte address, but 2-byte shorts must start on an even address, 4-byte ints or floats must start on an address divisible by 4, and 8-byte longs or doubles must start on an address divisible by 8. Signed or unsigned makes no difference.
Does this imply that all 32 bit processors (x86, ARM, AVR32, PIC32,...) have this alignment requirement? What about 16 bit processors?
If not, and it is device specific, where can I find this information?
I tried searching through Microchip XC16 Manual but I could not find the alignment requirements that say that ints start at addresses divisible by 4.
I assume that the information is there, and I am not searching for the right key words - what is the "alignment requirement" called if I were to search online for more information?
Alignment requirements involve two considerations: what is required and what is preferred for efficiency.
Required: Some platforms require various types, such as int, to be aligned. Contorted code that attempts to access an int on an unaligned boundary results in a fault. Compilers normally align data automatically to prevent this issue.
Efficiency: Unaligned accesses may be allowed yet result in slower code. Many compilers, rather than packing the data, default to aligned data for speed. Typically such compilers offer a compiler-specific keyword or option to pack the data instead for space efficiency.
These issues apply to processors of various sizes to different degrees. An 8-bit processor may have a 16-bit data bus and oblige types of 16 bits or more to be aligned. A compliant C compiler for a 64-bit processor may have only 64-bit types, even char. The possibilities are vast.
C provides the type max_align_t in <stddef.h> (since C11). It is an object type, not necessarily an integer type, and _Alignof(max_align_t) gives an alignment that is sufficient for any basic type.
... max_align_t which is an object type whose alignment is as great as is supported by the implementation in all contexts; ... C11 §7.19 2
C also has _Alignas() to impose stricter alignment of a variable.
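A minimal sketch of both facilities, assuming a C11 compiler; the buffer `scratch` and the helper names are made up for illustration.

```c
#include <stddef.h>   /* max_align_t, size_t */
#include <stdint.h>   /* uintptr_t */

/* Greatest fundamental alignment the implementation supports. */
size_t max_fundamental_alignment(void) { return _Alignof(max_align_t); }

/* Hypothetical scratch buffer, aligned strictly enough for any basic type. */
static _Alignas(max_align_t) unsigned char scratch[64];

unsigned char *scratch_buffer(void) { return scratch; }

/* Nonzero if p is a multiple of alignment (assumed to be a power of two). */
int is_aligned(const void *p, size_t alignment) {
    return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

Calling `is_aligned(scratch_buffer(), max_fundamental_alignment())` should then report that the buffer can hold any basic type without an alignment problem.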
There are two global answers here. Yes, all processors have an alignment penalty of some sort (ARM, MIPS, x86, etc.). No, you cannot determine it from the type alone. Not all ARMs have the same alignment penalty. Despite what folks think they know about the older ARMv4 and ARMv5, you could do unaligned accesses on them in a predictable way; that predictable way was not what most of us would have preferred, and you had to enable it. MIPS, ARM, and perhaps others at one point punished unaligned transfers severely: you would get a data fault. But because of how programmers actually program, the default, at least for ARM, is to have that fault disabled on some newer cores. You can enable or disable it whichever way you want.
All processors have a performance penalty for unaligned transfers, and those hits happen at various layers: sometimes in the core, at the edge of the core, at each cache layer, and at the outer layer of RAM. Since the designs vary so widely, you cannot come up with a single rule.
Likewise, since alignment in compilers is implementation-defined, you can't write fully portable code. So if you are dealing with a processor (likely an ARM, since that is where most folks get bitten) that has unaligned faults enabled, the most portable solution, though not foolproof, is to start your structs with the 64-bit variables, then the 32-bit, then the 16-bit, then the 8-bit. A C compiler lays out members in the order you declare them, so as long as the whole struct starts on the right boundary for the target, the variables fall into alignment with no padding required. There is no global solution to the problem other than not using structs, or disabling alignment checking and suffering the front-end performance hits.
Note that the 32-bit ARMs we generally deal with today use a 64-bit AMBA/AXI bus, not a 32-bit one. They can still check all the alignments (16, 32, 64) for transfers if enabled, but the unaligned performance hits, at least at the AMBA/AXI level, don't affect you unless you cross a 64-bit aligned boundary. You may still have an extra cache line hit, although that is unlikely if you don't have an AMBA/AXI hit.
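The widest-first ordering can be checked with a small sketch. The two struct names and the padding helpers are hypothetical; the fixed-width types come from `<stdint.h>`.

```c
#include <stddef.h>
#include <stdint.h>

/* Members declared from widest to narrowest. */
struct Sorted {
    uint64_t a;
    uint32_t b;
    uint16_t c;
    uint8_t  d;
    uint8_t  e;
};

/* The same members in an arbitrary order. */
struct Unsorted {
    uint8_t  d;
    uint64_t a;
    uint8_t  e;
    uint32_t b;
    uint16_t c;
};

/* Total bytes occupied by the members themselves. */
#define MEMBER_BYTES (sizeof(uint64_t) + sizeof(uint32_t) + \
                      sizeof(uint16_t) + 2 * sizeof(uint8_t))

/* Padding = struct size minus the sum of the member sizes. */
size_t sorted_padding(void)   { return sizeof(struct Sorted)   - MEMBER_BYTES; }
size_t unsorted_padding(void) { return sizeof(struct Unsorted) - MEMBER_BYTES; }
```

The members here happen to add up to 16 bytes, so the sorted struct needs no tail padding either. On a typical x86-64 ABI the unsorted version carries 16 bytes of padding; other targets may differ.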

Is it unnecessary to store a double member of a structure at an address that is a multiple of 8?

Suppose that sizeof(int) and sizeof(double) are 4 and 8 respectively and that there is no preprocessor command such as #pragma pack before the following code or compiler options with the same function as #pragma pack used in the compiler command line
typedef struct
{
int n;
double d;
} T;
then how much is sizeof(T)?
I think that it depends on the width of the data bus between the CPU and RAM. If the width is 32 bits, sizeof(T) is 12. If the width is 64 bits, sizeof(T) is 16. On a computer with a 32-bit data bus, to transfer a 64-bit number from CPU to RAM or vice versa, CPU has to access the data bus twice, reading or writing 32 bits at a time, so there is no point in storing the member d of the structure T at an address that is a multiple of 8.
Do you agree?
(Sorry for my poor English)
then how much is sizeof(T)?
You are correct: this is highly dependent on the system, the compiler, and the optimization settings. Generally speaking, the compiler knows best, at least in theory, what alignment to pick for the 8-byte double member of the structure. Moreover, the compiler's decision could be different when you ask it to optimize for a smaller memory footprint compared to when you ask it to optimize for the fastest speed.
Finally, there may be systems where reading eight bytes from addresses aligned at four-byte boundary but not at eight-byte boundary may carry no penalty at all. Again, your compiler is in the best position to know that fact, and avoid padding your struct unnecessarily.
The most important thing to remember about alignment is that you should not assume a particular layout of your struct, even if you do not intend to port your product to a different platform, because a change as simple as adding an optimization flag to the makefile may be sufficient to invalidate your assumptions.
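Rather than assuming a layout, you can ask the compiler. The sketch below uses `offsetof` and `sizeof` on the struct from the question; the helper names `padding_before_d` and `layout_is` are invented for this example.

```c
#include <stddef.h>

typedef struct {
    int n;
    double d;
} T;

/* Padding the compiler inserted between n and d (0 if d follows n directly). */
size_t padding_before_d(void) { return offsetof(T, d) - sizeof(int); }

/* Hypothetical helper: does the real layout match what the code expects? */
int layout_is(size_t expected_size, size_t expected_d_offset) {
    return sizeof(T) == expected_size && offsetof(T, d) == expected_d_offset;
}
```

On a typical x86-64 ABI `layout_is(16, 8)` holds, while under the 32-bit System V ABI used on Linux/i386 `layout_is(12, 4)` holds. Code that depends on one of these could check it at startup instead of trusting it.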

Data alignment: where can it be read off? can it be changed?

This is an excerpt from a book about data alignment of primitive types in memory.
Microsoft Windows imposes a stronger alignment requirement—any primitive object of K bytes, for
K = 2, 4, or 8, must have an address that is a multiple of K. In particular, it requires that the address
of a double or a long long be a multiple of 8. This requirement enhances the memory performance at
the expense of some wasted space. The Linux convention, where 8-byte values are aligned on 4-byte
boundaries was probably good for the i386, back when memory was scarce and memory interfaces were
only 4 bytes wide. With modern processors, Microsoft’s alignment is a better design decision. Data type
long double, for which gcc generates IA32 code allocating 12 bytes (even though the actual data type
requires only 10 bytes) has a 4-byte alignment requirement with both Windows and Linux.
Questions are:
What imposes data alignment, OS or compiler?
Can I change it or it is fixed?
Generally speaking, it's the compiler that imposes the alignment. Whenever you declare a primitive type (eg. double), the compiler will automatically align it to 8 bytes on the stack.
Furthermore, memory allocations are also generally aligned to the largest primitive type so that you can safely do this:
double *ptr = (double*)malloc(size);
without having to worry about alignment.
Therefore, generally speaking, if you're programming with good habits, you won't have to worry about alignment. One way to get something misaligned is to do something like this:
char *ch_ptr = (char*)malloc(size);
double *d_ptr = (double*)(ch_ptr + 1);
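If you do end up with a pointer like d_ptr, one common way to stay safe is to test the address first, or to copy the bytes instead of dereferencing. This is only a sketch; the helper names are hypothetical, and `_Alignof` assumes C11.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if p is suitably aligned for a double. */
int aligned_for_double(const void *p) {
    return (uintptr_t)p % _Alignof(double) == 0;
}

/* Reads a double from any address, aligned or not, by copying its bytes. */
double load_double(const void *p) {
    double value;
    memcpy(&value, p, sizeof value);
    return value;
}
```

`load_double(ch_ptr + 1)` is well defined as long as at least sizeof(double) bytes are readable at that address, whereas dereferencing the misaligned `double *` directly is not.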
There are some exceptions to this: when you start getting into SSE and vectorization, things get a bit messy because malloc is not guaranteed to return 16-byte-aligned memory on every platform.
To override the alignment of something, MSVC has the __declspec(align(#)) modifier. It is used to increase the alignment of something; the documentation says explicitly that you cannot decrease alignment with this modifier.
EDIT :
I found the documentation stating the alignment of malloc() in the GNU C Library (glibc, the library typically used with GCC on Linux):
The address of a block returned by malloc or realloc in the GNU system
is always a multiple of eight (or sixteen on 64-bit systems).
Source: http://www.gnu.org/s/hello/manual/libc/Aligned-Memory-Blocks.html
So yes, malloc on GNU systems aligns to at least 8 bytes.
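When 8 bytes is not enough, C11 offers `aligned_alloc`. The following is a sketch under the assumption that the C library provides it (glibc does; MSVC's runtime does not and uses `_aligned_malloc` instead). The names `alloc16` and `is_16_byte_aligned` are made up here.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates at least n bytes on a 16-byte boundary; release with free().
 * aligned_alloc traditionally wants the size to be a multiple of the
 * alignment, so n is rounded up first. */
void *alloc16(size_t n) {
    size_t rounded = (n + 15u) & ~(size_t)15u;
    if (rounded < n) {
        return NULL;  /* overflow while rounding up */
    }
    return aligned_alloc(16, rounded);
}

/* Nonzero if p sits on a 16-byte boundary. */
int is_16_byte_aligned(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```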
The x86 CPUs have pretty lax alignment requirements. Most data can be stored and accessed at unaligned locations, possibly at the expense of degraded performance. Things become more complex when you start developing multiprocessor software, as alignment becomes important for atomicity and the observed order of events (writing this from memory, so this may not be entirely correct).
Compilers can often be directed to align variables differently from the default alignment. There're compiler options for that and special compiler-specific keywords (e.g. #pragma pack and others).
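For instance, `#pragma pack` (accepted by GCC, Clang, and MSVC) removes the padding the compiler would otherwise insert. A small sketch, with struct and function names chosen only for this example:

```c
#include <stddef.h>

/* Default layout: the compiler may pad after c so that i is aligned. */
struct Natural {
    char c;
    int  i;
};

/* Packed layout: save the current setting, pack to 1 byte, then restore. */
#pragma pack(push, 1)
struct Packed {
    char c;
    int  i;
};
#pragma pack(pop)

size_t natural_size(void) { return sizeof(struct Natural); }
size_t packed_size(void)  { return sizeof(struct Packed); }
```

The packed struct is exactly sizeof(char) + sizeof(int) bytes, but its int member is no longer naturally aligned, so taking a plain `int *` to it brings back the misalignment problems discussed above.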
The well-established OS APIs can't be changed, neither by the application programmer (the OS is already compiled), nor by the OS developers (unless, of course, they are OK with breaking compatibility).
So, you can change some things, but not everything.
I don't know where microsoft got its information from, but the results on
gcc (4.6.1 Target: x86_64-linux-gnu, standard mode, no flags except -Wall) are quite different:
#include <stdio.h>

struct lll {
    long l;
    long long ll;
};

struct lld {
    long l;
    long double ld;
};

struct lll lll1, lll2[2];
struct lld lld1, lld2[2];

int main(void)
{
    printf("lll1=%u, lll2=%u\n"
        , (unsigned) sizeof lll1
        , (unsigned) sizeof lll2
        );
    printf("lld=%u, lld2=%u\n"
        , (unsigned) sizeof lld1
        , (unsigned) sizeof lld2
        );
    return 0;
}
Results:
./a.out
lll1=16, lll2=32
lld=32, lld2=64
This might be FUD (from the company that actually managed to put unaligned ints into the MBR ...). But it could also be a result of the author not being informed too well.
To answer the question: it is the hardware that imposes the alignment restrictions. The compiler only needs to implement them.

What is overalignment of execution regions and input sections?

I came across code similar to the following today and I am curious as to what is actually happening:
#pragma pack(1)
__align(2) static unsigned char multi_array[7][24] = { 0 };
__align(2) static unsigned char another_multi_array[7][24] = { 0 };
#pragma pack()
When searching for a reference to the __align keyword in the Keil compiler, I came across this:
Overalignment of execution regions and input sections There are situations when you want to overalign code and data sections... If you have access to the original source code, you can do this at compile time with the __align(n) keyword...
I do not understand what is meant by "overaligning code and data sections". Can someone help to clarify how this overalignment occurs?
The compiler will naturally "align" data based on the needs of the system. For example, on a typical 32-bit system, a 32-bit integer should always be a single 4-byte word (as opposed to being partly in one word and partly on the next), so it will always start on a 4-byte-word boundary. (This mostly has to do with the instructions available on the processor. A system is very likely to have an instruction to load a single word from memory into a register, and much less likely to have a single instruction to load an arbitrary sequence of four adjacent bytes into a register.)
The compiler normally does this by introducing gaps in the data; for example, a struct with a char followed by a 32-bit int, on such a system, would require eight bytes: one byte for the char, three bytes of filler so the int is aligned right, and four bytes for the int itself.
To "overalign" the data is to request greater alignment than the compiler would naturally provide. For example, you might request that a 32-bit integer start on an 8-byte boundary, even on a system that uses 4-byte words. (One major reason to do this would be if you're aiming for byte-level interoperability with a system that uses 8-byte words: if you pass structs from one system to the other, you want the same gaps in both systems.)
Overalignment is when the data is aligned to more than its default alignment. For example, a 4-byte int usually has a default alignment of 4 bytes (meaning the address will be divisible by 4).
The default alignment of a datatype is quite often (but not always) the size of the datatype.
Overalignment allows you to increase this alignment to something greater than the default.
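In standard C11 the same idea can be written with `_Alignas`. The sketch below compares a naturally aligned member with one over-aligned to 8 bytes; both struct names and the helper are hypothetical, and Keil's `__align` is a separate, compiler-specific spelling.

```c
#include <stddef.h>
#include <stdint.h>

/* Natural layout: value lands on the int32_t alignment boundary. */
struct NaturalPair {
    char    tag;
    int32_t value;
};

/* Over-aligned layout: value is forced onto an 8-byte boundary. */
struct OveralignedPair {
    char tag;
    _Alignas(8) int32_t value;
};

/* Extra gap, in bytes, that over-aligning the member introduced. */
size_t extra_gap(void) {
    return offsetof(struct OveralignedPair, value)
         - offsetof(struct NaturalPair, value);
}
```

Over-aligning a member also raises the alignment of the enclosing struct, so the whole struct grows to a multiple of 8 bytes.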
As for why you would want to do this:
One reason for this is to be able access the data with a larger datatype (that has a larger alignment).
For example:
char buffer[16];
int *ptr = (int*)&buffer;
ptr[0] = 1;
ptr[1] = 2;
By default, buffer is only guaranteed to be aligned to 1 byte. However, int typically requires 4-byte alignment. If buffer isn't aligned to 4 bytes, you may get a misalignment exception. (AFAIK, ARM doesn't allow misaligned memory access... x86/64 usually does, but with a performance penalty.)
__align() will let you force the alignment higher to make it work:
__align(4) char buffer[16];
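A portable counterpart, assuming a C11 compiler instead of Keil's `__align`, might look like the sketch below. It also stores the ints with `memcpy` rather than through a cast pointer, which sidesteps the aliasing question the cast raises; with an aligned buffer, compilers can usually lower these copies to ordinary word stores. All function names are invented for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte buffer over-aligned so that it can hold ints at its start. */
static _Alignas(int) unsigned char buffer[16];

/* Stores two ints at the start of the buffer without casting to int *. */
void store_pair(int first, int second) {
    memcpy(buffer, &first, sizeof first);
    memcpy(buffer + sizeof first, &second, sizeof second);
}

/* Reads back the int at the given index (0 or 1). */
int load_int(size_t index) {
    int value;
    memcpy(&value, buffer + index * sizeof value, sizeof value);
    return value;
}

/* Nonzero if the buffer really is aligned for int. */
int buffer_is_int_aligned(void) {
    return (uintptr_t)buffer % _Alignof(int) == 0;
}
```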
A similar situation appears when using SIMD instructions. You will be accessing smaller datatype with a large SIMD datatype - which will likely require a larger alignment.
By overalign, Keil means nothing more complex than aligning an object to a larger alignment boundary than the data type requires.
See the documentation for __align: "You can only overalign. That is, you can make a two-byte object four-byte aligned but you cannot align a four-byte object at 2 bytes."
In the case of the linker, you can force an extra alignment onto sections within other binary modules using the ALIGNALL or OVERALIGN directives. This may be useful for performance reasons, but isn't a common scenario.

Alignment restrictions for malloc()/free()

Older K&R (2nd ed.) and other C-language texts I have read that discuss the implementation of a dynamic memory allocator in the style of malloc() and free() usually also mention, in passing, something about data type alignment restrictions. Apparently certain computer hardware architectures (CPU, registers, and memory access) restrict how you can store and address certain value types. For example, there may be a requirement that a 4 byte (long) integer must be stored beginning at addresses that are multiples of four.
What restrictions, if any, do major platforms (Intel & AMD, SPARC, Alpha) impose for memory allocation and memory access, or can I safely ignore aligning memory allocations on specific address boundaries?
Sparc, MIPS, Alpha, and most other "classical RISC" architectures only allow aligned accesses to memory, even today. An unaligned access will cause an exception, but some operating systems will handle the exception by copying from the desired address in software using smaller loads and stores. The application code won't know there was a problem, except that the performance will be very bad.
MIPS has special instructions (lwl and lwr) which can be used to access 32 bit quantities from unaligned addresses. Whenever the compiler can tell that the address is likely unaligned it will use this two instruction sequence instead of a normal lw instruction.
x86 can handle unaligned memory accesses in hardware without an exception, but there is still a performance hit of up to 3X compared to aligned accesses.
Ulrich Drepper wrote a comprehensive paper on this and other memory-related topics, What Every Programmer Should Know About Memory. It is a very long writeup, but filled with chewy goodness.
Alignment is still quite important today. Some processors (the 68k family jumps to mind) would throw an exception if you tried to access a word value on an odd boundary. Today, most processors will run two memory cycles to fetch an unaligned word, but this will definitely be slower than an aligned fetch. Some other processors won't even throw an exception, but will fetch an incorrect value from memory!
If for no other reason than performance, it is wise to try to follow your processor's alignment preferences. Usually, your compiler will take care of all the details, but if you're doing anything where you lay out the memory structure yourself, then it's worth considering.
You still need to be aware of alignment issues when laying out a class or struct in C(++). In these cases the compiler will do the right thing for you, but the overall size of the struct/class may be more wasteful than necessary.
For example:
struct
{
char A;
int B;
char C;
int D;
};
Would have a size of 4 * 4 = 16 bytes (assume Windows on x86) whereas
struct
{
char A;
char C;
int B;
int D;
};
Would have a size of 4*3 = 12 bytes.
This is because the compiler enforces a 4 byte alignment for integers, but only 1 byte for chars.
In general pack member variables of the same size (type) together to minimize wasted space.
As Greg mentioned it is still important today (perhaps more so in some ways) and compilers usually take care of the alignment based on the target of the architecture. In managed environments, the JIT compiler can optimize the alignment based on the runtime architecture.
You may see pragma directives (in C/C++) that change the alignment. This should only be used when very specific alignment is required.
// For example, this changes the pack to 2 byte alignment.
#pragma pack(2)
Note that even on IA-32 and AMD64, some of the SSE instructions/intrinsics require aligned data. These instructions will throw an exception if the data is unaligned, so at least you won't have to debug "wrong data" bugs. There are equivalent unaligned instructions as well but, as Denton says, they are slower.
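To make the aligned/unaligned distinction concrete, here is a sketch using the SSE intrinsics `_mm_load_ps` (requires a 16-byte-aligned address) and `_mm_loadu_ps` (does not). The function `sum4` and the `HAVE_SSE` macro are names made up for this example, and a scalar fallback is included for targets without SSE.

```c
#include <stdint.h>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Sums four consecutive floats starting at p. */
float sum4(const float *p) {
#if HAVE_SSE
    __m128 v;
    float lanes[4];
    if ((uintptr_t)p % 16 == 0) {
        v = _mm_load_ps(p);    /* aligned load: faults on a misaligned address */
    } else {
        v = _mm_loadu_ps(p);   /* unaligned load: accepts any address */
    }
    _mm_storeu_ps(lanes, v);
    return lanes[0] + lanes[1] + lanes[2] + lanes[3];
#else
    /* Scalar fallback for targets without SSE. */
    return p[0] + p[1] + p[2] + p[3];
#endif
}
```

Checking the address at run time like this is only for demonstration; in real code you would normally arrange for the data to be 16-byte aligned up front and use the aligned form throughout.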
If you're using VC++, then besides the #pragma pack directives, you also have the __declspec(align(#)) directive for precise alignment. The VC++ documentation also mentions an _aligned_malloc function for specific alignment requirements.
As a rule of thumb, unless you are moving data across compilers/languages or are using the SSE instructions, you can probably ignore alignment issues.
