Data alignment: where can it be read off? can it be changed? - c

This is an excerpt from a book about data alignment of primitive types in memory.
Microsoft Windows imposes a stronger alignment requirement—any primitive object of K bytes, for
K = 2, 4, or 8, must have an address that is a multiple of K. In particular, it requires that the address
of a double or a long long be a multiple of 8. This requirement enhances the memory performance at
the expense of some wasted space. The Linux convention, where 8-byte values are aligned on 4-byte
boundaries was probably good for the i386, back when memory was scarce and memory interfaces were
only 4 bytes wide. With modern processors, Microsoft’s alignment is a better design decision. Data type
long double, for which gcc generates IA32 code allocating 12 bytes (even though the actual data type
requires only 10 bytes) has a 4-byte alignment requirement with both Windows and Linux.
Questions are:
What imposes data alignment, OS or compiler?
Can I change it or it is fixed?

Generally speaking, it's the compiler that imposes the alignment. Whenever you declare a primitive type (e.g. double), the compiler will automatically align it to 8 bytes on the stack.
Furthermore, memory allocations are also generally aligned to the largest primitive type so that you can safely do this:
double *ptr = (double*)malloc(size);
without having to worry about alignment.
Therefore, generally speaking, if you're programming with good habits, you won't have to worry about alignment. One way to get something misaligned is to do something like this:
char *ch_ptr = (char*)malloc(size);
double *d_ptr = (double*)(ch_ptr + 1);
There are some exceptions to this: When you start getting into SSE and vectorization, things get a bit messy because malloc no longer guarantees 16-byte alignment.
To override the alignment of something, MSVC has the __declspec(align(#)) modifier, which can be used to increase the alignment of an object. The documentation states explicitly that you cannot use it to decrease the alignment of a type.
EDIT:
I found the documentation stating the alignment of malloc() in the GNU C library:
The address of a block returned by malloc or realloc in the GNU system
is always a multiple of eight (or sixteen on 64-bit systems).
Source: http://www.gnu.org/s/hello/manual/libc/Aligned-Memory-Blocks.html
So yes, glibc's malloc aligns to at least 8 bytes.

The x86 CPUs have pretty lax alignment requirements. Most of data can be stored and accessed at unaligned locations, possibly at the expense of degraded performance. Things become more complex when you start developing multiprocessor software as alignment becomes important for atomicity and observed order of events (writing this from memory, this may be not entirely correct).
Compilers can often be directed to align variables differently from the default alignment. There are compiler options for that, as well as compiler-specific keywords and pragmas (e.g. #pragma pack and others).
The well-established OS APIs can't be changed, neither by the application programmer (the OS is already compiled), nor by the OS developers (unless, of course, they are OK with breaking compatibility).
So, you can change some things, but not everything.

I don't know where Microsoft got its information from, but the results on
gcc (4.6.1 Target: x86_64-linux-gnu, standard mode, no flags except -Wall) are quite different:
#include <stdio.h>

struct lll {
    long l;
    long long ll;
};
struct lld {
    long l;
    long double ld;
};

struct lll lll1, lll2[2];
struct lld lld1, lld2[2];

int main(void)
{
    printf("lll1=%u, lll2=%u\n"
        , (unsigned) sizeof lll1
        , (unsigned) sizeof lll2
    );
    printf("lld=%u, lld2=%u\n"
        , (unsigned) sizeof lld1
        , (unsigned) sizeof lld2
    );
    return 0;
}
Results:
./a.out
lll1=16, lll2=32
lld=32, lld2=64
This might be FUD (from the company that actually managed to put unaligned ints into the MBR ...). But it could also be a result of the author not being informed too well.
To answer the question: it is the hardware that imposes the alignment restrictions. The compiler only needs to implement them.

Related

objects of which types can always be allocated at page boundary?

On Linux, I am interested to know if at page boundary, some object type can always be allocated. For which C types, is this always guaranteed? Please point me to standard/documentation from which the answer follows.
Clarifying code:
#include <sys/mman.h>
#include <string.h>

typedef char TYPE;

int main()
{
    TYPE val;
    void *mem;

    mem = mmap(NULL, sizeof(val), PROT_READ | PROT_WRITE, MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    memcpy(mem, &val, sizeof(val));
    /* is *(TYPE*)mem guaranteed to be the same as val ? */
}
The answer for char is yes, guaranteed.
For which other types is the answer yes, guaranteed?
Object alignment requirements are determined by the hardware architecture (processor being used), because it is the processor that may not be able to load data from unaligned addresses. (In some cases, the kernel can provide unaligned address support by trapping the processor interrupt generated by an access attempt via an unaligned address, and emulate the instruction. This is slow, however.)
The hardware architectures supported by the Linux kernel are listed at https://www.kernel.org/doc/html/latest/arch.html.
We can summarize these by saying that there is no hardware architecture supported by Linux that requires more than 128 bytes of alignment for any native data type supported by the processor, and page sizes are a multiple of 512 bytes (for historical reasons), so we can categorically say that on Linux, all data primitives can be accessed at page-aligned addresses.
(In Linux, you can use sysconf(_SC_PAGESIZE) to obtain the page size. Note that if huge pages are supported, they are larger, a multiple of this value.)
The above covers all C data types defined by the GNU C standard library and its extensions, because they only define scalar types, structures with elements aligned at natural boundaries (not packed), and vectorized types (using GCC/Clang vector extensions designed for SIMD architectures).
(You can define a packed structure that has to be allocated at a non-aligned address if you are really evil, using GCC type or variable attributes.)
If we look at the C standard, the type max_align_t (provided in <stddef.h> by the compiler; it is available even in freestanding environments where the standard C library is not available) has the maximum alignment needed for any object (except for evilly constructed packed structures mentioned above).
This means that _Alignof (max_align_t) tells you the maximum alignment required for the types the C standard defines. (Do remember to tell your C compiler the code uses features provided by the C11 standard or later, e.g. -std=c11 or -std=gnu2x.)
However, certain architectures with SIMD (single instruction, multiple data –– for which GCC and other C compilers have added vector extensions, for example implementing the Intel <immintrin.h> MMX/SSE/AVX etc. intrinsics), may require larger alignment for vector registers, up to the size of the vector registers. (This is where that 512 bits comes from.) On x86-64 (64-bit Intel and AMD architectures currently used) there are separate instructions for unaligned and aligned accesses, with unaligned accesses possibly slower than aligned accesses, depending on the exact processor. So, _Alignof(max_align_t) does not apply to these vectorized types, using vector extensions to the C standard. The C standard itself refers to such types as "requiring extended alignment".
In Linux, all types passed to the kernel, including pointers, must retain their information when cast to long, because the Linux kernel syscall interface passes syscall arguments as an array of up to six longs. See include/asm-generic/syscall.h:syscall_get_arguments() in the Linux kernel. (While this function is implemented for each hardware architecture separately, every implementation has the same signature, ie. passes the syscall arguments as long.)
The C standard does not define any relationship between the pointer address, and the value of a pointer when converted to a sufficiently large integer type. This is because there historically were architectures where this relationship was complicated (for example, on 8086 'far' pointers were 32-bit, where the actual address was 16*high16bits + low16bits). In Linux, however, the relationship is expected to be 1:1. This can be seen in things like /proc pseudo-filesystem, where pointers (say, in /proc/self/maps) are displayed in hexadecimal. See lib/vsprintf.c:pointer_string(), which is used to convert pointers to strings for userspace ABIs: it casts the pointer to unsigned long int, and prints the number value.
This means that when pointer ptr is N-byte aligned, in Linux, (unsigned long)ptr % N == 0.
While the C standard leaves signed integer overflow for each implementation to define, the Linux kernel expects and uses the GCC behaviour: signed integers use two's complement, and wrap around analogously to unsigned integers. This means that casts between long and unsigned long do not affect the storage representation and lose no information; the types only differ in whether the value they represent is considered signed or unsigned. Thus, any of the logic above wrt. long equally applies to unsigned long, and vice versa.
Finally, you can variants paraphrasing the statement "on all currently supported architectures, ints are assumed to be 32 bits, longs the size of a pointer and long long 64 bits" in both the kernel sources and on the Linux Kernel Mailing List, LKML. See e.g. Christoph Hellwig in 2003. The patch (to documentation on adding syscalls) to explicitly mention that Linux currently only supports ILP32 and LP64 models was submitted in April 2021 with positive reactions, but hasn't been applied yet.

Why does Clang-Tidy suggest a larger alignment?

Given the following c language struct definition:
typedef struct PackTest {
    long long a;
    int b;
    int c;
} PackTest;
Clang-Tidy gives the following message:
Accessing fields in struct 'PackTest' is inefficient due to poor alignment; currently aligned to 8 bytes, but recommended alignment is 16 bytes
I know why the struct is aligned to 8 bytes, but I don't know if the suggestion is valid and why.
Some particular specialized assembly instructions might have alignment requirements (for example, x86 non-scalar SSE instructions strictly require alignment to 16 bytes boundaries). Other instructions might have lower throughput when used on data that is not aligned to 16 byte boundaries (for example, x86 SSE2).
These kinds of instructions are usually used to perform aggressive optimizations based on the hardware features of the processor. Overall, the message you get is only useful in those scenarios (i.e. if you actually plan to take advantage of such instructions).
See also:
What does alignment to 16-byte boundary mean in x86
Why and where align 16 is used for SSE alignment for instructions?
Finally I'll just quote Rich from the above comment since they make a really good point:
There is nothing "untidy" about having standard structs that are not ridiculously over-aligned. For very specialized purposes you might want an over-aligned object, but if it's flagging this then most things it's flagging are just wrong, and encouraging you to write code that's inefficient and gratuitously nonstandard.
You can add -altera-struct-pack-align to Clang-Tidy's checks list to disable this warning.
Source: https://www.mail-archive.com/cfe-commits@lists.llvm.org/msg171275.html

C Pointer Sizes [duplicate]

By conducting a basic test (running a simple C++ program on a normal desktop PC), it seems plausible to suppose that sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
For example: in 32 bits architectures -> 4 bytes and in 64 bits architectures -> 8 bytes.
However I remember reading that, it is not like that in general!
So I was wondering what would be such circumstances?
For equality of size of pointers to data types compared with size of pointers to other data types
For equality of size of pointers to data types compared with size of pointers to functions
For equality of size of pointers to target architecture
No, it is not reasonable to assume. Making this assumption can cause bugs.
The sizes of pointers (and of integer types) in C or C++ are ultimately determined by the C or C++ implementation. Normal C or C++ implementations are heavily influenced by the architectures and the operating systems they target, but they may choose the sizes of their types for reasons other than execution speed, such as goals of supporting lower memory use (smaller pointers means less memory used in programs with lots of pointers), supporting code that was not written to be fully portable to any type sizes, or supporting easier use of big integers.
I have seen a compiler targeted for a 64-bit system but providing 32-bit pointers, for the purpose of building programs with smaller memory use. (It had been observed that the sizes of pointers were a considerable factor in memory consumption, due to the use of many structures with many connections and references using pointers.) Source code written with the assumption that the pointer size equalled the 64-bit register size would break.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
Depends. If you're aiming for a quick estimate of memory consumption it can be good enough. But not if your program's correctness depends on it.
(including pointers to functions)
But here is one important remark. Although most pointers will have the same size, function pointers may differ. It is not guaranteed that a void* will be able to hold a function pointer. At least, this is true for C. I don't know about C++.
So I was wondering what would be such circumstances if any?
There can be tons of reasons why it differs. If your program's correctness depends on this size, it is NEVER OK to make such an assumption. Check it instead. It shouldn't be hard at all.
You can use this macro to check such things at compile time in C:
#include <assert.h>
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
When compiling, this gives an error message:
$ gcc main.c
In file included from main.c:1:
main.c:2:1: error: static assertion failed: "Pointers are assumed to be exactly 4 bytes"
static_assert(sizeof(void*) == 4, "Pointers are assumed to be exactly 4 bytes");
^~~~~~~~~~~~~
If you're using C++, you can skip #include <assert.h> because static_assert is a keyword in C++. (And you can use the keyword _Static_assert in C, but it looks ugly, so use the include and the macro instead.)
Since these two lines are so extremely easy to include in your code, there's NO excuse not to do so if your program would not work correctly with the wrong pointer size.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
It might be reasonable, but it isn't reliably correct. So I guess the answer is "no, except when you already know the answer is yes (and aren't worried about portability)".
Potentially:
systems can have different register sizes, and use different underlying widths for data and addressing: it's not apparent what "target architecture bits" even means for such a system, so you have to choose a specific ABI (and once you've done that you know the answer, for that ABI).
systems may support different pointer models, such as the old near, far and huge pointers; in that case you need to know what mode your code is being compiled in (and then you know the answer, for that mode)
systems may support different pointer sizes, such as the X32 ABI already mentioned, or either of the other popular 64-bit data models described here
Finally, there's no obvious benefit to this assumption, since you can just use sizeof(T) directly for whatever T you're interested in.
If you want to convert between integers and pointers, use intptr_t. If you want to store integers and pointers in the same space, just use a union.
Target architecture "bits" refers to register size. E.g. the Intel 8051 is 8-bit and operates on 8-bit registers, but (external) RAM and (external) ROM are accessed with 16-bit values.
For correctness, you cannot assume anything. You have to check and be prepared to deal with weird situations.
As a general rule of thumb, it is a reasonable default assumption.
It's not universally true though. See the X32 ABI, for example, which uses 32bit pointers on 64bit architectures to save a bit of memory and cache footprint. Same for the ILP32 ABI on AArch64.
So, for guesstimating memory use, you can use your assumption and it will often be right.
It is reasonable to assume that in general sizes of pointers of any type (including pointers to functions) are equal to the target architecture bits?
If you look at all types of CPUs (including microcontrollers) currently being produced, I would say no.
Extreme counterexamples would be architectures where two different pointer sizes are used in the same program:
x86, 16-bit
In MS-DOS and 16-bit Windows, a "normal" program used both 16- and 32-bit pointers.
x86, 32-bit segmented
There were only a few, lesser-known operating systems using this memory model.
Programs typically used both 32- and 48-bit pointers.
STM8A
This modern automotive 8-bit CPU uses 16- and 24-bit pointers. Both in the same program, of course.
AVR tiny series
RAM is addressed using 8-bit pointers, Flash is addressed using 16-bit pointers.
(However, AVR tiny cannot be programmed with C++, as far as I know.)
It's not correct, for example DOS pointers (16 bit) can be far (seg+ofs).
However, for the usual targets (Windows, OSX, Linux, Android, iOS) then it's correct. Because they all use the flat programming model which relies on paging.
In theory, you can also have systems which use only the lower 32 bits when in x64. An example is a Windows executable linked without LARGEADDRESSAWARE. However, this is to help the programmer avoid bugs when switching to x64: addresses are kept within the lower 32 bits, but the pointers themselves are still 64 bits wide.
In x64 operating systems then this assumption is always true, because the flat mode is the only valid one. Long mode in CPU forces GDT entries to be 64 bit flat.
Others have mentioned the x32 ABI; I believe it is based on the same paging technology, forcing all pointers to be mapped into the lower 4 GB. However, this must be based on the same principle as in Windows. In x64 you can only have flat mode.
In 32 bit protected mode you could have pointers up to 48 bits. (Segmented mode). You can also have callgates. But, no operating system uses that mode.
Historically, on microcomputers and microcontrollers, pointers were often wider than general-purpose registers so that the CPU could address enough memory and still fit within the transistor budget. Most 8-bit CPUs (such as the 8080, Z80 or 6502) had 16-bit addresses.
Today, a mismatch is more likely to be because an app doesn’t need multiple gigabytes of data, so saving four bytes of memory on every pointer is a win.
Both C and C++ provide separate size_t, uintptr_t and off_t types, representing the largest possible object size (which might be smaller than the size of a pointer if the memory model is not flat), an integral type wide enough to hold a pointer, and a file offset (often wider than the largest object allowed in memory), respectively. A size_t (unsigned) or ptrdiff_t (signed) is the most portable way to get the native word size. Additionally, POSIX guarantees that the system compiler has some flag that means a long can hold any of these, but you cannot always assume so.
Generally pointers will be size 2 on a 16-bit system, 3 on a 24-bit system, 4 on a 32-bit system, and 8 on a 64-bit system. It depends on the ABI and C implementation. AMD has long and legacy modes, and there are differences between AMD64 and Intel64 for Assembly language programmers but these are hidden for higher level languages.
Any problems with C/C++ code is likely to be due to poor programming practices and ignoring compiler warnings. See: "20 issues of porting C++ code to the 64-bit platform".
See also: "Can pointers be of different sizes?" and LRiO's answer:
... you are asking about C++ and its compliant implementations, not some specific physical machine. I'd have to quote the entire standard in order to prove it, but the simple fact is that it makes no guarantees on the result of sizeof(T*) for any T, and (as a corollary) no guarantees that sizeof(T1*) == sizeof(T2*) for any T1 and T2).
Note: as answered by JeremyP, C99 section 6.3.2.3, subsection 8 says:
A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.
In GCC you can avoid incorrect assumptions by using built-in functions: "Object Size Checking Built-in Functions":
Built-in Function: size_t __builtin_object_size (const void * ptr, int type)
is a built-in construct that returns a constant number of bytes from ptr to the end of the object ptr pointer points to (if known at compile time). To determine the sizes of dynamically allocated objects the function relies on the allocation functions called to obtain the storage to be declared with the alloc_size attribute (see Common Function Attributes). __builtin_object_size never evaluates its arguments for side effects. If there are any side effects in them, it returns (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3. If there are multiple objects ptr can point to and all of them are known at compile time, the returned number is the maximum of remaining byte counts in those objects if type & 2 is 0 and minimum if nonzero. If it is not possible to determine which objects ptr points to at compile time, __builtin_object_size should return (size_t) -1 for type 0 or 1 and (size_t) 0 for type 2 or 3.

Is it unnecessary to store a double member of a structure at an address that is a multiple of 8?

Suppose that sizeof(int) and sizeof(double) are 4 and 8 respectively and that there is no preprocessor command such as #pragma pack before the following code or compiler options with the same function as #pragma pack used in the compiler command line
typedef struct
{
    int n;
    double d;
} T;
then how much is sizeof(T)?
I think that it depends on the width of the data bus between the CPU and RAM. If the width is 32 bits, sizeof(T) is 12. If the width is 64 bits, sizeof(T) is 16. On a computer with a 32-bit data bus, to transfer a 64-bit number from CPU to RAM or vice versa, CPU has to access the data bus twice, reading or writing 32 bits at a time, so there is no point in storing the member d of the structure T at an address that is a multiple of 8.
Do you agree?
(Sorry for my poor English)
then how much is sizeof(T)?
You are correct, this is highly dependent on the system, the compiler, and the optimization settings. Generally speaking, the compiler knows best, at least in theory, what alignment to pick for the 8-byte double member of the structure. Moreover, the compiler's decision could be different when you ask it to optimize for a smaller memory footprint compared to when you ask it to optimize for the fastest speed.
Finally, there may be systems where reading eight bytes from addresses aligned at four-byte boundary but not at eight-byte boundary may carry no penalty at all. Again, your compiler is in the best position to know that fact, and avoid padding your struct unnecessarily.
The most important thing to remember about alignment is that you should not assume a particular layout of your struct, even if you do not intend to port your product to a different platform, because a change as simple as adding an optimization flag to the makefile may be sufficient to invalidate your assumptions.

Alignment restrictions for malloc()/free()

Older K&R (2nd ed.) and other C-language texts I have read that discuss the implementation of a dynamic memory allocator in the style of malloc() and free() usually also mention, in passing, something about data type alignment restrictions. Apparently certain computer hardware architectures (CPU, registers, and memory access) restrict how you can store and address certain value types. For example, there may be a requirement that a 4 byte (long) integer must be stored beginning at addresses that are multiples of four.
What restrictions, if any, do major platforms (Intel & AMD, SPARC, Alpha) impose for memory allocation and memory access, or can I safely ignore aligning memory allocations on specific address boundaries?
Sparc, MIPS, Alpha, and most other "classical RISC" architectures only allow aligned accesses to memory, even today. An unaligned access will cause an exception, but some operating systems will handle the exception by copying from the desired address in software using smaller loads and stores. The application code won't know there was a problem, except that the performance will be very bad.
MIPS has special instructions (lwl and lwr) which can be used to access 32 bit quantities from unaligned addresses. Whenever the compiler can tell that the address is likely unaligned it will use this two instruction sequence instead of a normal lw instruction.
x86 can handle unaligned memory accesses in hardware without an exception, but there is still a performance hit of up to 3X compared to aligned accesses.
Ulrich Drepper wrote a comprehensive paper on this and other memory-related topics, What Every Programmer Should Know About Memory. It is a very long writeup, but filled with chewy goodness.
Alignment is still quite important today. Some processors (the 68k family jumps to mind) would throw an exception if you tried to access a word value on an odd boundary. Today, most processors will run two memory cycles to fetch an unaligned word, but this will definitely be slower than an aligned fetch. Some other processors won't even throw an exception, but will fetch an incorrect value from memory!
If for no other reason than performance, it is wise to try to follow your processor's alignment preferences. Usually, your compiler will take care of all the details, but if you're doing anything where you lay out the memory structure yourself, then it's worth considering.
You still need to be aware of alignment issues when laying out a class or struct in C(++). In these cases the compiler will do the right thing for you, but the overall size of the struct/class may be more wasteful than necessary.
For example:
struct
{
    char A;
    int B;
    char C;
    int D;
};
Would have a size of 4 * 4 = 16 bytes (assume Windows on x86) whereas
struct
{
    char A;
    char C;
    int B;
    int D;
};
Would have a size of 4*3 = 12 bytes.
This is because the compiler enforces a 4 byte alignment for integers, but only 1 byte for chars.
In general pack member variables of the same size (type) together to minimize wasted space.
As Greg mentioned, it is still important today (perhaps more so in some ways), and compilers usually take care of the alignment based on the target architecture. In managed environments, the JIT compiler can optimize the alignment based on the runtime architecture.
You may see pragma directives (in C/C++) that change the alignment. This should only be used when very specific alignment is required.
// For example, this changes the pack to 2 byte alignment.
#pragma pack(2)
Note that even on IA-32 and AMD64, some of the SSE instructions/intrinsics require aligned data. These instructions will throw an exception if the data is unaligned, so at least you won't have to debug "wrong data" bugs. There are equivalent unaligned instructions as well, but like Denton says, they're slower.
If you're using VC++, then besides the #pragma pack directives, you also have the __declspec(align) directives for precise alignment. VC++ documentation also mentions an _aligned_malloc function for specific alignment requirements.
As a rule of thumb, unless you are moving data across compilers/languages or are using the SSE instructions, you can probably ignore alignment issues.
