How to avoid structure padding? - c

I'm curious to know about how #pragma will help to avoid stucture padding (please give me one programme to understand it).
By default compilers will allocate memory in an aligned manner. So by avoiding structure padding what will be the benifit programmer will get?
When is it neccessary to avoid structure padding?

The techniques depend on compiler.
The benefits are usually not much, other than potentially reducing the amount of memory consumed by your program. That benefit is only worth while on machines with few resources (e.g. memory) which means it is rarely needed with modern hardware.
In practice, the cost is reduced performance or hardware exceptions. The common purposes of padding are performance and avoiding hardware exceptions, by aligning struct members in a way that suits the host system. Disallowing padding basically turns off all the benefits of padding.
Saving a few bytes, or even a few kilobytes, is rarely worth the impact in terms of performance or more error conditions. If you are doing certain types of development (e.g. on embedded system with limited resources) it might be worthwhile, but even then not always.

1.Structure Padding is avoided mostly in case of resource critical embedded systems.In this case RAM is saved by packing the structure members on the expense of code memory(More Instructions are needed to access the packed structure member).
In some cases an array of bytes is mapped to structure members for ease of accessing.

structure paddding can be avoided by using a preprocessor directive #pragma 1
the compiler will alocate memory in multiples of one

Related

What other compilers do I need to worry about struct packing?

In GCC, I need to use __attribute__((packed)) to make structs take the least amount of space, for example, if I have a large array of structs I should pack them. What other common compilers do struct padding, and how do I pack structs in these other compilers?
The premise of your question is mistaken. Packed structs are not something you're "supposed to use" to save space. They're a dubious nonstandard feature with lots of problems that should be avoided whenever possible. (Ultimately it's always possible to avoid them, but some people balk at the tradeoffs involved.) For example, whenever you use packed structures, any use of pointers to members is potentially unsafe because the pointer value is not necessarily a valid (properly aligned) pointer to the type it points to. The only time there is a "need" for packed structures is when you're using them to access memory-mapped hardware registers that are misaligned, or to access in-file/on-disk data structures that are misaligned (but the latter won't be portable anyway since representation/endianness may not match, and both problems are solved together much better with a proper serialization/deserialization function).
If your goal is to save space, then as long as you control the definition of the structure, simply order it so as not to leave unnecessary padding space. This can be achieved simply by ordering the members in order of decreasing size; if you do that, then on any reasonable implementation, the wasted space will be at most the difference between the size of the largest member and the size of the smallest.

Get memory granularity of a processor

How to get the memory granularity of a CPU in C?
Suppose I want to allocate an array where all the elements are properly memory aligned. I can pad each element to a certain size N to achieve this. How do I know the value of N?
Note: I am trying to create a memory pool where each slot is memory aligned. Any suggestion will be appreciated.
In Theory
How to get the memory granularity of a CPU in C?
First, you read the instruction set architecture manual. It may specify that certain instructions require certain alignments, or even that the addressing forms in certain instructions cannot represent non-aligned addresses. It may specify other properties regarding alignment.
Second, you read the processor manual. It may specify performance characteristics (such as that unaligned loads or stores are supported but may be slower or use more resources than aligned loads or stores) and may specify various options allowed by the instructions set architecture.
Third, you read the operating system documentation. Some architectures allow the operating system to select features related to alignment, such as whether unaligned loads and stores are made to fail or are supported albeit with slower performance than aligned loads or stores. The operating system documentation should have this information.
In Practice
For many programming situations, what you need to know is not the “memory granularity” of a CPU but the alignment requirements of the C implementation you are using (or of whatever language you are using). And, for the most part, you do not need to know the alignment requirements directly but just need to follow the language rules about managing objects—use objects with declared types, do not use casts to convert pointers between incompatible types exceed where specific rules allow it, use the suitably aligned memory as provided by malloc rather than adjusting your own pointers to bytes, and so on. Following these rules will give good alignment for the objects in your program.
In C, when you define an array, the element size will automatically be the size that C implementation needs for its alignment. For example, long double x[100]; may use 16 bytes for each array element even though the hardware uses only ten bytes for a long double. Or, for any struct foo that you define, the compiler will automatically include padding as needed in the structure to give the desired alignment, and any array struct foo x[100]; will already include that padding. sizeof(struct foo) will be the same as sizeof x[0], because each structure object has that padding built in, even just for a single structure object, not just for elements in arrays.
When you do need to know the alignment that a C implementation requires for a type, you can use C’s _Alignof operator. The expression _Alignof(type) provides the alignment required for type.
Other
… properly memory aligned.
Proper alignment is a matter of degrees:
What the processor supports may determine whether your program works or does not work. An improper alignment is one that causes your program to trap.
What is efficient with respect to individual loads and stores may affect how fast your program runs. An improper alignment is one that causes your program to execute more slowly.
In certain performance-critical situations, alignment with respect to cache and memory mapping features can also affect performance.
Short answer
Use 64 bytes.
Long answer
Data are loaded from and stored to memory in units called cache lines. If your program loads only part of the data in a cache line, then the whole line will be loaded into the CPU caches. Perhaps more importantly, the algorithm used for moving data between cores in a multi-core CPU operates on full cache lines; aligning your data to cache lines avoids false sharing, the situation where a cache line bounces between cores because it contains data manipulated by different threads.
It used to be the case that cache lines depended on the architecture, ranging from 16 up to 512 bytes. However, all current processors (Intel, AMD, ARM, MIPS) use a cache line of 64 bytes.
This depends heavily on the cpu microarchitecture that you are using.
In many cases, the memory address of an operator should be a multiple of the operand size, otherwise execution will be slow (or even might throw an exception).
But there are also CPUs which do not care about a specific alignment of the operands in memory at all.
Usually, the C compiler will care about those details for you. You should, however, make sure that the compiler assumes the correct target (micro-)architecture, for example by specifying it with the correct compiler flags (-march=? on gcc).

Why aren't structs packed by default?

While reading the CERT C Coding Standard, I came across DCL39-C which discusses why it's generally a bad idea for something like the Linux kernel to return an unpacked struct to userspace due to information leakage.
In a nutshell, structs aren't generally packed by default and the padding bytes between members of a struct often contain uninitialized data, hence the information leakage.
Why aren't structs packed by default? There was a mention in the guide that it's an optimization feature of compilers for specific architectures, I believe. Why is aligning structs to a certain byte size more efficient, as it wastes memory space?
Also, why doesn't the C standard specify a standardized way of asking for a packed struct? I can ask GCC using __attribute__((packed)), and there are other ways for different compilers, but it seems like a feature that'd be nice to have as part of the standard.
Data is carried though electronic circuits by groups of parallel wires (buses). Likewise, the circuits themselves tend to be arrayed in parallel. The physical distance between parallel components adds resistance and capacitance to any crosswise wires that bridge them. So, such bridges tend to be expensive and slow, and computer architects avoid them when possible.
Unaligned loads require shifting bytes across lanes. Some CPUs (e.g. efficiency-oriented RISC) are physically incapable of doing this, because the bridge component doesn't exist. Some will detect the condition and interpose a lane shift at the expense of a cycle or two. Others can handle misalignment without a speed penalty… assuming paged memory doesn't add another problem.
There's another, completely different issue. The memory management unit (MMU) sits between the CPU execution core and the memory bus, translating program-visible logical addresses to the physical addresses for memory chips. Two adjacent logical addresses might reside on different chips. The MMU is optimized for the common case where an access only requires one translation, but a misaligned access may require two.
A misaligned access straddling a page boundary might incur an exception, which might be fatal inside a kernel. Since pages are relatively large, this condition is relatively rare. It might evade tests, and it may be non-deterministic.
TL;DR: Packed structures shouldn't be used for active program state, especially not in a kernel. They may be used for serialization and communication, but safe usage is another question.
Leaving structs "unpacked" allows the compiler to align all members so that operations are more efficient on those members (measured in terms of clock time, number of instructions, etc). The alignment requirement for types depends on the host architecture and (for struct types) on the alignment requirement of contained members.
Packing struct members forces some (if not all) members to be aligned in a way that is sub-optimal for performance. In some worst cases - depending on host architecture - operations on unaligned variables (or on unaligned struct members) triggers a processor fault condition. RISC processor architectures, for example, generate an alignment fault when a load or store operation affects an unaligned address. Some SSE instructions on recent x86 architectures require data they act on to be aligned on 16 byte boundaries.
In best cases, the operations behave as intended, but less efficiently, due to overhead of copying an unaligned variable to an aligned location or to a register, doing the operation there, and copying it back. Those copying operations are less efficient when unaligned variables are involved - after all, the processor architecture is optimised for performance when variable alignment meets its design requirements.
If you are worried about data leaking out of your program, simply use functions like memset() to overwrite the contents of your structures at the end of their lifetime (e.g. just before an instance is about to pass out of scope, or immediately before dynamically allocated memory is deallocated using free()).
Or use an operating system (like OpenBSD) which does overwrite memory before making it available to processes or programs. Bear in mind that such features tend to make both the operating system and programs it hosts run less efficiently.
Recent versions of the C standard (since 2011) do have some facilities to query and control alignment of variables (and affect packing of struct members). The default is whatever alignment is most effective for the host architecture - which for struct types normally means unpacked.
On some compilers such as Microchip XC8, all structs are indeed always packed.
On some platforms compilers will only generate byte access instructions to access members of a packed struct, because byte access instructions are always aligned. If all structs are packed, the 16-, 32-, and 64- bit load/store instructions are not used. This is an obvious waste of resources.
The C standard does not specify a way of packing struct possibly because the standard itself is not aware of the concept of packing. Since the layout of non-bit-field members of structs is implementation defined, out of scope for the standard. Or possibly, the standard is made to support architectures that always add padding in structs, since such architectures are indeed theoretically feasible.

When do we need to use posix_memalign instead of malloc?

Seems posix_memalign let you choose a customized alignment,but when is that necessary?
malloc has already done the alignment work internally.
UPDATE
The exact reason I ask this is because I see nginx does this,ngx_memalign(NGX_POOL_ALIGNMENT, size, log);,here NGX_POOL_ALIGNMENT is defined as 16, nginxs.googlecode.com/svn-history/trunk/src/core/ngx_palloc.c
Basically, if you need tougher alignment than malloc will give you. Malloc generally returns a pointer aligned such, that it may be used with any of the primitive types (often, 8 bytes on common desktop machines).
However, sometimes you need memory aligned on other boundaries, for example 4K-aligned, etc. In this case, you would need memalign.
You would need this, for example,
when writing a memory manager (such as a garbage collector). In this case, it is sometimes handy to work with memory aligned on larger block sizes. This way, you can store meta-data common to all objects in a given block at the bottom of the allocated area, and access this simply by masking the least significant bits of the object pointer.
when interfacing with hardware (never done this myself, but IIRC, certain kinds of block-devices require aligned memory). See n.m.'s answer for details.
The only benefits of posix_memalign, as far as I can tell, are:
Allocating page-aligned (typically 4096 or larger alignment) memory for hardware-specific purposes.
Evil hacks where you keep the low N bits of a pointer zero so you can store an N-bit integer in the low bits. :-)
Various hardware may have alignment requirements which malloc cannot satisfy. The Linux man page gives one such example, I quote:
On many systems there are alignment
restrictions, e.g. on buffers used for
direct block device I/O. POSIX
specifies the
pathconf(path,_PC_REC_XFER_ALIGN) call
that tells what alignment is needed.
A couple of uses:
Some processors have instructions that will only work on data that is aligned on a power of two greater than or equal to the buffer size - for example bit reverse addressing instructions used in ffts (fast fourier transforms).
To align data to cache boundaries to optimize access in multiprocessing applications so that data in the same cache line isn't being accessed by two processors simultaneously.
Basically, if you don't need to do absurd levels of optimizations and/or your hardware doesn't demand that an array be on a particular boundary then you can forget about posix_memalign.

Why doesn't GCC optimize structs?

Systems demand that certain primitives be aligned to certain points within the memory (ints to bytes that are multiples of 4, shorts to bytes that are multiples of 2, etc.). Of course, these can be optimized to waste the least space in padding.
My question is why doesn't GCC do this automatically? Is the more obvious heuristic (order variables from biggest size requirement to smallest) lacking in some way? Is some code dependent on the physical ordering of its structs (is that a good idea)?
I'm only asking because GCC is super optimized in a lot of ways but not in this one, and I'm thinking there must be some relatively cool explanation (to which I am oblivious).
gcc does not reorder the elements of a struct, because that would violate the C standard. Section 6.7.2.1 of the C99 standard states:
Within a structure object, the non-bit-field members and the units in which bit-fields
reside have addresses that increase in the order in which they are declared.
Structs are frequently used as representations of the packing order of binary file formats and network protocols. This would break if that were done. In addition, different compilers would optimize things differently and linking code together from both would be impossible. This simply isn't feasible.
GCC is smarter than most of us in producing machine code from our source code; however, I shiver if it was smarter than us in re-arranging our structs, since it's data that e.g. can be written to a file. A struct that starts with 4 chars and then has a 4 byte integer would be useless if read on another system where GCC decided that it should re-arrange the struct members.
gcc SVN does have a structure reorganization optimization (-fipa-struct-reorg), but it requires whole-program analysis and isn't very powerful at the moment.
C compilers don't automatically pack structs precisely because of alignment issues like you mention. Accesses not on word boundaries (32-bit on most CPUs) carry heavy penalty on x86 and cause fatal traps on RISC architectures.
Not saying it's a good idea, but you can certainly write code that relies on the order of the members of a struct. For example, as a hack, often people cast a pointer to a struct as the type of a certain field inside that they want access to, then use pointer arithmetic to get there. To me this is a pretty dangerous idea, but I've seen it used, especially in C++ to force a variable that's been declared private to be publicly accessible when it's in a class from a 3rd party library and isn't publicly encapsulated. Reordering the members would totally break that.
You might want to try the latest gcc trunk or, struct-reorg-branch which is under active development.
https://gcc.gnu.org/wiki/cauldron2015?action=AttachFile&do=view&target=Olga+Golovanevsky_+Memory+Layout+Optimizations+of+Structures+and+Objects.pdf

Resources