I have a question about the following line of code:
char buffer[256] __attribute__((aligned(4096)));
The contents of the global array "buffer" are strings, which I read from stdin.
I have read https://gcc.gnu.org/onlinedocs/gcc-4.4.1/gcc/Type-Attributes.html (gcc/gnu online documentation). I get that this attribute specifies a minimum alignment for variables in bytes.
My question is WHY I would need such an alignment for a char array.
Is it just for performance reasons?
Perhaps the use of a constant is not the best idea, at least, without a good explanation about the target (it looks like the page size of the example is 4096). Some architectures have specific instructions for copying big chunks of memory (such as the whole page), which may do the process faster:
GCC also provides a target specific macro __BIGGEST_ALIGNMENT__, which is the largest alignment ever used for any data type on the target machine you are compiling for. For example, you could write:
short array[3] __attribute__ ((aligned (__BIGGEST_ALIGNMENT__)));
The compiler automatically sets the alignment for the declared variable or field to __BIGGEST_ALIGNMENT__. Doing this can often make copy operations more efficient, because the compiler can use whatever instructions copy the biggest chunks of memory when performing copies to or from the variables or fields that you have aligned this way. Note that the value of __BIGGEST_ALIGNMENT__ may change depending on command-line options.
[...]
Note that the effectiveness of aligned attributes may be limited by inherent limitations in your linker. On many systems, the linker is only able to arrange for variables to be aligned up to a certain maximum alignment. (For some linkers, the maximum supported alignment may be very very small.) If your linker is only able to align variables up to a maximum of 8-byte alignment, then specifying aligned(16) in an attribute still only provides you with 8-byte alignment. See your linker documentation for further information.
For completeness' sake: the declaration ensures that the array lies within a single page, which can improve performance.
This link explains how page-aligned data (in that case, allocated with aligned_malloc) improves the code: https://software.intel.com/en-us/articles/getting-the-most-from-opencl-12-how-to-increase-performance-by-minimizing-buffer-copies-on-intel-processor-graphics
I will guess that a specific HW wants it to be aligned that way. (mass storage reader, DMA etc).
I must say it is rare to see alignment that big.
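To make the effect of the declaration concrete, here is a minimal sketch that checks the two properties discussed above: the address is a multiple of 4096, and the 256 bytes therefore sit inside one page. It assumes a GCC-compatible compiler, a linker that honours 4096-byte alignment, and a 4 KiB page size; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE_ASSUMED 4096 /* assumption: 4 KiB pages */

/* Same declaration as in the question. */
char buffer[256] __attribute__((aligned(4096)));

/* Returns 1 if p is a multiple of alignment (alignment must be a power of two). */
int is_aligned(const void *p, uintptr_t alignment)
{
    return ((uintptr_t)p & (alignment - 1)) == 0;
}

/* Returns 1 if the first and last byte of buffer fall in the same page. */
int buffer_in_one_page(void)
{
    uintptr_t first = (uintptr_t)buffer / PAGE_SIZE_ASSUMED;
    uintptr_t last = ((uintptr_t)buffer + sizeof buffer - 1) / PAGE_SIZE_ASSUMED;
    return first == last;
}

/* Hypothetical self-check that a program could call once at startup. */
void check_buffer(void)
{
    assert(is_aligned(buffer, PAGE_SIZE_ASSUMED));
    assert(buffer_in_one_page());
}
```

If the linker cannot provide that much alignment (see the quoted note above), the first assertion is the one that would fail.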
I have the following line in my code:
# define __align_(x) __attribute__((aligned(x)))
I can use it as int i __align_; what difference does it make?
If I use the aligned attribute as above instead of just declaring my variable as int i;, does it change how the variable is created in memory?
I can use it as int i __align_; what difference does it make?
This will not work because the macro is defined to have a parameter, __align_(x). When it is used without a parameter, it will not be replaced, and the compiler will report a syntax error. Also, identifiers starting with __ are reserved for the C implementation (for the use of the compiler, the standard library, and any other parts forming the C implementation), so a regular program should not use such a name.
When you use the macro correctly, it changes the normal alignment requirement for the type.
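As a rough illustration of correct use, the sketch below defines a macro with a non-reserved name and passes it an argument. ALIGN_TO, plain_value and aligned_value are hypothetical names, and the attribute syntax assumes GCC or a compatible compiler.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macro name; it avoids the reserved "__" prefix. */
#define ALIGN_TO(x) __attribute__((aligned(x)))

int plain_value;                /* normal alignment for int */
int aligned_value ALIGN_TO(16); /* at least 16-byte alignment */

/* Returns 1 if aligned_value really sits on a 16-byte boundary. */
int aligned_value_ok(void)
{
    return (uintptr_t)&aligned_value % 16 == 0;
}

/* Hypothetical self-check. */
void check_aligned_value(void)
{
    assert(aligned_value_ok());
}
```

Both variables are ordinary int objects; the only difference is which addresses the compiler and linker are allowed to pick for them.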
Generally, objects of various types have alignment requirements: They should be located in memory at addresses that are multiples of their requirement. The reason for this is that computer hardware is usually designed to work with groups of bytes, so it may fetch data from memory in groups of, for example, four bytes: bytes 0 to 3, bytes 4 to 7, bytes 8 to 11, and so on.
If a four-byte object with four-byte alignment requirement is located at a multiple of four bytes, then it can be read from memory easily, by loading the group of bytes it is in. It can also be written to memory easily.
If the object is not at a multiple of four bytes, it cannot be loaded as one group of bytes. It can be loaded by loading the two groups of bytes it straddles, extracting the desired bytes, and combining the desired bytes in one processor register. However, that takes more work, so we want to avoid it. The compiler is written to automatically align things as desired for the C implementation, and it writes load and store instructions that expect the desired alignment (see footnote 1).
Different object types can have different alignment requirements even though they are bound by the same hardware behavior. For example, with a two-byte short, the alignment requirement may be two bytes. This is because, whether it starts at byte 0 or byte 2 within a group (say at address 100, 102, 104, or 106), we can load the short by loading a single group of four bytes and taking just the two bytes we want. However, if it started at byte 3 (say at address 103), we would have to load two groups of bytes (100 to 103 and 104 to 107) to get the bytes we needed for the short (103 and 104). So two-byte alignment suffices for this short even though the hardware is designed with four-byte groups.
As mentioned, the compiler handles alignment automatically. When you define a structure with multiple members of different types, the compiler inserts padding so that each member is aligned correctly, and it inserts padding at the end of the structure so that an array of them keeps the alignment from element to element in the array.
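The padding can be observed with offsetof and sizeof. The following is a small sketch with a made-up struct; the exact numbers depend on the C implementation, so only the properties the language guarantees are asserted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example struct. */
struct sample {
    char tag;   /* 1 byte */
    int value;  /* padding is usually inserted before this member */
    char flag;  /* usually followed by padding at the end of the struct */
};

/* Bytes of padding between tag and value. */
size_t padding_before_value(void)
{
    return offsetof(struct sample, value) - sizeof(char);
}

/* Bytes of padding after flag, at the end of the struct. */
size_t tail_padding(void)
{
    return sizeof(struct sample)
         - (offsetof(struct sample, flag) + sizeof(char));
}

/* Each member is aligned for its type, and the size keeps arrays aligned. */
static_assert(offsetof(struct sample, value) % _Alignof(int) == 0,
              "value must be aligned for int");
static_assert(sizeof(struct sample) % _Alignof(struct sample) == 0,
              "size must be a multiple of the struct alignment");
```

On a typical implementation with a four-byte int, I would expect both helper functions to return 3, but that is an assumption about the target, not a guarantee.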
There are times when we want to override the compiler’s automatic behavior. When we are preparing to send data over a network connection, the communication protocol might require the different fields of a message to be packed together in consecutive bytes, with no padding. In this case, we can define a structure with an alignment requirement of 1 byte for it and all its members. When we are ready to send a message, we could copy data into this structure’s members and then write the structure to the network device.
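A possible sketch of such a message structure, assuming GCC's packed attribute (which gives the structure and its members one-byte alignment). The message layout and all names are invented for illustration, and byte order would still have to be handled separately.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire format: 1-byte type, 4-byte length, 2-byte checksum. */
struct wire_message {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
} __attribute__((packed));

/* With packing there is no padding, so the fields are consecutive. */
static_assert(sizeof(struct wire_message) == 7, "no padding expected");
static_assert(offsetof(struct wire_message, length) == 1, "length follows type");

/* Number of bytes that would be written to the network device. */
size_t wire_message_size(void)
{
    return sizeof(struct wire_message);
}
```

Without the attribute, the same members would usually occupy 12 bytes because of the padding described earlier.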
When you tell the compiler an object is not aligned normally, the compiler will generate instructions for that. Instead of the normal load or store instructions, it will use special unaligned load or store instructions if the computer architecture has them. If it does not, the compiler will use instructions to shift and store individual bytes or to shift and merge bytes and store them as aligned words, depending on what instructions are available in the computer architecture. This is generally inefficient; it will slow down your program. So it should not be used in normal programming. Decreasing the alignment requirements should be used only when there is a need for controlling the layout of data in memory.
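A related way to read a value from an arbitrary address, without any attribute, is to copy its bytes with memcpy. Note that this is a different technique from the attribute described above; it is shown here only because it makes the "load the bytes and merge them" idea visible in source code. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reads a 32-bit value from an address that may not be 4-byte aligned. */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v); /* the compiler picks suitable instructions */
    return v;
}

/* Stores a value at offset 1 of a byte array and reads it back. */
int unaligned_demo(void)
{
    unsigned char bytes[8] = {0};
    uint32_t in = 0x12345678u;

    memcpy(bytes + 1, &in, sizeof in);
    return load_u32_unaligned(bytes + 1) == in;
}

/* Hypothetical self-check. */
void check_unaligned(void)
{
    assert(unaligned_demo());
}
```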
Sometimes increasing the alignment requirements is used for performance. For example, an array of four-byte float elements generally only needs four-byte alignment. However, some computers have special instructions to process four float elements (16 bytes) at a time, and they benefit from having that data aligned to a multiple of 16 bytes. (And some computers have instructions for even more data at one time.) In this case, we might increase the alignment requirement for our float array (but not its individual elements) so that it is aligned suitably for these instructions.
Footnote
1 What happens if you force an object to be located at an undesired alignment without telling the compiler varies. In some computers, when a load instruction is executed with an unaligned address, the processor will “trap,” meaning it stops normal program execution and transfers control to the operating system, reporting an error in your program. In some computers, the processor will ignore the low bits of the address and load the wrong data. In some computers, the processor will load the two groups of bytes, extract the desired bytes, and merge them. On computers that trap, the operating system might do the manual fix-up of loading the bytes, or it might terminate your program or report the error to your program.
The attribute tells the compiler that the variable in question must be placed in memory at an address that is a multiple of a certain number of bytes (addr % alignment == 0).
This is important because some CPUs can only operate on certain integer values if they are aligned: for example, an int32 must be 4-byte aligned, an int64 must be 8-byte aligned, and pointers must be 4- or 8-byte aligned (on 32- and 64-bit CPUs respectively).
The attribute is mostly used for structures, where certain fields within the structure must be memory aligned in order to allow the CPU to do integer operations on them (like mov.l) without hitting a BUS ERROR from the memory controller.
If structures aren't properly aligned, the compiler will have to add extra instructions to first load the unaligned value into a register with several memory operations which is more expensive in performance.
It can also be used to bump performance in more performance sensitive systems by creating buffers that are page aligned (4k usually) so that paging will have less of an impact, or if you want to create DMA-able buffer zones - but that's a bit more advanced...
How to get the memory granularity of a CPU in C?
Suppose I want to allocate an array where all the elements are properly memory aligned. I can pad each element to a certain size N to achieve this. How do I know the value of N?
Note: I am trying to create a memory pool where each slot is memory aligned. Any suggestion will be appreciated.
In Theory
How to get the memory granularity of a CPU in C?
First, you read the instruction set architecture manual. It may specify that certain instructions require certain alignments, or even that the addressing forms in certain instructions cannot represent non-aligned addresses. It may specify other properties regarding alignment.
Second, you read the processor manual. It may specify performance characteristics (such as that unaligned loads or stores are supported but may be slower or use more resources than aligned loads or stores) and may specify various options allowed by the instructions set architecture.
Third, you read the operating system documentation. Some architectures allow the operating system to select features related to alignment, such as whether unaligned loads and stores are made to fail or are supported albeit with slower performance than aligned loads or stores. The operating system documentation should have this information.
In Practice
For many programming situations, what you need to know is not the “memory granularity” of a CPU but the alignment requirements of the C implementation you are using (or of whatever language you are using). And, for the most part, you do not need to know the alignment requirements directly but just need to follow the language rules about managing objects: use objects with declared types, do not use casts to convert pointers between incompatible types except where specific rules allow it, use the suitably aligned memory as provided by malloc rather than adjusting your own pointers to bytes, and so on. Following these rules will give good alignment for the objects in your program.
In C, when you define an array, the element size will automatically be the size that C implementation needs for its alignment. For example, long double x[100]; may use 16 bytes for each array element even though the hardware uses only ten bytes for a long double. Or, for any struct foo that you define, the compiler will automatically include padding as needed in the structure to give the desired alignment, and any array struct foo x[100]; will already include that padding. sizeof(struct foo) will be the same as sizeof x[0], because each structure object has that padding built in, even just for a single structure object, not just for elements in arrays.
When you do need to know the alignment that a C implementation requires for a type, you can use C’s _Alignof operator. The expression _Alignof(type) provides the alignment required for type.
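For the memory pool in the question, one way to pick N is to round each slot up to a multiple of the strictest fundamental alignment, _Alignof(max_align_t). This is only a sketch under that assumption; the function names are hypothetical, and it needs a C11 library for aligned_alloc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Rounds n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Slot size N such that every slot is aligned for any standard type. */
size_t pool_slot_size(size_t payload)
{
    return round_up(payload, _Alignof(max_align_t));
}

/* Allocates count slots; the total size is a multiple of the alignment. */
void *pool_create(size_t payload, size_t count)
{
    return aligned_alloc(_Alignof(max_align_t),
                         pool_slot_size(payload) * count);
}

/* Returns 1 if the second slot of a small pool is suitably aligned. */
int pool_demo(void)
{
    size_t slot = pool_slot_size(24);
    unsigned char *pool = pool_create(24, 4);
    int ok;

    if (pool == NULL)
        return 0;
    ok = (uintptr_t)(pool + slot) % _Alignof(max_align_t) == 0;
    free(pool);
    return ok;
}
```

If the pool only ever holds one type, _Alignof of that type can replace _Alignof(max_align_t) and may give smaller slots.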
Other
… properly memory aligned.
Proper alignment is a matter of degrees:
What the processor supports may determine whether your program works or does not work. An improper alignment is one that causes your program to trap.
What is efficient with respect to individual loads and stores may affect how fast your program runs. An improper alignment is one that causes your program to execute more slowly.
In certain performance-critical situations, alignment with respect to cache and memory mapping features can also affect performance.
Short answer
Use 64 bytes.
Long answer
Data are loaded from and stored to memory in units called cache lines. If your program loads only part of the data in a cache line, then the whole line will be loaded into the CPU caches. Perhaps more importantly, the algorithm used for moving data between cores in a multi-core CPU operates on full cache lines; aligning your data to cache lines avoids false sharing, the situation where a cache line bounces between cores because it contains data manipulated by different threads.
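It used to be the case that cache line sizes varied widely by architecture, ranging from 16 up to 512 bytes. Today, most mainstream processors (Intel and AMD x86, and many ARM and MIPS cores) use a cache line of 64 bytes, although there are exceptions (for example, some ARM and POWER designs use 128-byte lines), so treat 64 as a sensible default rather than a guarantee.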
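A minimal sketch of how this advice could look in C11, assuming a 64-byte cache line. CACHE_LINE_SIZE, padded_counter and counters are hypothetical names; each array element gets its own line so that threads updating different counters do not share one.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache lines */

/* One counter per cache line; the struct is padded up to the line size. */
struct padded_counter {
    _Alignas(CACHE_LINE_SIZE) unsigned long value;
};

/* For example, one counter per worker thread. */
struct padded_counter counters[4];

static_assert(sizeof(struct padded_counter) == CACHE_LINE_SIZE,
              "each counter should occupy exactly one cache line");

/* Distance in bytes between two neighbouring counters. */
size_t counter_stride(void)
{
    return (size_t)((char *)&counters[1] - (char *)&counters[0]);
}
```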
This depends heavily on the cpu microarchitecture that you are using.
In many cases, the memory address of an operand should be a multiple of the operand size; otherwise execution will be slow (or might even raise an exception).
But there are also CPUs which do not care about a specific alignment of the operands in memory at all.
Usually, the C compiler will take care of those details for you. You should, however, make sure that the compiler assumes the correct target (micro-)architecture, for example by specifying it with the correct compiler flags (-march=? on gcc).
This statement in the article confused me:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment for the target. If you zero a structure and then set some of the fields, will the padding bits all be zero? According to the results of the survey, 36 percent were sure that they would be, and 29 percent didn't know. Depending on the compiler (and optimization level), it may or may not be.
It was not completely clear, so I turned to the standard. The ISO/IEC 9899 in §6.2.6.1 states:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
Also in §6.7.2.1:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I just remembered that I recently implemented a kind of hack in which I used the undeclared part of a byte owned by a bit-field. It was something like:
/* This struct is always allocated on the heap and is zeroed. */
struct some_struct {
    /* initial part ... */
    enum {
        ONE,
        TWO,
        THREE,
        FOUR,
    } some_enum:8;
    unsigned char flag:1;
    unsigned char another_flag:1;
    unsigned int size_of_smth;
    /* ... remaining part */
};
The structure was not mine to change, but I had an acute need to pass some information through it. So I calculated the address of the corresponding byte like this:
unsigned char *ptr = (unsigned char *)&some->size_of_smth - 1; /* byte just before size_of_smth */
*ptr |= 0xC0; /* set flags */
Then later I checked the flags the same way.
I should also mention that the target compiler and platform were fixed, so this is not a cross-platform thing. However, the following questions remain:
Can I rely on the fact that the padding bits of struct (in heap) will be still zeroed after memset/kzalloc/whatever and after some subsequent using? (This post does not disclose the topic in terms of the standard and safeguards for the further use of struct). And what about struct zeroed on stack like = {0}?
If yes, does it mean that I can safely use "unnamed"/"not declared" part of bit-field to transfer some info for my purposes everywhere (different platform, compiler, ..) in C? (If I know for sure that no one crazy is trying to store anything in this byte).
The short answer to your first question is "no".
While an appropriate call of memset(), such as memset(&some_struct_instance, 0, sizeof some_struct_instance), will set all bytes in the structure to zero, that change is not required to persist after "some use" of some_struct_instance, such as setting any of the members within it.
So, for example, there is no guarantee that some_struct_instance.some_enum = THREE (i.e. storing a value into a member) will leave any padding bits in some_struct_instance unchanged. The only requirement in the standard is that values of other members of the structure are unaffected. However, the compiler may (in emitted object code or machine instructions) implement the assignment using some set of bitwise operations, and be allowed to take shortcuts in a way that doesn't leave the padding bits alone (e.g. by not emitting instructions that would otherwise ensure the padding bits are unaffected).
Even worse, a simple assignment like some_struct_instance = some_other_struct_instance (which, by definition, is the storing of a value into some_struct_instance) comes with no guarantees about the values of padding bits. It is not guaranteed that the padding bits in some_struct_instance will be set to the same bitwise values as the padding bits in some_other_struct_instance, nor is there a guarantee that the padding bits in some_struct_instance will be unchanged. This is because the compiler is allowed to implement the assignment by whatever means it deems most "efficient" (e.g. copying memory verbatim, some set of member-wise assignments, or whatever) and, since the values of padding bits after the assignment are unspecified, it is not required to ensure the padding bits are unchanged.
If you get lucky, and fiddling with the padding bits works for your purpose, it will not be because of any support in the C standard. It will be because of good graces of the compiler vendor (e.g. choosing to emit a set of machine instructions that ensure padding bits are not changed). And, practically, there is no guarantee that the compiler vendor will keep doing things the same way - for example, your code that relies on such a thing may break when the compiler is updated, when you choose different optimisation settings, or whatever.
Since the answer to your first question is "no", there is no need to answer your second question. However, philosophically, if you are trying to store data in padding bits of a structure, it is reasonable to assert that someone else - crazy or not - may potentially attempt to do the same thing, but using an approach that messes up the data you are attempting to pass around.
From the first words of the quoted article:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment ...
These words mean that, in the aim to optimize (optimize for speed, probably, but also to avoid architecture restrictions on data/address buses), the compiler can make use of hidden, not-used, bits or bytes. NOT-USED because they would be forbidden or costly to address.
This also implies that those bytes or bits should not be visible from a programming perspective, and it should be considered a programming error to try to access that hidden data.
About that added data, the standard says that its content is "unspecified", and there is really no better way to state what an implementation can do with it. Think of bit-field declarations, where you can declare integers with any bit width: no normal hardware will let you read or write memory in chunks smaller than 8 bits, so the CPU will always read or write at least 8 bits (sometimes even more). Why should a compiler (an implementation) take care to do something useful with those other bits, which the programmer has said they do not care about? It makes no sense: the programmer didn't give a name to some memory address, but then wants to manipulate it?
The padding bytes between fields are pretty much the same matter as before: those added bytes are necessary, but the programmer is not interested in them, and SHOULD NOT change their mind later!
Of course, one can study an implementation and arrive at some conclusion like "padding bytes will always be zeroed" or something like that. This is risky (are you sure they will always, always be zeroed?) but, more importantly, it is totally useless: if you need more data in a structure, simply declare it! Then you will never have a problem, even when porting the source to different platforms or implementations.
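To illustrate "simply declare it": the sketch below uses an explicitly declared member for the extra flags, so the language guarantees that they survive stores to other members and whole-struct assignment. The struct is a simplified, hypothetical variant of the one in the question (which the asker said could not be changed), and FLAG_A/FLAG_B are invented names.

```c
#include <assert.h>

#define FLAG_A 0x40u
#define FLAG_B 0x80u

/* Hypothetical variant with declared storage instead of borrowed padding. */
struct some_struct_v2 {
    unsigned char kind;
    unsigned char flag : 1;
    unsigned char another_flag : 1;
    unsigned char extra_flags;  /* named member, so its value is preserved */
    unsigned int size_of_smth;
};

void set_extra_flags(struct some_struct_v2 *s)
{
    s->extra_flags |= FLAG_A | FLAG_B;
}

int has_extra_flags(const struct some_struct_v2 *s)
{
    return (s->extra_flags & (FLAG_A | FLAG_B)) == (FLAG_A | FLAG_B);
}

/* Returns 1 if the flags survive a member store and a struct assignment. */
int flags_survive_copy(void)
{
    struct some_struct_v2 a = {0};
    struct some_struct_v2 b;

    set_extra_flags(&a);
    a.size_of_smth = 42; /* storing to one member leaves other members intact */
    b = a;               /* assignment copies every named member */
    return has_extra_flags(&b) && b.size_of_smth == 42;
}

/* Hypothetical self-check. */
void check_flags(void)
{
    assert(flags_survive_copy());
}
```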
It is reasonable to start with the expectation that what is listed in the standard is correctly implemented. You're looking for further assurances for a particular architecture. Personally, if I could find documented details about that particular architecture, I would be reassured; if not, I would be cautious.
What constitutes "cautious" would depend on how confident I needed to be. For example, building a detailed test set and running it periodically on my target architecture would give me a reasonable degree of confidence, but it's all about how much risk you want to take. If it's really, really important, stick to what the standard guarantees; if it's less so, test and see if you can get enough confidence for what you need.
I've been working with a program and I've been trying to conserve bytes and storage space.
I have many variables in my C program, but I wondered if I could reduce the program's size by making some of the variables that don't change throughout the program const or final.
So my questions are these:
Is there any byte save when identifying static variables as constant?
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const just in case we need to change the variable later (especially if there is no downfall in doing so)?
Are there only some IDEs/Languages that save bytes with constant variables?
Thanks for any help, it is greatly appreciated.
I presume you're working on a deeply embedded system (such as a Cortex-M processor).
On these, SRAM is a scarce resource, whereas flash memory is comparatively plentiful.
So, as much as you can, use the const keyword for any variable that doesn't change. On such toolchains, this typically lets the compiler and linker place the variable in flash rather than in SRAM.
For example, to store a text on your system you can do this:
const char* const txtMenuRoot[] = { "Hello, this is the root menu", "Another text" };
Then not only the text but also the array of pointers to it can be placed in flash.
All your questions depend heavily on the compiler and environment. A C compiler intended for embedded environments can do a great job of saving memory, while others may not.
Is there any byte save when identifying static variables as constant?
Yes, it may be possible. But note that "const", generally, isn't intended to specify how to store a variable. Its purpose is to help the programmer and the compiler better understand the source code (when the compiler "understands better", it can produce better object code). Some compilers can also use that information to store the variable in read-only memory, or to delete it and turn it into literals in the object code. In the context of your question, though, a #define may be more suitable.
If bytes are saved by doing this, why are they saved? How does the program store the variable differently if it is constant, and why does this way need less storage space?
Variables declared in source code can go to different places in the object code, and different places when an object file is loaded in memory and executed. Note that, again, there are differences on various architectures - for example in a small 8/16 bits MCU (cpu for electronic devices), generally there is no "loading" of an object file. So the value of a variable is stored somewhere - anyway. But at low level the compiler can use literals instead of addresses, and this mostly saves some memory. Suppose you declare a constant variable GAIN=5 in source code. When that variable is used in some formula, the compiler emits something like "LD R12,GAIN" (loads register R12 with the content of the address GAIN, where variable GAIN is stored). But the compiler can also emit "LD R12,#5" (loads the value "5" in R12). In both cases an instruction is needed, but in the second case there is no memory for variables involved. This is a saving, and can also be faster.
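In source form, the GAIN example might look like the sketch below. Whether the compiler actually emits an immediate operand instead of a memory load depends on the compiler and its optimization settings, so that part is an expectation, not a guarantee. GAIN_MACRO and the function names are hypothetical.

```c
#include <assert.h>

/* A constant the compiler is free to fold into the instruction stream. */
static const int GAIN = 5;

/* The #define alternative: always a literal, never an object in memory. */
#define GAIN_MACRO 5

int apply_gain(int sample)
{
    return sample * GAIN;
}

int apply_gain_macro(int sample)
{
    return sample * GAIN_MACRO;
}

/* Hypothetical self-check: both forms compute the same result. */
void check_gain(void)
{
    assert(apply_gain(3) == apply_gain_macro(3));
}
```

Comparing the generated assembly of the two functions (for example with gcc -S -Os) is a simple way to see what a particular toolchain does.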
If bytes are not saved by defining variables as constant, then why would a developer define a variable this way in the first place? Could we not just leave out the const just in case we need to change the variable later (especially if there is no downfall in doing so)?
As said earlier, the "const" keyword is meant to better define what operations will be done on the variable. This is useful for programmers, for clarity. It is useful to state clearly that a variable is not intended to be modified, especially when the variable is a formal parameter. In some environments there is actual read-only memory that cannot be written to, and if a variable (maybe a "system variable") is marked as "const", everything is clear to both the programmer and the compiler, which can warn if it encounters code trying to modify that variable.
Are there only some IDEs/Languages that save bytes with constant variables?
Definitely yes. But don't talk about IDEs: they are only editors. And about languages, things are complicated: it depends entirely on implementation and optimization. Likely this kind of saving is used only in compilers (not interpreters), and depends a lot on optimization options/capabilities of the compiler.
Think of const this way (there is no such thing as final or constant in C, so I'll just ignore that). If it's possible for the compiler to save memory, it will (especially when you compile optimizing for size). const gives the compiler more information about the properties of an object. The compiler can make smarter decisions when it has more information and it doesn't prevent the compiler from making the exact same decision as before it had that information.
It can't hurt and may help, and it also helps the programmers working with the code to reason about it more easily. Both the compiler and the programmer are helped, no one gets hurt. It's a win-win.
Compilers can reduce the memory used based on their knowledge of the code, and const helps the compiler know the code's real behaviour (if you enable warnings, you can get suggestions of where to put const).
But a struct can contain unused bytes due to the alignment restrictions of the hardware used, and compilers cannot reorder the members of a struct. That can only be done by changing the code.
struct wide
{
    int_least32_t i1;
    int_least8_t b;
    int_least32_t i2;
};

struct compact
{
    int_least32_t i1,
                  i2;
    int_least8_t b;
};
Because of the alignment restrictions, struct wide can have empty space between members 'b' and 'i2'.
This is not the case in struct compact, because the members are listed from the widest, which can require greater alignment, to the narrowest.
In some cases struct compact even leads to faster code.
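The difference can be measured with offsetof and sizeof, as in this sketch. One caveat worth knowing: on a typical ABI with 4-byte alignment for int_least32_t, I would expect both structs above to come out at 12 bytes, because struct compact still needs padding at its end; the gap only moves. The saving in total size shows up when several small members can share one gap, as in the hypothetical mixed/sorted pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct wide    { int_least32_t i1; int_least8_t b; int_least32_t i2; };
struct compact { int_least32_t i1, i2; int_least8_t b; };

/* Hypothetical pair with two small members, where reordering usually pays off. */
struct mixed  { int_least8_t a; int_least32_t x; int_least8_t b; };
struct sorted { int_least32_t x; int_least8_t a; int_least8_t b; };

/* Bytes of padding between b and i2 in struct wide. */
size_t wide_gap(void)
{
    return offsetof(struct wide, i2)
         - (offsetof(struct wide, b) + sizeof(int_least8_t));
}

/* Bytes of padding between i2 and b in struct compact (normally none). */
size_t compact_gap(void)
{
    return offsetof(struct compact, b)
         - (offsetof(struct compact, i2) + sizeof(int_least32_t));
}

/* Sorting members from widest to narrowest never makes the struct larger. */
static_assert(sizeof(struct sorted) <= sizeof(struct mixed),
              "sorted layout should not be larger");
```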
It seems posix_memalign lets you choose a custom alignment, but when is that necessary?
malloc already does the alignment work internally.
UPDATE
The exact reason I ask is that I see nginx does this: ngx_memalign(NGX_POOL_ALIGNMENT, size, log); where NGX_POOL_ALIGNMENT is defined as 16 (nginxs.googlecode.com/svn-history/trunk/src/core/ngx_palloc.c).
Basically, when you need stricter alignment than malloc will give you. malloc generally returns a pointer aligned so that it may be used with any of the primitive types (often 8 or 16 bytes on common desktop machines).
However, sometimes you need memory aligned on other boundaries, for example 4K-aligned, etc. In this case, you would need memalign.
You would need this, for example,
when writing a memory manager (such as a garbage collector). In this case, it is sometimes handy to work with memory aligned on larger block sizes. This way, you can store meta-data common to all objects in a given block at the bottom of the allocated area, and access this simply by masking the least significant bits of the object pointer.
when interfacing with hardware (never done this myself, but IIRC, certain kinds of block-devices require aligned memory). See n.m.'s answer for details.
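The memory-manager case can be sketched as follows: allocate a block aligned to its own size with posix_memalign, then recover the start of the block (where metadata could live) from any interior pointer by masking the low bits. BLOCK_SIZE and the function names are assumptions for illustration, not taken from any particular allocator.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK_SIZE 4096u /* assumption: blocks are 4 KiB and 4 KiB-aligned */

/* Allocates one BLOCK_SIZE-aligned block; returns NULL on failure. */
void *block_alloc(void)
{
    void *p = NULL;

    if (posix_memalign(&p, BLOCK_SIZE, BLOCK_SIZE) != 0)
        return NULL;
    return p;
}

/* Recovers the start of the block from any pointer inside it. */
void *block_base(const void *inside)
{
    return (void *)((uintptr_t)inside & ~(uintptr_t)(BLOCK_SIZE - 1));
}

/* Returns 1 if the block is aligned and masking finds its start again. */
int block_demo(void)
{
    char *block = block_alloc();
    int ok;

    if (block == NULL)
        return 0;
    ok = (uintptr_t)block % BLOCK_SIZE == 0
      && block_base(block + 100) == (void *)block;
    free(block);
    return ok;
}

/* Hypothetical self-check. */
void check_block(void)
{
    assert(block_demo());
}
```

The masking trick only works because the alignment equals the block size; with plain malloc there would be no way to find the block start from an interior pointer.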
The only benefits of posix_memalign, as far as I can tell, are:
Allocating page-aligned (typically 4096 or larger alignment) memory for hardware-specific purposes.
Evil hacks where you keep the low N bits of a pointer zero so you can store an N-bit integer in the low bits. :-)
Various hardware may have alignment requirements which malloc cannot satisfy. The Linux man page gives one such example, I quote:
On many systems there are alignment restrictions, e.g. on buffers used for direct block device I/O. POSIX specifies the pathconf(path,_PC_REC_XFER_ALIGN) call that tells what alignment is needed.
A couple of uses:
Some processors have instructions that will only work on data that is aligned on a power of two greater than or equal to the buffer size, for example the bit-reverse addressing instructions used in FFTs (fast Fourier transforms).
To align data to cache boundaries to optimize access in multiprocessing applications so that data in the same cache line isn't being accessed by two processors simultaneously.
Basically, if you don't need to do absurd levels of optimizations and/or your hardware doesn't demand that an array be on a particular boundary then you can forget about posix_memalign.