What is "Alignment" field in binary formats? Why is it needed? - linker

In the ELF file format we have an Alignment field in the Segment Header Table, aka Program Header Table.
In the Windows PE file format they take it to the next level: sections have two alignment values, one within the disk file and the other in memory. The PE file header specifies both of these values.
I don't understand this alignment at all. What do we need it for? How and where is it used? Put differently: what does alignment mean in the context of a binary file format, and why do we need it?

Well, alignment usually means stretching the storage size of some value so that it occupies some "round" amount of space, like 32, 64, or 128 bits.
In binary formats, it may be done to optimize format processing: read/write operations can be quicker when the data comes in such "round" portions.
I found a reading for you, formulated in better words than I can come up with right now:
Data structure alignment
Data structure alignment is the way data is arranged and accessed in computer memory. It consists of two separate but related issues: data alignment and data structure padding. When a modern computer reads from or writes to a memory address, it will do this in word sized chunks (e.g. 4 byte chunks on a 32-bit system). Data alignment means putting the data at a memory offset equal to some multiple of the word size, which increases the system's performance due to the way the CPU handles memory. To align the data, it may be necessary to insert some meaningless bytes between the end of the last data structure and the start of the next, which is data structure padding.
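To make the padding part concrete, here is a minimal C sketch (the struct is hypothetical and the exact numbers depend on the compiler and target, but the pattern is typical):

#include <stdio.h>

struct padded            /* hypothetical example */
{
    char tag;            /* 1 byte at offset 0 */
    int  value;          /* 4 bytes; wants 4-byte alignment, so 3 padding bytes are inserted after tag */
};

int main(void)
{
    printf("%zu\n", sizeof(struct padded));   /* usually prints 8, not 5 */
    return 0;
}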

Related

What is the aligned attribute and what are the uses of it?

I have the following line in my code:
# define __align_(x) __attribute__((aligned(x)))
I can use it as int i __align_; but what difference does it make if I use the aligned attribute as above, versus just creating my variable as int i;? Does it differ in how the variable gets created in memory?
"I can use it as int i __align_; but what difference does it make"
This will not work because the macro is defined to have a parameter, __align_(x). When it is used without a parameter, it will not be replaced, and the compiler will report a syntax error. Also, identifiers starting with __ are reserved for the C implementation (for the use of the compiler, the standard library, and any other parts forming the C implementation), so a regular program should not use such a name.
When you use the macro correctly, it changes the normal alignment requirement for the type.
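For illustration only, here is a minimal sketch of the macro from the question used correctly (the __alignof__ check is a GCC/Clang extension, and the 8-byte figure is just what you would typically see):

#include <stdio.h>

#define __align_(x) __attribute__((aligned(x)))   /* as in the question; note that __ names are reserved */

int i __align_(8);   /* i now has an 8-byte alignment requirement instead of int's usual 4 */

int main(void)
{
    printf("%zu\n", (size_t)__alignof__(i));   /* typically prints 8 */
    return 0;
}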
Generally, objects of various types have alignment requirements: They should be located in memory at addresses that are multiples of their requirement. The reasons for this are because computer hardware is usually designed to work with groups of bytes, so it may fetch data from memory in groups of, for example, four bytes: Bytes from 0 to 3, bytes from 4 to 7, bytes from 8 to 11, and so on.
If a four-byte object with four-byte alignment requirement is located at a multiple of four bytes, then it can be read from memory easily, by loading the group of bytes it is in. It can also be written to memory easily.
If the object were not at a multiple of four bytes, it cannot be loaded as one group of bytes. It can be loaded by loading the two groups of bytes it straddles, extracting the desired bytes, and combining the desired bytes in one processor register. However, that takes more work, so we want to avoid it. The compiler is written to automatically align things as desired for the C implementation, and it writes load and store instructions that expect the desired alignment.1
Different object types can have different alignment requirements even though they are bound by the same hardware behavior. For example, with a two-byte short, the alignment requirement may be two bytes. This is because, whether it starts at byte 0 or byte 2 within a group (say at address 100, 102, 104, or 106), we can load the short by loading a single group of four bytes and taking just the two bytes we want. However, if it started at byte 3 (say at address 103), we would have to load two groups of bytes (100 to 103 and 104 to 107) to get the bytes we needed for the short (103 and 104). So two-byte alignment suffices for this short even though the hardware is designed with four-byte groups.
As mentioned, the compiler handles alignment automatically. When you define a structure with multiple members of different types, the compiler inserts padding so that each member is aligned correctly, and it inserts padding at the end of the structure so that an array of them keeps the alignment from element to element in the array.
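A small sketch of that behaviour, using offsetof to see where the compiler put the padding (the struct is made up; the offsets in the comments are what you would typically get with a 4-byte int):

#include <stddef.h>
#include <stdio.h>

struct example
{
    char  c;    /* offset 0 */
    int   i;    /* typically offset 4: 3 padding bytes inserted after c */
    short s;    /* typically offset 8 */
};              /* typically 2 trailing padding bytes, so sizeof is 12 and array elements stay aligned */

int main(void)
{
    printf("c=%zu i=%zu s=%zu size=%zu\n",
           offsetof(struct example, c),
           offsetof(struct example, i),
           offsetof(struct example, s),
           sizeof(struct example));
    return 0;
}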
There are times when we want to override the compiler’s automatic behavior. When we are preparing to send data over a network connection, the communication protocol might require the different fields of a message to be packed together in consecutive bytes, with no padding. In this case, we can define a structure with an alignment requirement of 1 byte for it and all its members. When we are ready to send a message, we could copy data into this structure’s members and then write the structure to the network device.
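A hedged sketch of such a wire-format structure, using the GCC/Clang packed attribute (the message fields are invented for illustration):

#include <stdint.h>

struct __attribute__((packed)) wire_msg
{
    uint8_t  type;        /* offset 0 */
    uint16_t length;      /* offset 1: no padding is inserted before it */
    uint32_t payload_id;  /* offset 3 */
};                        /* sizeof is 7; without packed it would typically be 8 */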
When you tell the compiler an object is not aligned normally, the compiler will generate instructions for that. Instead of the normal load or store instructions, it will use special unaligned load or store instructions if the computer architecture has them. If it does not, the compiler will use instructions to shift and store individual bytes or to shift and merge bytes and store them as aligned words, depending on what instructions are available in the computer architecture. This is generally inefficient; it will slow down your program. So it should not be used in normal programming. Decreasing the alignment requirements should be used only when there is a need for controlling the layout of data in memory.
Sometimes increasing the alignment requirements is used for performance. For example, an array of four-byte float elements generally only needs four-byte alignment. However, some computers have special instructions to process four float elements (16 bytes) at a time, and they benefit from having that data aligned to a multiple of 16 bytes. (And some computers have instructions for even more data at one time.) In this case, we might increase the alignment requirement for our float array (but not its individual elements) so that it is aligned suitably for these instructions.
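A minimal sketch of that over-alignment, assuming the GCC/Clang aligned attribute (C11 _Alignas(16) is the standard spelling):

#include <stdio.h>

static float samples[1024] __attribute__((aligned(16)));   /* start address is a multiple of 16 */

int main(void)
{
    printf("%p\n", (void *)samples);   /* suitable for 16-byte vector loads */
    return 0;
}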
Footnote
1 What happens if you force an object to be located at an undesired alignment without telling the compiler varies. In some computers, when a load instruction is executed with an unaligned address, the processor will “trap,” meaning it stops normal program execution and transfers control to the operating system, reporting an error in your program. In some computers, the processor will ignore the low bits of the address and load the wrong data. In some computers, the processor will load the two groups of bytes, extract the desired bytes, and merge them. On computers that trap, the operating system might do the manual fix-up of loading the bytes, or it might terminate your program or report the error to your program.
The attribute tells the compiler that the variable in question must be placed in memory at addresses that are aligned to a certain number of bytes (addr % alignment == 0).
This is important because the CPU can only work on some integer values if they are aligned: an int32 must be 4-byte aligned and an int64 must be 8-byte aligned, and pointers need to be 4- or 8-byte aligned (on a 32- or 64-bit CPU) too.
The attribute is mostly used for structures, where certain fields within the structure must be memory aligned in order to allow the CPU to do integer operations on them (like mov.l) without hitting a BUS ERROR from the memory controller.
If structures aren't properly aligned, the compiler has to add extra instructions that first load the unaligned value into a register using several memory operations, which is more expensive in performance.
It can also be used to bump performance in more performance-sensitive systems, by creating buffers that are page-aligned (usually 4 KB) so that paging has less of an impact, or if you want to create DMA-able buffer zones - but that's a bit more advanced...
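As a rough sketch of the page-aligned-buffer idea, using C11's aligned_alloc (the 4096-byte page size is an assumption; real code would query it, e.g. via sysconf):

#include <stdlib.h>

#define PAGE_SIZE 4096   /* assumed page size */

void *make_page_aligned_buffer(size_t pages)
{
    /* aligned_alloc requires the size to be a multiple of the alignment */
    return aligned_alloc(PAGE_SIZE, pages * PAGE_SIZE);
}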

Explanation of packed attribute in C

I was wondering if anyone could offer a fuller explanation of the meaning of the packed attribute used in the bitmap example in pset4.
"Our use, incidentally, of the attribute called packed ensures that clang does not try to "word-align" members (whereby the address of each member’s first byte is a multiple of 4), lest we end up with "gaps" in our structs that don’t actually exist on disk."
I do not understand the comment about gaps in our structs. Does this refer to gaps in memory between each struct (i.e. one byte between each 3-byte RGB triple if it were word-aligned)? Why does this matter for optimization?
#include <stdint.h>

typedef uint8_t BYTE;

typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;
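For context, a hedged sketch of how such a packed struct is typically used when reading pixel data from the file (it assumes the RGBTRIPLE definition above and an already-open FILE *; the helper name is made up):

#include <stdio.h>

/* uses the RGBTRIPLE typedef shown above */
int read_pixel(FILE *inptr, RGBTRIPLE *out)
{
    /* with __packed__, sizeof(RGBTRIPLE) is exactly 3, so this reads
       exactly the 3 bytes stored on disk for one pixel */
    return fread(out, sizeof(RGBTRIPLE), 1, inptr) == 1;
}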
Beware: prejudices on display!
As noted in comments, when the compiler adds the padding to a structure, it does so to improve performance. It uses the alignments for the structure elements that will give the best performance.
Not so very long ago, the DEC Alpha chips would handle an 'unaligned memory request' (umr) by doing a page fault, jumping into the kernel, fiddling with the bytes to get the required result, and returning the correct result. This was painfully slow by comparison with a correctly aligned memory request; you avoided such behaviour at all costs.
Other RISC chips (used to) give you a SIGBUS error if you do misaligned memory accesses. Even Intel chips have to do some fancy footwork to deal with misaligned memory accesses.
The purpose of removing padding is to (decrease performance but) benefit by being able to serialize and deserialize the data without doing the job 'properly' — it is a form of laziness that doesn't actually work when the machines communicating are not of the same type, so proper serialization should have been done in the first place.
What I mean is that if you are writing data over the network, it seems simpler to be able to send the data by writing the contents of a structure as a block of memory (error checking etc omitted):
write(fd, &structure, sizeof(structure));
The receiving end can read the data:
read(fd, &structure, sizeof(structure));
However, if the machines are of different types (for example, one has an Intel CPU and the other a SPARC or Power CPU), the interpretation of the data in those structures will vary between the two machines (unless every element of the structure is either a char or an array of char). To relay the information reliably, you have to agree on a byte order (e.g. network byte order — this is very much a factor in TCP/IP networking, for example), and the data should be transmitted in the agreed-upon order so that both ends can understand what the other is saying.
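As a rough illustration of transmitting in the agreed-upon network byte order (POSIX <arpa/inet.h> is assumed; the two fields are invented for the example):

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

int send_pair(int fd, uint32_t id, uint32_t value)
{
    unsigned char buf[8];
    uint32_t n_id    = htonl(id);      /* convert to big-endian (network) order */
    uint32_t n_value = htonl(value);
    memcpy(buf,     &n_id,    sizeof n_id);
    memcpy(buf + 4, &n_value, sizeof n_value);
    return write(fd, buf, sizeof buf) == (ssize_t)sizeof buf;
}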
You can define other mechanisms: you could use a 'sender makes right' mechanism, in which the 'receiver' lets the sender know how it wants the data presented and the sender is responsible for fixing up the transmitted data. You can also use a 'receiver makes right' mechanism which works the other way around. Both of these have been used commercially — see DRDA for one such protocol.
Given that the type of BYTE is uint8_t, there won't be any padding in the structure in any sane (commercially viable) compiler. IMO, the precaution is a fantasy or phobia without a basis in reality. I'd certainly need a carefully documented counter-example to believe that there's an actual problem that the attribute helps with.
I was led to believe that you could encounter issues when you pass the entire struct to a function like fread, as it assumes you're giving it an array-like chunk of memory with no gaps in it. If your struct has gaps, the first byte ends up in the right place, but the next two bytes get written into the gap, which you don't have a proper way to access.
Sorta...but mostly no. The issue is that the values in the padding bytes are indeterminate. However, in the structure shown, there will be no padding in any compiler I've come across; the structure will be 3 bytes long. There is no reason to put any padding anywhere inside the structure (between elements) or after the last element (and the standard prohibits padding before the first element). So, in this context, there is no issue.
If you write binary data to a file and it has holes in it, then you get arbitrary byte values written where the holes are. If you read back on the same (type of) machine, there won't actually be a problem. If you read back on a different (type of) machine, there may be problems — hence my comments about serialization and deserialization. I've only been programming in C a little over 30 years; I've never needed packed, and don't expect to. (And yes, I've dealt with serialization and deserialization using a standard layout — the system I mainly worked on used big-endian data transfer, which corresponds to network byte order.)
Sometimes, the elements of a struct are simply aligned to a 4-byte boundary (or whatever the size of a register is in the CPU) to optimize read/write access to RAM. Often, smaller elements are packed together, but alignment is dictated by a larger type in the struct.
In your case, you probably don't need to pack the struct, but it doesn't hurt.
With some compilers, each byte in your struct could end up taking 4 bytes of RAM (so, 12 bytes for the entire struct). Packing the struct removes the alignment requirement for each of the BYTEs and ensures that the entire struct is placed into one 4-byte DWORD (unless the alignment for the entire program is set to one byte, or the struct is in an array of said structs, in which case it would literally be stored in 3 contiguous bytes of RAM).
See comments below for further discussion...
The objective is exactly what you said: not having gaps between the structs. Why is this important? Mostly because of cache. Memory access is slow! Cache is really fast. If you can fit more in cache, you avoid cache misses (memory accesses).
Edit: it seems I was wrong; this isn't really useful here if the objective was avoiding structure padding, since the struct has 3 BYTE members and gets no padding anyway.

Byte Alignment in Files

I've been looking at file formats and information on byte alignment in files is hard to come by. I can find information on memory byte alignment ("Data Structure Alignment"), but that's a different matter.
In setting up a standard format, is there an optimal way to align bytes in a file that is good or even necessary for various systems? This is not for one data type, but for many. Is 2-byte alignment sufficient, or is it really even necessary? What about 4-byte alignment? How well will a 32-bit or 64-bit system handle this?
When working with binary data, very often you'll just write memory directly to the file. In that case, data in the file is aligned exactly as it is in memory. This has the advantage of not requiring any intermediate steps when reading the information back into your memory data structures. It does use a bit more disk space than absolutely required if you were to eliminate the alignment, but typically not a lot of space.
You have to be careful, though, if you'll be reading that data from other programs. They have to be written to take the padding bytes into account. For example if you have this structure:
struct foo
{
    int a;
    char b;
    int c;
};
and you tell the compiler to align on 32-bit boundaries, then your memory (and therefore disk) layout will be:
4 bytes - a
1 byte - b
3 bytes - padding
4 bytes - c
If the other program isn't written to take that into account and instead assumes byte alignment, it'll try to read c from the four bytes immediately following b. The result, as you can imagine, wouldn't be good.
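One hedged way for the other program to cope is to read the fields at the offsets the writer actually used, rather than fread-ing into its own struct; a minimal sketch (it assumes both sides use 4-byte ints and the same byte order):

#include <stdio.h>
#include <string.h>

/* hypothetical reader for the struct foo layout described above */
int read_foo_fields(FILE *fp, int *a, char *b, int *c)
{
    unsigned char buf[12];               /* 4 (a) + 1 (b) + 3 (padding) + 4 (c) */
    if (fread(buf, sizeof buf, 1, fp) != 1)
        return 0;
    memcpy(a, buf + 0, 4);
    *b = (char)buf[4];
    memcpy(c, buf + 8, 4);               /* skip the 3 padding bytes at offsets 5..7 */
    return 1;
}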
When I'm working with binary data, I usually just write the data to the file, ignoring the typically small amount of "waste" that's due to data alignment.

Why did Windows use the FAT structure instead of a conventional linked list with a next pointer for each data block of a file?

Instead of storing references to next nodes in a table, why couldn't it be just stored like a conventional linked list, that is, with a next pointer?
This is due to alignment. FAT (and just about any other file system) stores file data in one or more whole sectors of the underlying storage. Because the underlying storage can only read and write whole sectors, such allocation allows efficient access to the contents of a file.
Issues with interleaving
When a program wants to store something in a file, it provides a buffer, say 1 MB of data to store. Now if the file's data sectors also have to keep next pointers to their next sector, this pointer information would need to be interleaved with the actual user data. So the file system would need to build another buffer (slightly larger than the provided 1 MB), copy some of the user data and the corresponding next pointer for each output sector, and hand this new buffer to the storage. This would be somewhat inefficient. Unless the file system always stores file data to new sectors (and most usually don't), rewriting these next pointers would also be redundant.
The bigger problem arises when a read operation is attempted on the file. Files would now work like tape devices: with only the location of the first sector known from the file's primary metadata, in order to reach sector 1000 the file system would need to read all sectors before it in order: read sector 0, find the address of sector 1 from the loaded next pointer, read sector 1, etc. With typical seek times of around 10 ms per random I/O (assuming a hard disk drive), reaching sector 1000 would take about 10 seconds. Even if sectors are sequentially ordered, while the file system driver processes sector N's data the disk head will be flying over the next sector, and by the time the read for sector N+1 is issued it may be too late, requiring the disk to rotate an entire revolution (8.3 ms for a 7200 RPM drive) before being able to read the next sector again. An on-disk cache can and will help with that, though.
Writing a single sector is usually an atomic operation (depending on hardware): reading back the sector after a power failure returns either its old content or the new one, with no intermediate states. Database applications usually need to know which writes are atomic. If the file system interleaves file data and metadata in the same sectors, it will need to report a smaller size than the actual sector size to applications. For example, instead of say 512 bytes it may need to report 504. But it can't do that, because applications usually assume the sector size is a power of 2. Furthermore, a file stored on such a filesystem would very likely be unusable if copied to another file system with a different reported sector size.
Better approaches
The FAT format is better because all next pointers are stored in adjacent sectors. For FAT12, FAT16 and not-very-large FAT32 volumes, the entire table is small enough to fit in memory. FAT still records the blocks of a file in a linked list, so to get efficient random access an implementation needs to cache the chain per file. On large enough volumes (which can hold large enough files), such a cache may no longer fit in memory.
ext3 uses direct and indirect blocks. This simple format avoids the need for the preprocessing that FAT requires and gets by with only a minimal number of additional reads per I/O when indirect blocks are needed. These additional reads are cached by the operating system, so their overhead is often negligible.
Other variants are also possible and used by various file systems.
Random notes
For the sake of completeness, some hard disk drives can be formatted with slightly larger sector sizes (say 520 bytes) so that the file system can pack 512 bytes of file data together with several bytes of metadata in the same sector. Yet because of the above, I don't believe anyone has used such formats for storing the address of the file's next sector. These additional bytes can be put to better use: additional checksums and timestamping come to mind. The timestamping, I believe, is used to improve the performance of some RAID systems. Still, such usage is rare, and most software can't work with such sectors at all.
Some file systems can save the content of small enough files in the file metadata directly without occupying distinct sectors. ReiserFS has the controversial tail packing. This is not important here: large files still benefit from having proper mapping to storage sectors.
Any modern OS requires much more than a pointer to the next data block for its file system: attributes (encryption, compression, hidden, ...), security descriptors (ACL entries), support for different hardware, buffering. This is just a tiny fraction of the functionality that any good file system provides.
Have a look at the file system article on Wikipedia to learn what else any modern file system does.
If we ignore the detail of FAT12 sharing a byte between two entries to pack 12 bits into 1.5 bytes, then we can concentrate on the deeper meaning of the question.
It turns out that the FAT system is equivalent to a linked list with the following points:
The "next" pointer is located in an array (the FAT) instead of being appended or prepended to the actual data
The value written in "next" is an integer instead of the more familiar memory address of the next node.
The nodes are not reserved dynamically but represented by another array. That array is the entire data part of the hard drive.
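Put as code, the list above amounts to something like this conceptual sketch (the cluster numbers, the FAT array and the end-of-chain marker are simplified for illustration):

#include <stdint.h>
#include <stdio.h>

#define END_OF_CHAIN 0xFFFFFFFFu   /* hypothetical sentinel value */

/* the "next pointer" of cluster n is simply fat[n] */
void walk_chain(const uint32_t *fat, uint32_t first_cluster)
{
    for (uint32_t c = first_cluster; c != END_OF_CHAIN; c = fat[c])
        printf("cluster %u\n", c);
}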
One fascinating exercise we were assigned as part of our software engineering education was to convert an application using memory pointers into an equivalent application that uses integer values. The rationale was that some processors (the PDP-11? or another PDP-xx) would perform integer arithmetic much faster than pointer operations, or maybe even forbade arithmetic on pointers entirely.

Structure padding

I was trying to understand why structure padding is the reason structures cannot be compared by memcmp.
One small thing I don't understand about structure padding is this:
Why should "a short be 2-byte aligned" or "a long be 4-byte aligned"? I understand it has to do with their sizes, but why can they not appear at any byte boundary?
Or in other words, why is 0x10004566 not a valid location for a long variable but 0x10004568 is?
Because some platforms (i.e. CPUs) physically don't support misaligned memory accesses. Other platforms support them, but in a much slower fashion.
The padding you get in a struct is dependent on the choices your compiler makes, but it will be making those choices in order to satisfy the specific requirements of the CPU the code is targeted at.
Memory alignment is a very important issue when optimizing a program for speed. C, being a language that - generally - puts strong emphasis on speed, likes to enforce some rules which may make the program faster.
The limitation of aligned and unaligned memory accesses comes directly from the hardware used for fetching the data from the memory, which usually fetches it in chunks which are equal to the machine word in size. Say you want to access a doubleword (4 bytes) stored at location 101. This means that the memory controller would firstly have to (probably) issue a read of a doubleword at location 100, then another read of a doubleword at location 104, and then splice the individual bytes from locations 101, 102, 103, and 104 together. The whole operation takes (hypothetically) two clock cycles.
If you want to access a doubleword at location 100, there's no such issue, which should be illustrated clearly enough by the example I provided.
In fact, misaligned data access is such a big issue that the "aligned" SSE instructions (there are also "unaligned" versions which don't have this restriction) will cause a general protection fault if you try to access misaligned data with them.
As a rule of thumb, it never hurts to align 4-byte data on a 4-byte boundary, 8-byte data on an 8-byte boundary, and so forth.
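If you do need to read a 4-byte value from an address that might not be 4-byte aligned (for example, an arbitrary offset in a byte buffer), a common portable sketch is to memcpy it into an aligned local and let the compiler choose suitable instructions:

#include <stdint.h>
#include <string.h>

uint32_t read_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);   /* the compiler emits whatever unaligned-safe code the target needs */
    return v;
}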
The only additional example I can think of with respect to alignment is the transfer of data. Transfers (depending on the architecture) go in blocks of, say, 32 bytes; if your data crosses a block boundary, it could require 2 transfers to receive the data rather than 1.

Resources