Explanation of the packed attribute in C

I was wondering if anyone could offer a more full explanation to the meaning of the packed attribute used in the bitmap example in pset4.
"Our use, incidentally, of the attribute called packed ensures that clang does not try to "word-align" members (whereby the address of each member’s first byte is a multiple of 4), lest we end up with "gaps" in our structs that don’t actually exist on disk."
I do not understand the comment about gaps in our structs. Does this refer to gaps in memory between each struct (i.e. one byte between each 3-byte RGB triple if it were word-aligned)? Why does this matter for optimization?
typedef uint8_t BYTE;

typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;

Beware: prejudices on display!
As noted in comments, when the compiler adds the padding to a structure, it does so to improve performance. It uses the alignments for the structure elements that will give the best performance.
Not so very long ago, the DEC Alpha chips would handle an 'unaligned memory request' (UMR) by taking a page fault, jumping into the kernel, fiddling with the bytes to get the required result, and returning the correct result. This was painfully slow by comparison with a correctly aligned memory request; you avoided such behaviour at all costs.
Other RISC chips (used to) give you a SIGBUS error if you made misaligned memory accesses. Even Intel chips have to do some fancy footwork to deal with misaligned memory accesses.
The purpose of removing padding is to trade some performance for the convenience of being able to serialize and unserialize the data without doing the job 'properly'. It is a form of laziness that doesn't actually work when the machines communicating are not of the same type, so proper serialization should have been done in the first place.
What I mean is that if you are writing data over the network, it seems simpler to be able to send the data by writing the contents of a structure as a block of memory (error checking etc omitted):
write(fd, &structure, sizeof(structure));
The receiving end can read the data:
read(fd, &structure, sizeof(structure));
However, if the machines are of different types (for example, one has an Intel CPU and the other a SPARC or Power CPU), the interpretation of the data in those structures will vary between the two machines (unless every element of the array is either a char or an array of char). To relay the information reliably, you have to agree on a byte order (e.g. network byte order — this is very much a factor in TCP/IP networking, for example), and the data should be transmitted in the agreed upon order so that both ends can understand what the other is saying.
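As a sketch of what doing the job 'properly' can look like (the message and its fields here are hypothetical), the sender converts each field to network byte order and packs the bytes explicitly instead of writing the struct, padding and all, verbatim:

#include <arpa/inet.h>   /* htonl, htons */
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical message: a 32-bit id and a 16-bit count. */
struct msg {
    uint32_t id;
    uint16_t count;
};

/* Serialize field by field into a byte buffer in network byte order,
   instead of writing the in-memory struct (and its padding) verbatim. */
static ssize_t send_msg(int fd, const struct msg *m)
{
    unsigned char buf[6];
    uint32_t id_be = htonl(m->id);
    uint16_t count_be = htons(m->count);

    memcpy(buf, &id_be, 4);
    memcpy(buf + 4, &count_be, 2);
    return write(fd, buf, sizeof buf);
}

int main(void)
{
    struct msg m = { .id = 42, .count = 7 };
    /* For demonstration, write the 6 serialized bytes to stdout. */
    return send_msg(STDOUT_FILENO, &m) == 6 ? 0 : 1;
}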
You can define other mechanisms: you could use a 'sender makes right' mechanism, in which the 'receiver' lets the sender know how it wants the data presented and the sender is responsible for fixing up the transmitted data. You can also use a 'receiver makes right' mechanism, which works the other way around. Both have been used commercially — see DRDA for one such protocol.
Given that the type of BYTE is uint8_t, there won't be any padding in the structure in any sane (commercially viable) compiler. IMO, the precaution is a fantasy or phobia without a basis in reality. I'd certainly need a carefully documented counter-example to believe that there's an actual problem that the attribute helps with.
I was led to believe that you could encounter issues when you pass the entire struct to a function like fread, as it assumes you're giving it an array-like chunk of memory with no gaps in it. If your struct has gaps, the first byte ends up in the right place, but the next two bytes get written into the gap, which you don't have a proper way to access.
Sorta...but mostly no. The issue is that the values in the padding bytes are indeterminate. However, in the structure shown, there will be no padding in any compiler I've come across; the structure will be 3 bytes long. There is no reason to put any padding anywhere inside the structure (between elements) or after the last element (and the standard prohibits padding before the first element). So, in this context, there is no issue.
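A quick check you can compile yourself backs this up; on any of the compilers the answer above refers to, this prints 3, with or without the packed attribute:

#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE_PLAIN;          /* same members, no packed attribute */

int main(void)
{
    /* Three one-byte members need no internal or trailing padding,
       so the size is 3 on every mainstream compiler. */
    printf("%zu\n", sizeof(RGBTRIPLE_PLAIN));
    return 0;
}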
If you write binary data to a file and it has holes in it, then you get arbitrary byte values written where the holes are. If you read back on the same (type of) machine, there won't actually be a problem. If you read back on a different (type of) machine, there may be problems — hence my comments about serialization and deserialization. I've only been programming in C a little over 30 years; I've never needed packed, and don't expect to. (And yes, I've dealt with serialization and deserialization using a standard layout — the system I mainly worked on used big-endian data transfer, which corresponds to network byte order.)

Sometimes, the elements of a struct are simply aligned to a 4-byte boundary (or whatever the size of a register is in the CPU) to optimize read/write access to RAM. Often, smaller elements are packed together, but alignment is dictated by a larger type in the struct.
In your case, you probably don't need to pack the struct, but it doesn't hurt.
With some compilers, each byte in your struct could end up taking 4 bytes of RAM (so, 12 bytes for the entire struct). Packing the struct removes the alignment requirement for each of the BYTEs and ensures that the entire struct is placed into one 4-byte DWORD (unless the alignment for the entire program is set to one byte, or the struct is in an array of said structs, in which case it would literally be stored in 3 contiguous bytes of RAM).
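For what it's worth, here is a sketch (using the GCC/Clang attribute, and assuming a typical ABI with 4-byte int alignment) of a struct where packing actually does change the layout, unlike the all-BYTE RGBTRIPLE:

#include <stdint.h>
#include <stdio.h>

struct plain {            /* typical layout: 1 + 3 padding + 4 = 8 bytes */
    uint8_t  tag;
    uint32_t value;
};

struct packed_version {   /* packed: 1 + 4 = 5 bytes, value may be unaligned */
    uint8_t  tag;
    uint32_t value;
} __attribute__((__packed__));

int main(void)
{
    printf("plain:  %zu\n", sizeof(struct plain));          /* commonly 8 */
    printf("packed: %zu\n", sizeof(struct packed_version)); /* commonly 5 */
    return 0;
}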

The objective is exactly what you said, not having gaps between each struct. Why is this important? Mostly because of cache. Memory access is slow!!! Cache is really fast. If you can fit more in cache you avoid cache misses (memory accesses).
Edit: Seems I was wrong; this isn't really useful if the objective is avoiding structure padding, since the struct has three BYTE members and would not be padded anyway.

Related

What is the aligned attribute and what are its uses?

I have the following line in the code:
#define __align_(x) __attribute__((aligned(x)))
I can use it like int i __align_; but what difference does it make?
If I use the aligned attribute as above, versus just declaring my variable as int i;, does it differ in how the variable is created in memory?
I can use it like int i __align_; but what difference does it make?
This will not work because the macro is defined to have a parameter, __align_(x). When it is used without a parameter, it will not be replaced, and the compiler will report a syntax error. Also, identifiers starting with __ are reserved for the C implementation (for the use of the compiler, the standard library, and any other parts forming the C implementation), so a regular program should not use such a name.
When you use the macro correctly, it changes the normal alignment requirement for the type.
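A minimal sketch of correct usage, assuming a compiler that supports the GCC/Clang aligned attribute (and ignoring, for illustration only, the advice above about reserved __ names):

#include <stdint.h>
#include <stdio.h>

#define __align_(x) __attribute__((aligned(x)))

int main(void)
{
    int plain;
    int aligned16 __align_(16);   /* the macro used with an argument */

    /* The over-aligned variable's address is a multiple of 16; the
       plain one only has to satisfy int's normal requirement. */
    printf("plain:     %p\n", (void *)&plain);
    printf("aligned16: %p (mod 16 = %u)\n",
           (void *)&aligned16,
           (unsigned)((uintptr_t)&aligned16 % 16));
    return 0;
}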
Generally, objects of various types have alignment requirements: they should be located in memory at addresses that are multiples of their requirement. The reason for this is that computer hardware is usually designed to work with groups of bytes, so it may fetch data from memory in groups of, for example, four bytes: bytes 0 to 3, bytes 4 to 7, bytes 8 to 11, and so on.
If a four-byte object with four-byte alignment requirement is located at a multiple of four bytes, then it can be read from memory easily, by loading the group of bytes it is in. It can also be written to memory easily.
If the object were not at a multiple of four bytes, it could not be loaded as one group of bytes. It could be loaded by loading the two groups of bytes it straddles, extracting the desired bytes, and combining them in one processor register. However, that takes more work, so we want to avoid it. The compiler is written to automatically align things as desired for the C implementation, and it writes load and store instructions that expect the desired alignment.1
Different object types can have different alignment requirements even though they are bound by the same hardware behavior. For example, with a two-byte short, the alignment requirement may be two bytes. This is because, whether it starts at byte 0 or byte 2 within a group (say at address 100, 102, 104, or 106), we can load the short by loading a single group of four bytes and taking just the two bytes we want. However, if it started at byte 3 (say at address 103), we would have to load two groups of bytes (100 to 103 and 104 to 107) to get the bytes we needed for the short (103 and 104). So two-byte alignment suffices for this short even though the hardware is designed with four-byte groups.
As mentioned, the compiler handles alignment automatically. When you define a structure with multiple members of different types, the compiler inserts padding so that each member is aligned correctly, and it inserts padding at the end of the structure so that an array of them keeps the alignment from element to element in the array.
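For illustration (the exact numbers are typical for a 4-byte int but not guaranteed), offsetof makes the inserted padding visible:

#include <stddef.h>
#include <stdio.h>

struct example {
    char  c;   /* offset 0 */
    int   i;   /* typically offset 4: 3 padding bytes inserted after c */
    short s;   /* typically offset 8 */
};             /* typically sizeof == 12: 2 trailing padding bytes keep
                  the int aligned in every element of an array */

int main(void)
{
    printf("offset of c: %zu\n", offsetof(struct example, c));
    printf("offset of i: %zu\n", offsetof(struct example, i));
    printf("offset of s: %zu\n", offsetof(struct example, s));
    printf("sizeof:      %zu\n", sizeof(struct example));
    return 0;
}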
There are times when we want to override the compiler’s automatic behavior. When we are preparing to send data over a network connection, the communication protocol might require the different fields of a message to be packed together in consecutive bytes, with no padding. In this case, we can define a structure with an alignment requirement of 1 byte for it and all its members. When we are ready to send a message, we could copy data into this structure’s members and then write the structure to the network device.
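A sketch of such a wire-format structure, assuming the GCC/Clang packed attribute and a hypothetical protocol with a 16-bit type field followed by a 32-bit length:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical wire-format header: a 16-bit message type followed
   immediately by a 32-bit length, exactly as the bytes appear on the
   wire (both fields stored in network byte order by the sender). */
struct wire_header {
    uint16_t msg_type;
    uint32_t msg_length;
} __attribute__((__packed__));

int main(void)
{
    /* Packed: 6 bytes, matching the protocol. An unpacked layout would
       typically be 8 bytes, with 2 padding bytes after msg_type. */
    printf("%zu\n", sizeof(struct wire_header));
    return 0;
}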
When you tell the compiler an object is not aligned normally, the compiler will generate instructions for that. Instead of the normal load or store instructions, it will use special unaligned load or store instructions if the computer architecture has them. If it does not, the compiler will use instructions to shift and store individual bytes or to shift and merge bytes and store them as aligned words, depending on what instructions are available in the computer architecture. This is generally inefficient; it will slow down your program. So it should not be used in normal programming. Decreasing the alignment requirements should be used only when there is a need for controlling the layout of data in memory.
Sometimes increasing the alignment requirement is used for performance. For example, an array of four-byte float elements generally only needs four-byte alignment. However, some computers have special instructions to process four float elements (16 bytes) at a time, and they benefit from having that data aligned to a multiple of 16 bytes. (And some computers have instructions for even more data at one time.) In this case, we might increase the alignment requirement for our float array (but not its individual elements) so that it is aligned appropriately for these instructions.
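For instance, a sketch using the C11 alignas keyword to over-align an array for 16-byte vector loads:

#include <stdalign.h>
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    /* Each float only needs 4-byte alignment on its own, but the array
       as a whole is over-aligned to 16 bytes so that 16-byte vector
       loads can use the "aligned" instruction forms. */
    alignas(16) float samples[8] = {0};

    printf("address mod 16 = %u\n",
           (unsigned)((uintptr_t)&samples[0] % 16));
    return 0;
}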
Footnote
1 What happens if you force an object to be located at an undesired alignment without telling the compiler varies. In some computers, when a load instruction is executed with an unaligned address, the processor will “trap,” meaning it stops normal program execution and transfers control to the operating system, reporting an error in your program. In some computers, the processor will ignore the low bits of the address and load the wrong data. In some computers, the processor will load the two groups of bytes, extract the desired bytes, and merge them. On computers that trap, the operating system might do the manual fix-up of loading the bytes, or it might terminate your program or report the error to your program.
The attribute tells the compiler that the variable in question must be placed in memory at addresses that are aligned to a certain number of bytes (addr % alignment == 0).
This is important because the CPU can only work on some integer values if they are aligned: for example, an int32 must be 4-byte aligned and an int64 must be 8-byte aligned; pointers need to be 4- or 8-byte aligned (on a 32/64-bit CPU) too.
The attribute is mostly used for structures, where certain fields within the structure must be memory aligned in order to allow the CPU to do integer operations on them (like mov.l) without hitting a BUS ERROR from the memory controller.
If structures aren't properly aligned, the compiler will have to add extra instructions to first load the unaligned value into a register using several memory operations, which costs performance.
It can also be used to bump performance in more performance sensitive systems by creating buffers that are page aligned (4k usually) so that paging will have less of an impact, or if you want to create DMA-able buffer zones - but that's a bit more advanced...
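A sketch of that buffer case, assuming C11 aligned_alloc and a 4096-byte page size:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    /* Request a buffer whose start address is page aligned, assuming
       4096-byte pages (the size is kept a multiple of the alignment,
       as C11 requires for aligned_alloc). */
    void *buf = aligned_alloc(4096, 64 * 1024);
    if (buf == NULL)
        return 1;

    printf("buffer at %p (mod 4096 = %lu)\n",
           buf, (unsigned long)((uintptr_t)buf % 4096));
    free(buf);
    return 0;
}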

Is it better for performance to have alignment padding at the end of a small struct instead of between 2 members?

We know that there is padding in some structures in C. Please consider the following 2:
struct node1 {
    int a;
    int b;
    char c;
};

struct node2 {
    int a;
    char c;
    int b;
};
Assuming sizeof(int) = alignof(int) = 4 bytes:
sizeof(node1) = sizeof(node2) = 12, due to padding.
What is the performance difference between the two? (if any, w.r.t. the compiler or the architecture of the system, especially with GCC)
These are bad examples - in this case it doesn't matter, since the amount of padding will be the same in either case. There will not be any performance differences.
The compiler will always add trailing padding at the end of a struct where needed; otherwise arrays of structs wouldn't be feasible, since the first member must always be aligned. If not for trailing padding in some item struct_array[0], the first member of struct_array[1] would end up misaligned.
The order would matter if we were to do this though:
struct node3 {
    int a;
    char b;
    int c;
    char d;
};
Assuming 4 byte int and 4 byte alignment, then b occupies 1+3 bytes here, and d an additional 1+3 bytes. This could have been written better if the two char members were placed adjacently, in which case the total amount of padding would just have been 2 bytes.
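A sketch of that reordering, under the same assumption of a 4-byte int with 4-byte alignment:

#include <stdio.h>

struct node3 {             /* typically 4 + 1+3 + 4 + 1+3 = 16 bytes */
    int  a;
    char b;
    int  c;
    char d;
};

struct node3_reordered {   /* typically 4 + 4 + 1 + 1 + 2 = 12 bytes */
    int  a;
    int  c;
    char b;
    char d;
};

int main(void)
{
    printf("original:  %zu\n", sizeof(struct node3));
    printf("reordered: %zu\n", sizeof(struct node3_reordered));
    return 0;
}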
I would not be surprised if the interviewer's opinion was based on the old argument of backward compatibility when extending the struct in the future. Additional fields (char, smallint) may benefit from the space occupied by the trailing padding, without the risk of affecting the memory offset of the existing fields.
In most cases, it's a moot point. The approach itself is likely to break compatibility, for two reasons:
Starting the extensions on a new alignment boundary (as would happen to node2) may not be memory-optimal, but it might well prevent the new fields from accidentally being overwritten by the padding of a 'legacy' struct.
When compatibility is that much of an issue (e.g. when persisting or transferring data), then it makes more sense to serialize/deserialize (even if binary is a requirement) than to depend on a binary format that varies per architecture, per compiler, even per compiler option.
OK, I might be completely off the mark here since this is a bit out of my league. If so, please correct me. But this is how I see it:
First of all, why do we need padding and alignment at all? It's just wasted bytes, isn't it? Well, it turns out that processors like it. That is, if you issue an instruction to the CPU that operates on a 32-bit integer, the CPU will demand that this integer reside at a memory address which is divisible by 4. A 64-bit integer will need to reside at an address divisible by 8. And so on. This is done to make the CPU design simpler and faster.
If you violate this requirement (aka "unaligned memory access"), most CPUs will raise an exception. x86 is actually an oddity because it will still perform the operation - but it will take more than twice as long because it will fetch the value from memory in two passes rather than one and then do bitwise magic to stick the value together from these separate accesses.
So this is the reason why compilers add padding to structs - so that all the members would be properly aligned and the CPU could access them quickly (or at all). Well, that's assuming the struct itself is located at a proper memory address. But it will also take care of that as long as you stick to standard operations for allocating the memory.
But it is possible to explicitly tell the compiler that you want a different alignment too. For example, if you want to use your struct to read in a bunch of data from a tightly packed file, you could explicitly set the padding to 1. In that case the compiler will also have to emit extra instructions to compensate for potential misalignment.
TL;DR - wrong alignment makes everything slower (or under certain conditions can crash your program entirely).
However this doesn't answer the question "where to better put the padding?" Padding is needed, yes, but where? Well, it doesn't make much difference directly, however by rearranging your members carefully you can reduce the size of the entire struct. And less memory used usually means a faster program. Especially if you create large arrays of these structs, using less memory will mean less memory accesses and more efficient use of CPU cache.
In your example however I don't think there's any difference.
P.S. Why does your struct end with a padding? Because arrays. The compiler wants to make sure that if you allocate an array of these structs, they will all be properly aligned. Because array members don't have any padding between them.
What is the performance difference between the two?
The performance difference is "indeterminable". For most cases it won't make any difference.
For cases where it does make a difference; either version might be faster, depending on how the structure is used. For one example, if you have a large array of these structures and frequently select a structure in the array "randomly"; then if you only access a and b of the randomly selected structure the first version can be faster (because a and b are more likely to be in the same cache line), and if you only access a and c then the second version can be faster.

Is it guaranteed that the padding bits of "zeroed" structure will be zeroed in C?

This statement in the article confused me:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment for the target. If you zero a structure and then set some of the fields, will the padding bits all be zero? According to the results of the survey, 36 percent were sure that they would be, and 29 percent didn't know. Depending on the compiler (and optimization level), it may or may not be.
It was not completely clear, so I turned to the standard. The ISO/IEC 9899 in §6.2.6.1 states:
When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values.
Also in §6.7.2.1:
The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.
I just remembered that I recently implemented, let's say, some kind of hack, where I used the undeclared part of the byte owned by a bit-field. It was something like:
/* This struct is always allocated on the heap and is zeroed. */
struct some_struct {
    /* initial part ... */
    enum {
        ONE,
        TWO,
        THREE,
        FOUR,
    } some_enum:8;
    unsigned char flag:1;
    unsigned char another_flag:1;
    unsigned int size_of_smth;
    /* ... remaining part */
};
The structure was not under my control, so I couldn't change it, but I had an acute need to pass some information through it. So I calculated the address of the corresponding byte like this:
unsigned char *ptr = (unsigned char *)&some->size_of_smth - 1;
*ptr |= 0xC0; /* set flags */
Then later I checked flags the same way.
Also, I should mention that the target compiler and platform were fixed, so it's not a cross-platform thing. However, the questions still stand:
Can I rely on the padding bits of a struct (on the heap) still being zeroed after memset/kzalloc/whatever and after some subsequent use? (That post does not cover the topic in terms of the standard and the guarantees for further use of the struct.) And what about a struct zeroed on the stack with = {0}?
If yes, does it mean that I can safely use the "unnamed"/"not declared" part of a bit-field to pass some info for my purposes everywhere (different platforms, compilers, ...) in C? (Assuming I know for sure that no one else is trying to store anything in this byte.)
The short answer to your first question is "no".
While an appropriate call of memset(), such as memset(&some_struct_instance, 0, sizeof(some_struct)) will set all bytes in the structure to zero, that change is not required to be persistent after "some use" of some_struct_instance, such as setting any of the members within it.
So, for example, there is no guarantee that some_struct_instance.some_enum = THREE (i.e. storing a value into a member) will leave any padding bits in some_struct_instance unchanged. The only requirement in the standard is that values of other members of the structure are unaffected. However, the compiler may (in emitted object code or machine instructions) implement the assignment using some set of bitwise operations, and be allowed to take shortcuts in a way that doesn't leave the padding bits alone (e.g. by not emitting instructions that would otherwise ensure the padding bits are unaffected).
Even worse, a simple assignment like some_struct_instance = some_other_struct_instance (which, by definition, is the storing of a value into some_struct_instance) comes with no guarantees about the values of padding bits. It is not guaranteed that the padding bits in some_struct_instance will be set to the same bitwise values as padding bits in some_other_struct_instance, nor is there a guarantee that the padding bits in some_struct_instance will be unchanged. This is because the compiler is allowed to implement the assignment in whatever means it deems most "efficient" (e.g. copying memory verbatim, some set of member-wise assignments, or whatever) but - since the value of padding bits after the assignment are unspecified - is not required to ensure the padding bits are unchanged.
If you get lucky, and fiddling with the padding bits works for your purpose, it will not be because of any support in the C standard. It will be because of good graces of the compiler vendor (e.g. choosing to emit a set of machine instructions that ensure padding bits are not changed). And, practically, there is no guarantee that the compiler vendor will keep doing things the same way - for example, your code that relies on such a thing may break when the compiler is updated, when you choose different optimisation settings, or whatever.
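A sketch of why relying on this is fragile: memset the structure, store into a member, then dump the raw bytes. On common compilers the padding bytes will usually still read back as zero, but nothing in the standard requires the output to stay that way across compilers, versions, or optimisation levels.

#include <stdio.h>
#include <string.h>

struct padded {
    char c;       /* typically followed by 3 padding bytes */
    int  i;
};

int main(void)
{
    struct padded p;
    memset(&p, 0, sizeof p);   /* all bytes, padding included, are 0 now */
    p.c = 1;                   /* storing into a member: the padding bytes
                                  become unspecified again (C11 6.2.6.1) */
    p.i = 2;

    const unsigned char *bytes = (const unsigned char *)&p;
    for (size_t k = 0; k < sizeof p; k++)
        printf("%02x ", bytes[k]);
    putchar('\n');
    return 0;
}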
Since the answer to your first question is "no", there is no need to answer your second question. However, philosophically, if you are trying to store data in padding bits of a structure, it is reasonable to assert that someone else - crazy or not - may potentially attempt to do the same thing, but using an approach that messes up the data you are attempting to pass around.
From the first words of the quoted passage:
C permits an implementation to insert padding into structures (but not into arrays) to ensure that all fields have a useful alignment ...
These words mean that, in the aim of optimizing (for speed, probably, but also to respect architecture restrictions on data/address buses), the compiler can make use of hidden, unused bits or bytes. Unused because they would be forbidden or costly to address.
This also implies that those bytes or bits should not be visible from a programming perspective, and it should be considered a programming error to try to access that hidden data.
About that added data, the standard says that its content is "unspecified", and there is really no better way to state what an implementation can do with it. Think of those bit-field declarations, where you can declare integers with any bit width: no normal hardware will permit reading or writing memory in chunks smaller than 8 bits, so the CPU will always read or write at least 8 bits (sometimes even more). Why should a compiler (an implementation) take care to do something useful with those other bits, which the programmer has said he does not care about? It makes no sense: the programmer didn't give a name to some memory location, but then he wants to manipulate it?
The padding bytes between fields are pretty much the same matter: those added bytes are necessary, but the programmer is not interested in them, and he SHOULD NOT change his mind later!
Of course, one can study an implementation and arrive at some conclusion like "padding bytes will always be zeroed" or something like that. This is risky (are you sure they will always, always be zeroed?) but, more importantly, it is totally useless: if you need more data in a structure, simply declare it! And you will have no problems, ever, even when porting the source to different platforms or implementations.
It is reasonable to start with the expectation that what is listed in the standard is correctly implemented. You're looking for further assurances for a particular architecture. Personally, if I could find documented details about that particular architecture, I would be reassured; if not, I would be cautious.
What constitutes "cautious" would depend on how confident I needed to be. For example, building a detailed test set and running it periodically on my target architecture would give me a reasonable degree of confidence, but it's all about how much risk you want to take. If it's really, really important, stick to what the standards guarantee you; if it's less so, test and see whether you can get enough confidence for what you need.

Strange pointer behaviour

I have a buffer containing data which was previously read from a socket.
From the information stored in the buffer, I can tell the entire data length (91 bytes), and from the specification I know the positions of the other information I need to retrieve (a 32-bit integer and a 16-bit integer; let's call them uid and suid).
unsigned char buffer[1024];
uint32_t uid;
uint16_t suid;
uid = ntohl ( *((uint32_t*) (buffer + sizeof (struct pktheader))) );
suid = ntohs ( *((uint16_t*) (buffer + sizeof (struct pktheader) + sizeof (uint32_t))) );
This code was cross-compiled for ARM, and for some unexpected reason the content of uid was filled with incorrect bytes which were part of the buffer but resided before (!) the content I meant to retrieve.
As if the offset was calculated incorrectly.
Strangely enough, the content of suid was just fine.
Can you explain how this behaviour is even possible? I know it may be difficult from the information I provided... We can rule out an incorrect value of sizeof (struct pktheader); I have double-checked. The content in the buffer is as defined in the specification. I even found a working solution using memcpy with the same offset calculation to get each part out, so we can pretty much rule out mangled data.
I discussed it with a colleague, and his professional guess was that some auto-alignment behaviour happened, pointing out that the offset was just 2 bytes off. However, I would like to know more.
Until now I was quite fond of this construction for accessing individual parts stored in buffers.
This is almost certainly an alignment issue, as @Dark Falcon suggests. Most likely, the CPU ignored the bottom two bits of the address and performed the load aligned.
Whilst not supporting unaligned loads might seem like an odd design choice, it's mostly to do with gate count and power consumption. There are two nasty scenarios that the CPU would need to cope with otherwise:
The load/store straddles a cache-line boundary
The load/store straddles a page boundary
Both of these require a substantial number of gates to fix up - particularly the latter - as the memory access in fact straddles both the cache line and a page boundary. Gates, die real estate and power consumption are at a premium in most ARM parts.
This behaviour is also entirely compatible with the C and C++ standards, so the code above is not universally portable.
Overlaying a struct or union over the buffer with unaligned accesses is not safe either: the compiler will lay it out so as to avoid unaligned accesses and will also not initialise any gaps left (so never use memcmp to compare them). It is also precisely what you don't want for wire-format packets.
Using structure packing is safer - the compiler will know what memory accesses are permitted and which are not and will generate smaller reads and writes so as not to perform unaligned accesses. However, until recently, the mechanism for enabling packing has been compiler specific - so not portable either.
The only truly portable implementation choice is byte-wise accesses (and shifts to coalesce the data).
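A sketch of that byte-wise approach for the uid/suid case above, assuming the wire format is big-endian (network byte order):

#include <stdint.h>
#include <stdio.h>

/* Assemble big-endian (network byte order) values byte by byte; no
   alignment requirement is placed on p, so this is portable. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

static uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)(((unsigned)p[0] << 8) | p[1]);
}

int main(void)
{
    /* Stand-in for the part of the question's buffer that follows the
       pktheader: uid = 0x01020304, suid = 0x0506 on the wire. */
    const unsigned char payload[6] = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06};

    uint32_t uid  = read_be32(payload);
    uint16_t suid = read_be16(payload + 4);

    printf("uid = 0x%08x, suid = 0x%04x\n", uid, suid);
    return 0;
}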

Structure padding

I was trying to understand why structure padding is the reason structures cannot be compared by memcmp.
One small thing I don't understand about structure padding is this:
Why should "a short be 2-byte aligned" or "a long be 4-byte aligned"? I understand it has to do with their sizes, but why can they not appear at any byte boundary?
Or, in other words, why is 0x10004566 not a valid location for a long variable while 0x10004568 is?
Because some platforms (i.e. CPUs) physically don't support "mis-aligned" memory accesses. Other platforms support them, but in a much slower fashion.
The padding you get in a struct is dependent on the choices your compiler makes, but it will be making those choices in order to satisfy the specific requirements of the CPU the code is targeted at.
Memory alignment is a very important issue when optimizing a program for speed. C, being a language that - generally - puts strong emphasis on speed, likes to enforce some rules which may make the program faster.
The limitation on aligned and unaligned memory accesses comes directly from the hardware used for fetching the data from memory, which usually fetches it in chunks equal to the machine word in size. Say you want to access a doubleword (4 bytes) stored at location 101. This means that the memory controller would first have to issue a read of a doubleword at location 100, then another read of a doubleword at location 104, and then splice the individual bytes from locations 101, 102, 103, and 104 together. The whole operation takes (hypothetically) two clock cycles.
If you want to access a doubleword at location 100, there's no such issue, which should be illustrated clearly enough by the example I provided.
In fact, misaligned data access is such a big issue that SSE instructions (the "aligned" versions, there are also "misaligned" versions which don't do that) will cause a general protection fault if you try to access misaligned data with those.
As a rule of thumb, it never hurts to align 4-byte data on a 4-byte boundary, 8-byte data on a 8-byte boundary, and so forth.
The only additional example I can think of with respect to alignment is the transfer of data: transfers (depending on the architecture) go in blocks of, say, 32 bytes, and if your data crosses a block boundary it could require 2 transfers to receive the data rather than 1.
