What other compilers do I need to worry about struct packing? - c

In GCC, I need to use __attribute__((packed)) to make structs take the least amount of space, for example, if I have a large array of structs I should pack them. What other common compilers do struct padding, and how do I pack structs in these other compilers?

The premise of your question is mistaken. Packed structs are not something you're "supposed to use" to save space. They're a dubious nonstandard feature with lots of problems that should be avoided whenever possible. (Ultimately it's always possible to avoid them, but some people balk at the tradeoffs involved.) For example, whenever you use packed structures, any use of pointers to members is potentially unsafe because the pointer value is not necessarily a valid (properly aligned) pointer to the type it points to. The only time there is a "need" for packed structures is when you're using them to access memory-mapped hardware registers that are misaligned, or to access in-file/on-disk data structures that are misaligned (but the latter won't be portable anyway since representation/endianness may not match, and both problems are solved together much better with a proper serialization/deserialization function).
If your goal is to save space, then as long as you control the definition of the structure, simply order it so as not to leave unnecessary padding space. This can be achieved simply by ordering the members in order of decreasing size; if you do that, then on any reasonable implementation, the wasted space will be at most the difference between the size of the largest member and the size of the smallest.
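To illustrate the reordering advice, here is a minimal sketch. The struct names and members are made up for this example, and the exact sizes depend on the platform's ABI, so the function prints them rather than assuming specific values.

```c
#include <stddef.h>
#include <stdio.h>

/* Members in an arbitrary order: padding is typically needed before b and d. */
struct unordered {
    char   a;
    double b;
    char   c;
    int    d;
};

/* The same members, ordered by decreasing size. */
struct ordered {
    double b;
    int    d;
    char   a;
    char   c;
};

/* Prints both sizes so the layouts can be compared on the current platform. */
void print_layout_sizes(void)
{
    printf("unordered: %zu bytes\n", sizeof(struct unordered));
    printf("ordered:   %zu bytes\n", sizeof(struct ordered));
}
```

On typical implementations the ordered version should come out no larger than the unordered one, without any nonstandard attributes.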

Related

Can C add padding between struct members, even if they are ordered in decreasing alignment?

struct Foo {
    int a;
    char b;
};
Will it be guaranteed in this case that b will have an offset of sizeof(int) in the struct? Will it be guaranteed that members will be packed together as long as all alignment requirements are being met, no padding required (Not taking into account the padding at the end to align the structures size to the largest member)?
I am asking this because I would like to know if simply fwrite()ing a struct to a save file can cause problems if the layout of a struct is not consistent across platforms, because then each save file would be specific to the platform on which it was created.
There are no guarantees. If a compiler wishes to insert unnecessary padding between structure members, it can do so. (For example, it might have determined that a particular member could be handled much more efficiently were it eight-byte aligned, even though it doesn't require alignment at all.)
Portable code intended for interoperability should not fwrite structs. Aside from the potential of differences in padding, not all platforms use the same endianness, nor the same floating point representation (if that's relevant).
Strictly speaking, the C standard makes no guarantees regarding padding inside of a struct, other than no padding at the beginning.
That being said, most implementations you're likely to come across tend to perform padding in a consistent manner (see The Lost Art of Structure Packing). In your particular example however there's a potential for inconsistency as an int is not necessarily the same size on all platforms.
If you used fixed width types such as int32_t and int8_t instead of int and char then you're probably OK. Adding a static_assert for the size of the struct can help enforce this.
You do however need to worry about endianness, converting each field to a known byte order before saving and back to the host byte order after reading back.
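A minimal sketch of both suggestions follows. The record type, its field names, and the choice of little-endian as the on-disk order are assumptions made for illustration. The static_assert lines only document the layout this particular example expects (4-byte alignment for int32_t); the encode and decode functions do not rely on the in-memory layout at all.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical save-file record using fixed-width types. */
struct save_record {
    int32_t score;
    int8_t  level;
};

/* Compilation fails if the compiler lays the struct out differently than assumed. */
static_assert(offsetof(struct save_record, level) == 4, "unexpected padding before level");
static_assert(sizeof(struct save_record) == 8, "unexpected struct size");

/* On-disk size: 4 bytes of score plus 1 byte of level, no padding. */
#define SAVE_RECORD_BYTES 5

/* Writes the record field by field, score in little-endian byte order. */
void save_record_encode(const struct save_record *r, unsigned char out[SAVE_RECORD_BYTES])
{
    uint32_t s;
    uint8_t  l;
    memcpy(&s, &r->score, sizeof s);   /* reinterpret the two's complement bits */
    memcpy(&l, &r->level, sizeof l);
    out[0] = (unsigned char)(s & 0xFFu);
    out[1] = (unsigned char)((s >> 8) & 0xFFu);
    out[2] = (unsigned char)((s >> 16) & 0xFFu);
    out[3] = (unsigned char)((s >> 24) & 0xFFu);
    out[4] = l;
}

/* Reads the record back, converting from little-endian to host order. */
void save_record_decode(const unsigned char in[SAVE_RECORD_BYTES], struct save_record *r)
{
    uint32_t s = (uint32_t)in[0]
               | ((uint32_t)in[1] << 8)
               | ((uint32_t)in[2] << 16)
               | ((uint32_t)in[3] << 24);
    uint8_t l = in[4];
    memcpy(&r->score, &s, sizeof s);
    memcpy(&r->level, &l, sizeof l);
}
```

The buffer produced by save_record_encode can be passed to fwrite() and should read back the same way regardless of the host's padding or endianness.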

Is it better for performance to have alignment padding at the end of a small struct instead of between 2 members?

We know that there is padding in some structures in C. Please consider the following 2:
struct node1 {
int a;
int b;
char c;
};
struct node2 {
int a;
char c;
int b;
};
Assuming sizeof(int) = alignof(int) = 4 bytes:
sizeof(node1) = sizeof(node2) = 12, due to padding.
What is the performance difference between the two? (if any, w.r.t. the compiler or the architecture of the system, especially with GCC)
These are bad examples - in this case it doesn't matter, since the amount of padding will be the same in either case. There will not be any performance differences.
The compiler always has to add trailing padding at the end of a struct where it is needed; otherwise arrays of structs wouldn't be feasible, since the first member should always be aligned. If not for the trailing padding in some item struct_array[0], the first member in struct_array[1] would end up misaligned.
The order would matter if we were to do this though:
struct node3 {
int a;
char b;
int c;
char d;
};
Assuming 4 byte int and 4 byte alignment, then b occupies 1+3 bytes here, and d an additional 1+3 bytes. This could have been written better if the two char members were placed adjacently, in which case the total amount of padding would just have been 2 bytes.
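A small sketch to check this on a given platform. The name node3_reordered is made up for this example; padding is computed as the struct size minus the sum of the member sizes.

```c
#include <stddef.h>
#include <stdio.h>

struct node3 {
    int a;
    char b;
    int c;
    char d;
};

/* Same members, with the two char members placed adjacently. */
struct node3_reordered {
    int a;
    int c;
    char b;
    char d;
};

/* Padding = total size minus the sum of the member sizes. */
size_t node3_padding(void)
{
    return sizeof(struct node3) - (2 * sizeof(int) + 2 * sizeof(char));
}

size_t node3_reordered_padding(void)
{
    return sizeof(struct node3_reordered) - (2 * sizeof(int) + 2 * sizeof(char));
}

void print_node3_padding(void)
{
    printf("node3:           %zu bytes of padding\n", node3_padding());
    printf("node3_reordered: %zu bytes of padding\n", node3_reordered_padding());
}
```

With a 4-byte int and 4-byte alignment, this should report 6 bytes of padding for node3 and 2 bytes for the reordered version.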
I would not be surprised if the interviewer's opinion was based on the old argument of backward compatibility when extending the struct in the future. Additional fields (char, smallint) may benefit from the space occupied by the trailing padding, without the risk of affecting the memory offset of the existing fields.
In most cases, it's a moot point. The approach itself is likely to break compatibility, for two reasons:
1. Starting the extensions on a new alignment boundary (as would happen to node2) may not be memory-optimal, but it might well prevent the new fields from accidentally being overwritten by the padding of a 'legacy' struct.
2. When compatibility is that much of an issue (e.g. when persisting or transferring data), then it makes more sense to serialize/deserialize (even if binary is a requirement) than to depend on a binary format that varies per architecture, per compiler, even per compiler option.
OK, I might be completely off the mark here since this is a bit out of my league. If so, please correct me. But this is how I see it:
First of all, why do we need padding and alignment at all? It's just wasted bytes, isn't it? Well, it turns out that processors like it. That is, if you issue an instruction to the CPU that operates on a 32-bit integer, the CPU will demand that this integer resides at a memory address which is divisible by 4. For a 64-bit integer it will need to reside at an address divisible by 8. And so on. This is done to make the CPU design simpler and more performant.
If you violate this requirement (aka "unaligned memory access"), many CPUs will raise an exception. x86 is more forgiving because it will still perform the operation, but it can be slower, since a value that straddles a boundary is fetched from memory in two passes rather than one and then stuck together from these separate accesses.
So this is the reason why compilers add padding to structs - so that all the members would be properly aligned and the CPU could access them quickly (or at all). Well, that's assuming the struct itself is located at a proper memory address. But it will also take care of that as long as you stick to standard operations for allocating the memory.
But it is possible to explicitly tell the compiler that you want a different alignment too. For example, if you want to use your struct to read in a bunch of data from a tightly packed file, you could explicitly set the packing alignment to 1. In that case the compiler will also have to emit extra instructions to compensate for potential misalignment.
TL;DR - wrong alignment makes everything slower (or under certain conditions can crash your program entirely).
However this doesn't answer the question "where to better put the padding?" Padding is needed, yes, but where? Well, it doesn't make much difference directly, however by rearranging your members carefully you can reduce the size of the entire struct. And less memory used usually means a faster program. Especially if you create large arrays of these structs, using less memory will mean less memory accesses and more efficient use of CPU cache.
In your example however I don't think there's any difference.
P.S. Why does your struct end with a padding? Because arrays. The compiler wants to make sure that if you allocate an array of these structs, they will all be properly aligned. Because array members don't have any padding between them.
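The array argument can be checked with a short sketch. The helper name node1_stride is made up for this example, and alignof comes from C11's <stdalign.h>.

```c
#include <stdalign.h>
#include <stddef.h>

struct node1 {
    int a;
    int b;
    char c;
};

/* Distance in bytes between two consecutive array elements. */
size_t node1_stride(void)
{
    struct node1 arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}
```

The stride equals sizeof(struct node1), and that size is a multiple of alignof(struct node1), which is why the trailing padding has to be part of the struct itself.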
What is the performance difference between the two?
The performance difference is "indeterminable". For most cases it won't make any difference.
For cases where it does make a difference; either version might be faster, depending on how the structure is used. For one example, if you have a large array of these structures and frequently select a structure in the array "randomly"; then if you only access a and b of the randomly selected structure the first version can be faster (because a and b are more likely to be in the same cache line), and if you only access a and c then the second version can be faster.

C memcpy byte buffer to packed struct, good decision?

I am aware that type casting a buffer to a struct violates the strict aliasing rule and that it's not portable.
However, does memcpy()ing a buffer into a struct with the packed attribute avoid violating that rule, and is it a good decision compared to parsing the contents of the buffer? Let's keep in mind that both always have a fixed size.
If you have ensured that the packed structure lays out the bytes as desired on all the target platforms you wish to support, then copying the bytes into the structure via memcpy and then accessing them through the structure members is fine.
Depending on circumstances, it may be advisable to copy the structure members to a normal (not packed) structure for further use, so that unaligned members in the packed structure are not repeatedly accessed, which may be inefficient. Ultimately, this may be equivalent to issuing multiple memcpy calls to copy bytes from the buffer into individual members of the normal structure.
Using memcpy is certainly at least as efficient as parsing the buffer, as memcpy is about the simplest thing one can do with data. But whether it is more efficient or just the same depends on what sort of parsing you would be doing. Once you have the data in a structure, you will still have to operate on it in whatever ways your application requires anyway, so the memcpy would not seem to eliminate any real work that must be done.
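As a rough sketch of the "multiple memcpy calls" variant: the wire layout below (1-byte type, 4-byte length, 2-byte checksum, tightly packed, in host byte order) and all names are assumptions made up for illustration. The destination is a normal, unpacked struct, so no packed attribute is needed.

```c
#include <stdint.h>
#include <string.h>

/* Size of the hypothetical wire format: 1 + 4 + 2 bytes, no padding. */
#define WIRE_SIZE 7

/* Normal struct; the compiler is free to pad it as it sees fit. */
struct header {
    uint8_t  type;
    uint32_t length;
    uint16_t checksum;
};

/* Copies each field from its fixed offset in the buffer into the struct. */
void header_from_bytes(struct header *h, const unsigned char buf[WIRE_SIZE])
{
    memcpy(&h->type,     buf + 0, sizeof h->type);
    memcpy(&h->length,   buf + 1, sizeof h->length);
    memcpy(&h->checksum, buf + 5, sizeof h->checksum);
}
```

Because every member of struct header is properly aligned, later accesses to its fields don't pay any unaligned-access cost. If the buffer comes from a different machine, byte order would still have to be handled separately.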

Explanation of packed attribute in C

I was wondering if anyone could offer a more full explanation to the meaning of the packed attribute used in the bitmap example in pset4.
"Our use, incidentally, of the attribute called packed ensures that clang does not try to "word-align" members (whereby the address of each member’s first byte is a multiple of 4), lest we end up with "gaps" in our structs that don’t actually exist on disk."
I do not understand the comment about gaps in our structs. Does this refer to gaps in the memory location between each struct (i.e. one byte between each 3-byte RGB if it was to word-align)? Why does this matter for optimization?
#include <stdint.h>

typedef uint8_t BYTE;

typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} __attribute__((__packed__))
RGBTRIPLE;
Beware: prejudices on display!
As noted in comments, when the compiler adds the padding to a structure, it does so to improve performance. It uses the alignments for the structure elements that will give the best performance.
Not so very long ago, the DEC Alpha chips would handle an 'unaligned memory request' (umr) by doing a page fault, jumping into the kernel, fiddling with the bytes to get the required result, and returning the correct result. This was painfully slow by comparison with a correctly aligned memory request; you avoided such behaviour at all costs.
Other RISC chips (used to) give you a SIGBUS error if you do misaligned memory accesses. Even Intel chips have to do some fancy footwork to deal with misaligned memory accesses.
The purpose of removing padding is to (decrease performance but) benefit by being able to serialize and unserialize the data without doing the job 'properly' — it is a form of laziness that actually doesn't work properly when the machines communicating are not of the same type, so proper serialization should have been done in the first place.
What I mean is that if you are writing data over the network, it seems simpler to be able to send the data by writing the contents of a structure as a block of memory (error checking etc omitted):
write(fd, &structure, sizeof(structure));
The receiving end can read the data:
read(fd, &structure, sizeof(structure));
However, if the machines are of different types (for example, one has an Intel CPU and the other a SPARC or Power CPU), the interpretation of the data in those structures will vary between the two machines (unless every element of the structure is either a char or an array of char). To relay the information reliably, you have to agree on a byte order (e.g. network byte order; this is very much a factor in TCP/IP networking, for example), and the data should be transmitted in the agreed upon order so that both ends can understand what the other is saying.
You can define other mechanisms: you could use a 'sender makes right' mechanism, in which the 'receiver' lets the sender know how it wants the data presented and the sender is responsible for fixing up the transmitted data. You can also use a 'receiver makes right' mechanism which works the other way around. Both of these have been used commercially; see DRDA for one such protocol.
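For the agreed byte order approach, a minimal sketch using the POSIX htonl() and ntohl() functions from <arpa/inet.h> might look like this. The helper names are made up for this example, and a POSIX system is assumed.

```c
#include <arpa/inet.h>  /* htonl, ntohl (POSIX) */
#include <stdint.h>
#include <string.h>

/* Stores a 32-bit value into a byte buffer in network (big-endian) order. */
void put_u32_network(unsigned char out[4], uint32_t host_value)
{
    uint32_t wire = htonl(host_value);
    memcpy(out, &wire, sizeof wire);
}

/* Reads a 32-bit value in network order and converts it to host order. */
uint32_t get_u32_network(const unsigned char in[4])
{
    uint32_t wire;
    memcpy(&wire, in, sizeof wire);
    return ntohl(wire);
}
```

Each field is converted and copied individually, so neither the padding nor the byte order of the sender's struct ever reaches the wire.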
Given that the type of BYTE is uint8_t, there won't be any padding in the structure in any sane (commercially viable) compiler. IMO, the precaution is a fantasy or phobia without a basis in reality. I'd certainly need a carefully documented counter-example to believe that there's an actual problem that the attribute helps with.
I was led to believe that you could encounter issues when you pass the entire struct to a function like fread as it assumes you're giving it an array like chunk of memory, with no gaps in it. If your struct has gaps, the first byte ends up in the right place, but the next two bytes get written in the gap, which you don't have a proper way to access.
Sorta...but mostly no. The issue is that the values in the padding bytes are indeterminate. However, in the structure shown, there will be no padding in any compiler I've come across; the structure will be 3 bytes long. There is no reason to put any padding anywhere inside the structure (between elements) or after the last element (and the standard prohibits padding before the first element). So, in this context, there is no issue.
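This is easy to check on whatever compiler is at hand. The sketch below declares the same struct without the packed attribute (the name RGBTRIPLE_PLAIN is made up for this example) and prints its layout; the values are whatever the current compiler chooses, not something the standard guarantees.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint8_t BYTE;

/* Same members as RGBTRIPLE, but without the packed attribute. */
typedef struct
{
    BYTE rgbtBlue;
    BYTE rgbtGreen;
    BYTE rgbtRed;
} RGBTRIPLE_PLAIN;

/* Prints the size and member offsets chosen by the current compiler. */
void print_rgbtriple_layout(void)
{
    printf("sizeof: %zu\n", sizeof(RGBTRIPLE_PLAIN));
    printf("offsets: %zu %zu %zu\n",
           offsetof(RGBTRIPLE_PLAIN, rgbtBlue),
           offsetof(RGBTRIPLE_PLAIN, rgbtGreen),
           offsetof(RGBTRIPLE_PLAIN, rgbtRed));
}
```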
If you write binary data to a file and it has holes in it, then you get arbitrary byte values written where the holes are. If you read back on the same (type of) machine, there won't actually be a problem. If you read back on a different (type of) machine, there may be problems — hence my comments about serialization and deserialization. I've only been programming in C a little over 30 years; I've never needed packed, and don't expect to. (And yes, I've dealt with serialization and deserialization using a standard layout — the system I mainly worked on used big-endian data transfer, which corresponds to network byte order.)
Sometimes, the elements of a struct are simply aligned to a 4-byte boundary (or whatever the size of a register is in the CPU) to optimize read/write access to RAM. Often, smaller elements are packed together, but alignment is dictated by a larger type in the struct.
In your case, you probably don't need to pack the struct, but it doesn't hurt.
With some compilers, each byte in your struct could end up taking 4 bytes of RAM (so, 12 bytes for the entire struct). Packing the struct removes the alignment requirement for each of the BYTEs, and ensures that the entire struct is placed into one 4-byte DWORD (unless the alignment for the entire program is set to one byte, or the struct is in an array of said structs, in which case it would literally be stored in 3 contiguous bytes of RAM).
See comments below for further discussion...
The objective is exactly what you said, not having gaps between each struct. Why is this important? Mostly because of cache. Memory access is slow!!! Cache is really fast. If you can fit more in cache you avoid cache misses (memory accesses).
Edit: It seems I was wrong. Packing isn't really useful here if the objective is to avoid structure padding, since the struct consists of 3 BYTE members.

How to avoid structure padding?

I'm curious to know how #pragma will help to avoid structure padding (please give me one programme to understand it).
By default compilers will allocate memory in an aligned manner. So what benefit does the programmer get by avoiding structure padding?
When is it necessary to avoid structure padding?
The techniques depend on the compiler.
The benefits are usually not much, other than potentially reducing the amount of memory consumed by your program. That benefit is only worthwhile on machines with few resources (e.g. memory), which means it is rarely needed with modern hardware.
In practice, the cost is reduced performance or hardware exceptions. The common purposes of padding are performance and avoiding hardware exceptions, by aligning struct members in a way that suits the host system. Disallowing padding basically turns off all the benefits of padding.
Saving a few bytes, or even a few kilobytes, is rarely worth the impact in terms of performance or more error conditions. If you are doing certain types of development (e.g. on an embedded system with limited resources) it might be worthwhile, but even then not always.
1. Structure padding is avoided mostly in resource-critical embedded systems. In this case RAM is saved by packing the structure members at the expense of code memory (more instructions are needed to access a packed structure member).
2. In some cases an array of bytes is mapped to structure members for ease of access.
3. Structure padding can be avoided by using the preprocessor directive #pragma pack(1) on compilers that support it; the compiler will then align members on 1-byte boundaries.
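As a minimal sketch of the #pragma approach: it assumes a compiler that accepts #pragma pack (GCC, Clang and MSVC all do), and the struct and function names are made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Default layout: the compiler may pad around the 4-byte member. */
struct padded_rec {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
};

/* Members are placed on 1-byte boundaries between push and pop. */
#pragma pack(push, 1)
struct packed_rec {
    uint8_t  tag;
    uint32_t value;
    uint8_t  flag;
};
#pragma pack(pop)

/* Prints both sizes so the effect of the pragma can be observed. */
void print_pack_sizes(void)
{
    printf("padded: %zu bytes\n", sizeof(struct padded_rec));
    printf("packed: %zu bytes\n", sizeof(struct packed_rec));
}
```

The push/pop pair keeps the setting from leaking into declarations that follow. Note that value inside packed_rec may be misaligned, so taking its address and using it as an ordinary uint32_t pointer is not safe on every platform.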
