memory alignment within gcc structs

memory alignment within gcc structs - c

I am porting an application to an ARM platform in C, the application also runs on an x86 processor, and must be backward compatible.
I am now having some issues with variable alignment. I have read the gcc manual for
__attribute__((aligned(4),packed)) I interpret what is being said as the start of the struct is aligned to the 4 byte boundry and the inside remains untouched because of the packed statement.
originally I had this but occasionally it gets placed unaligned with the 4 byte boundary.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
so I change it to this.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((aligned(4),packed)) CHALLENGE;
The understand I stated earlier seems to be incorrect as both the struct is now aligned to a 4 byte boundary, and and the inside data is now aligned to a four byte boundary, but because of the endianess, the size of the struct has increased in size from 42 to 44 bytes. This size is critical as we have other applications that depend on the struct being 42 bytes.
Could some describe to me how to perform the operation that I require. Any help is much appreciated.

If you're depending on sizeof(yourstruct) being 42 bytes, you're about to be bitten by a world of non-portable assumptions. You haven't said what this is for, but it seems likely that the endianness of the struct contents matters as well, so you may also have a mismatch with the x86 there too.
In this situation I think the only sure-fire way to cope is to use unsigned char[42] in the parts where it matters. Start by writing a precise specification of exactly what fields are where in this 42-byte block, and what endian, then use that definition to write some code to translate between that and a struct you can interact with. The code will likely be either all-at-once serialisation code (aka marshalling), or a bunch of getters and setters.

This is one reason why reading whole structs instead of memberwise fails, and should be avoided.
In this case, packing plus aligning at 4 means there will be two bytes of padding. This happens because the size must be compatible for storing the type in an array with all items still aligned at 4.
I imagine you have something like:
read(fd, &obj, sizeof obj)
Because you don't want to read those 2 padding bytes which belong to different data, you have to specify the size explicitly:
read(fd, &obj, 42)
Which you can keep maintainable:
typedef struct {
//...
enum { read_size = 42 };
} __attribute__((aligned(4),packed)) CHALLENGE;
// ...
read(fd, &obj, obj.read_size)
Or, if you can't use some features of C++ in your C:
typedef struct {
//...
} __attribute__((aligned(4),packed)) CHALLENGE;
enum { CHALLENGE_read_size = 42 };
// ...
read(fd, &obj, CHALLENGE_read_size)
At the next refactoring opportunity, I would strongly suggest you start reading each member individually, which can easily be encapsulated within a function.

I've been moving structures back and forth from Linux, Windows, Mac, C, Swift, Assembly, etc.
The problem is NOT that it can't be done, the problem is that you can't be lazy and must understand your tools.
I don't see why you can't use:
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
You can use it and it doesn't require any special or clever code. I write a LOT of code that communicates to ARM. Structures are what make things work. __attribute__ ((packed)) is my friend.
The odds of being in a "world of hurt" are nil if you understand what is going on with both.
Finally, I can't for the life make out how you get 42 or 44. Int is either 4 or 8 bytes (depending on the compiler). That puts the number at either 16+16+2=34 or 32+16+2=50 -- assuming it is truly packed.
As I say, knowing your tools is part of your problem.

What is your true goal?
If it's to deal with data that's in a file or on the wire in a particular format what you should do is write up some marshaling/serialization routines that move the data between the compiler struct that represents how you want to deal with the data inside the program and a char array that deals with how the data looks on the wire/file.
Then all that needs to be dealt with carefully and possibly have platform specific code is the marshaling routines. And you can write some nice-n-nasty unit tests to ensure that the marshaled data gets to and from the struct properly no matter what platform you might have to port to today and in the future.

I would guess that the problem is that 42 isn't divisible by 4, and so they get out of alignment if you put several of these structs back to back (e.g. allocate memory for several of them, determining the size with sizeof). Having the size as 44 forces the alignment in these cases as you requested. However, if the internal offset of each struct member remains the same, you can treat the 44 byte struct as though it was 42 bytes (as long as you take care to align any following data at the correct boundary).
One trick to try might be putting both of these structs inside a single union type and only use 42-byte version from within each such union.

As I am using linux, I have found that by echo 3 > /proc/cpu/alignment it will issue me with a warning, and fix the alignment issue. This is a work around but it is very helpful with locating where the structures are failing to be misaligned.

Related

How can size of a structure be a non-multiple of 4?

I'm new to structures and was learning how to find the size of structures. I'm aware of how padding comes in to play in order to properly align the memory. From what I've understood, the alignment is done so that the size in memory comes out to be a multiple of 4.
I tried the following piece of code on GCC.
struct books{
short int number;
char name[3];
}book;
printf("%lu",sizeof(book));
Initially I had thought that the short int would occupy 2 bytes, followed by the character array starting at the third memory location from the beginning. The character array then would need a padding of 3 bytes which would give a size of 8. Something like this, where each word represents a byte in memory.
short short char char
char padding padding padding
However on running it gives a size of 6, which confuses me.
Any help would be appreciated, thanks!

Generally, padding is inserted to allow for aligned access of the internal elements of the structure, not to allow the entire structure to be a size of multiple words. Alignment is a compiler implementation issue, not a requirement of the C standard.
So, the char elements which are 3 bytes in length, need no alignment because they are byte elements.
It is preferred, though not required, that the short element needs to be aligned on a short boundary -- which means an even address. By aligning it on a short boundary, the compiler can issue a single load short instruction rather than having to load a word, mask, and then shift.
In this case, the padding is probably, but not necessarily, happening at the end rather than in the middle. You will have to write code to dump the address of the elements to determine where padding is taking place.
EDIT: . As #Euguen Sh mentions, even if you discover the padding scheme that the compiler is using for the structure, the compiler could modify that in a different version of the compiler.
It is unwise to count on the padding scheme of the compiler. There are always methods to access the elements in such a way that you do not guess at alignments.
The sizeof() operator is used to allow you to see how much memory is used AND to know how much will be added to a ptr to the structure if that pointer is incremented by 1 (ptr++).
EDIT 2, Packing: Structures may be packed to prevent padding using the __packed__ attribute. When designing a structure, it is wise to use elements that naturally pack. This is especially important when sending data over a communications link. A carefully designed structure avoids the need for padding in the middle of the strucuture. A poorly designed structure which is then compiled with the __packed__ attribute may have internal elements that are not naturally aligned. One might do this to ensure that the structure will transmit across a wire as it was originally designed. This type of effort has diminished with the introduction of JSON for transmission of data over a wire.

#include <stdalign.h>
#include <assert.h>
The size of a struct is always divisible by the maximum alignment of the members (which must be a power of two).
If you have a struct with char and short the alignment is 2, because the alignment of short is two, if you have a struct, only out of chars it has an alignment of 1.
There are multiple ways to manipulate the alignment:
alignas(4) char[4]; // this can hold 32-bit ints
This is nonstandart, but available in most compilers (GCC, Clang, ...):
struct A {
char a;
short b;
};
struct __attribute__((packed)) B {
char a;
short b;
};
static_assert(sizeof(struct A) == 4);
static_assert(alignof(struct A) == 2);
static_assert(sizeof(struct B) == 3);
static_assert(alignof(struct B) == 1);

Usually compilers follow ABI of the target architecture.
It defines alignments of structures and primitive datatypes. And that affects to needed padding and sizes of structures. Because alignment is multiple of 4 in many architectures, size of structures are too.
Compilers may offer some attributes/options for changing alignments more or less directly.
For example gcc and clang offers: __attribute__ ((packed))

Should all structs that are expected to be read from binary be marked as packed?

I know that some structs, may, or may not, add padding between elements.
My current project is reading input from /dev/input files. The binary layout of these files is defined in <linux/input.h>:
struct input_event {
struct timeval time;
__u16 type;
__u16 code;
__s32 value;
};
I wonder though that this struct is not marked with a packed attribute. Meaning that the /dev/input files (which are packed bit by bit) are not guaranteed to match the same package as the struct. Thus the logic
struct input_event event;
read(fd, &event, sizeof(event));
Is not defined to work across all archs.
Do I have a fallacy in my logic? Or is it safe to assume that somethings are not going to be packed?

Packing by Layout
In the current case, you are safe. Your struct input_event is allready layed out packed.
struct timeval time; /* 8 bytes */
__u16 type; /* 2 bytes */
__u16 code; /* 2 bytes */
__s32 value; /* 4 bytes */
This means, the members form clean 32Bit blocks and so there is no padding required. This article explains how the size of struct members (especially chars) and their layout affect the padding and thus also the final size of a struct.
Packing by the Preprocessor
Packing structs via the preprocessor seems to be a good solution at a first sight. Looking a little closer, there show up several downsides and one of those hits you at the point you are caring about (see also #pragma pack effect)
Performance
Padding assures that your struct members are accessible without searching inside memoryblocks (4byte blocks on a 32bit machine and 8byte blocks on a 64bit machine respectively). In consequence, packing such structures leads to members spanning over multiple memoryblocks and therefore require the machine to search for them.
Different Platforms (Archs, Compilers)
Preprocessor instructions are heavily vendor and architecture specific. Thus, using them leads to lesser or in worst case non-portability of your code.
Conclusion
As the author of this article (already mentioned above) states, even NTP directly reads data from the network into structures.
So, carefully laying out your structures and maybe padding them by hand might be the safest and also most portable solution.

If you insist on directly loading structures into memory from binary images, then C is not your friend. Padding is allowed, and the basic types can have different widths and endiannness. You are not even guaranteed 8 bits in a byte. However packing the structures and sticking to int32_t and so on will help a great deal, it's effectively portable bar endiannness.
But the best solution is to load the structure from the stream portably. This is even possible with real numbers, though a bit fiddly.
This is how to read a 16 bit integer portably. See my github project for the rest of the functions (similar logic)
https://github.com/MalcolmMcLean/ieee754
/**
Get a 16-bit big-endian signed integer from a stream.
Does not break, regardless of host integer representation.
#param[in] fp - pointer to a stream opened for reading in binary mode
# returns the 16 bit value as an integer
*/
int fget16be(FILE *fp)
{
int c1, c2;
c2 = fgetc(fp);
c1 = fgetc(fp);
return ((c2 ^ 128) - 128) * 256 + c1;
}

How to tell gcc to disable padding inside struct? [duplicate]

This question already has answers here:
memory alignment within gcc structs
(6 answers)
Closed 6 years ago.
I’m unsure on whether it’s normal or it’s a compiler bug but I have a C struct with lot of members. Among of them, there’s, :
struct list {
...
...
const unsigned char nop=0x90; // 27 bytes since the begining of the structure
const unsigned char jump=0xeb; // 28 bytes since the begining of the structure
const unsigned char hlt=0xf4; // 29 bytes since the begining of the structure
unsigned __int128 i=0xeb90eb90eb90eb90f4f4 // should start at the 30th byte, but get aligned on a 16 byte boundary and starts on the 32th byte instead
const unsigned char data=0x66; // should start at the 46th byte, but start on the 48th instead.
}; // end of struct list.
I had a hard time to find out why my program wasn’t working, but I finally found there’s a 2 bytes gap between hltand i which is set to 0x0. This means that the i is getting aligned.
This is very clear when I printf that part of the structure, because with :
for(int i=28;i<35;i++)
printf("%02hhX",buf[i]);
I get EBF40000EB90EB90 on the screen.
I tried things like volatile struct list data;in my program, but it didn’t changed the alignment problem.
So is there a #pragma or a __attribute__to tell gcc to not align i inside struct listtype ?

In GCC you can use __attribute__((packed)) like this:
// sizeof(x) == 8
struct x
{
char x;
int a;
};
// sizeof(y) == 5
struct y
{
char x;
int a;
} __attribute__((packed));
See doc.
Also if you rely on the addresses of struct fields, take a look at the offsetof macro. Maybe you don't need to pack the structure at all.

As touched on by #Banex
#pragma pack(push,1)
struct
{
char a;
int b;
long long c;
} foo;
#pragma pack(pop)
The #pragma pack(push,1) pushes the current packing mode internally, and sets packing to 1, no padding
The #pragma pack(pop) restores the previous packing
Supposedly compatible with Microsoft's syntax
http://gcc.gnu.org/onlinedocs/gcc-4.4.4/gcc/Structure_002dPacking-Pragmas.html

The fields within the struct are padded in an implementation defined manner.
That being said, fields are typically aligned on an offest which is a multiple of the size of the data member (or array element if the member is an array) in question. So a 16 bit field starts on a 2 byte offset, 32 bit field starts on a 4 byte offset, and so forth.
If you reorder the fields in your struct to adhere to this guideline, you can typically avoid having any internal padding within the struct (although you may end up with some trailing padding).
By putting the fields at the proper offset, there can be performance gains over forcefully packing the struct.
For more details, see this article on structure packing.
While using the above techniques are not guaranteed, they tend to work in most cases.

Packing a C Struct [duplicate]

I am porting an application to an ARM platform in C, the application also runs on an x86 processor, and must be backward compatible.
I am now having some issues with variable alignment. I have read the gcc manual for
__attribute__((aligned(4),packed)) I interpret what is being said as the start of the struct is aligned to the 4 byte boundry and the inside remains untouched because of the packed statement.
originally I had this but occasionally it gets placed unaligned with the 4 byte boundary.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
so I change it to this.
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((aligned(4),packed)) CHALLENGE;
The understand I stated earlier seems to be incorrect as both the struct is now aligned to a 4 byte boundary, and and the inside data is now aligned to a four byte boundary, but because of the endianess, the size of the struct has increased in size from 42 to 44 bytes. This size is critical as we have other applications that depend on the struct being 42 bytes.
Could some describe to me how to perform the operation that I require. Any help is much appreciated.

If you're depending on sizeof(yourstruct) being 42 bytes, you're about to be bitten by a world of non-portable assumptions. You haven't said what this is for, but it seems likely that the endianness of the struct contents matters as well, so you may also have a mismatch with the x86 there too.
In this situation I think the only sure-fire way to cope is to use unsigned char[42] in the parts where it matters. Start by writing a precise specification of exactly what fields are where in this 42-byte block, and what endian, then use that definition to write some code to translate between that and a struct you can interact with. The code will likely be either all-at-once serialisation code (aka marshalling), or a bunch of getters and setters.

This is one reason why reading whole structs instead of memberwise fails, and should be avoided.
In this case, packing plus aligning at 4 means there will be two bytes of padding. This happens because the size must be compatible for storing the type in an array with all items still aligned at 4.
I imagine you have something like:
read(fd, &obj, sizeof obj)
Because you don't want to read those 2 padding bytes which belong to different data, you have to specify the size explicitly:
read(fd, &obj, 42)
Which you can keep maintainable:
typedef struct {
//...
enum { read_size = 42 };
} __attribute__((aligned(4),packed)) CHALLENGE;
// ...
read(fd, &obj, obj.read_size)
Or, if you can't use some features of C++ in your C:
typedef struct {
//...
} __attribute__((aligned(4),packed)) CHALLENGE;
enum { CHALLENGE_read_size = 42 };
// ...
read(fd, &obj, CHALLENGE_read_size)
At the next refactoring opportunity, I would strongly suggest you start reading each member individually, which can easily be encapsulated within a function.

I've been moving structures back and forth from Linux, Windows, Mac, C, Swift, Assembly, etc.
The problem is NOT that it can't be done, the problem is that you can't be lazy and must understand your tools.
I don't see why you can't use:
typedef struct
{
unsigned int code;
unsigned int length;
unsigned int seq;
unsigned int request;
unsigned char nonce[16];
unsigned short crc;
} __attribute__((packed)) CHALLENGE;
You can use it and it doesn't require any special or clever code. I write a LOT of code that communicates to ARM. Structures are what make things work. __attribute__ ((packed)) is my friend.
The odds of being in a "world of hurt" are nil if you understand what is going on with both.
Finally, I can't for the life make out how you get 42 or 44. Int is either 4 or 8 bytes (depending on the compiler). That puts the number at either 16+16+2=34 or 32+16+2=50 -- assuming it is truly packed.
As I say, knowing your tools is part of your problem.

What is your true goal?
If it's to deal with data that's in a file or on the wire in a particular format what you should do is write up some marshaling/serialization routines that move the data between the compiler struct that represents how you want to deal with the data inside the program and a char array that deals with how the data looks on the wire/file.
Then all that needs to be dealt with carefully and possibly have platform specific code is the marshaling routines. And you can write some nice-n-nasty unit tests to ensure that the marshaled data gets to and from the struct properly no matter what platform you might have to port to today and in the future.

I would guess that the problem is that 42 isn't divisible by 4, and so they get out of alignment if you put several of these structs back to back (e.g. allocate memory for several of them, determining the size with sizeof). Having the size as 44 forces the alignment in these cases as you requested. However, if the internal offset of each struct member remains the same, you can treat the 44 byte struct as though it was 42 bytes (as long as you take care to align any following data at the correct boundary).
One trick to try might be putting both of these structs inside a single union type and only use 42-byte version from within each such union.

As I am using linux, I have found that by echo 3 > /proc/cpu/alignment it will issue me with a warning, and fix the alignment issue. This is a work around but it is very helpful with locating where the structures are failing to be misaligned.

How to get aligned array of packed structs in GCC C?

In GCC C, how can I get an array of packed structs, where each entry in the array is aligned?
(FWIW, this is on a PIC32, using the MIPS4000 architecture).
I have this (simplified):
typedef struct __attribute__((packed))
{
uint8_t frameLength;
unsigned frameType :3;
uint8_t payloadBytes;
uint8_t payload[RADIO_RX_PAYLOAD_WORST];
} RADIO_PACKET;
RADIO_PACKETs are packed internally.
Then I have RADIO_PACKET_QUEUE, which is a queue of RADIO_PACKETs:
typedef struct
{
short read; // next buffer to read
short write; // next buffer to write
short count; // number of packets stored in q
short buffers; // number of buffers allocated in q
RADIO_PACKET q[RADIO_RX_PACKET_BUFFERS];
} RADIO_PACKET_QUEUE;
I want each RADIO_PACKET in the array q[] to start at an aligned address (modulo 4 address).
But right now GCC doesn't align them, so I get address exceptions when trying to read q[n] as a word. For example, this gives an exception:
RADIO_PACKET_QUEUE rpq;
int foo = *(int*) &(rpq.q[1]);
Perhaps this is because of the way I declared RADIO_PACKET as packed.
I want each RADIO_PACKET to remain packed internally, but want GCC to add padding as needed after each array element so each RADIO_PACKET starts at an aligned address.
How can I do this?

Since you specify that you are using GCC, you should be looking at type attributes. In particular, if you want RADIO_PACKETs to be aligned on 4-byte (or wider) boundaries, then you would use __attribute__((aligned (4))) on the type. When applied to a struct, it describes the alignment of instances of the overall struct, not (directly) the alignment of any individual members, so it is possible to use it together with attribute packed:
typedef struct __attribute__((aligned(4), packed))
{
uint8_t frameLength;
unsigned frameType :3;
uint8_t payloadBytes;
uint8_t payload[RADIO_RX_PAYLOAD_WORST];
} RADIO_PACKET;
The packed attribute prevents padding between structure elements, but it does not prevent trailing padding in the structure representation, exactly as is necessary for ensuring the required alignment for every element of an array of the specified type. You should not then need to do anything special in the declaration of RADIO_PACKET_QUEUE.
This is a lot cleaner and clearer than the alternative you came up with, but it is GCC-specific. Since you were GCC-specific already, I don't see that being a problem.

You can wrap your packed structure, inside another unaligned structure. Then you do array from this unaligned structures.
Solution 2 could be, to add dummy member char[] at the end of the packed structure. In this case you will need to calculate it somehow, probably manually.
I also will suggest you to rearrange your structure, by placing longer members first and placing uint8_t members last (assuming you have 16/32 bit members and not doing some HW mapping).

I think I've solved this, based on the hint from #Nick in the question comments.
I added a RADIO_PACKET_ALIGNED wrapper around RADIO_PACKET. This includes calculated padding.
Then I substituted RADIO_PACKET_ALIGNED for RADIO_PACKET in the RADIO_PACKET_QUEUE structure.
Seems to work:
typedef struct
{
RADIO_PACKET packet;
uint8_t padding[3 - (sizeof(RADIO_PACKET) + 3) % 4];
} RADIO_PACKET_ALIGNED;
typedef struct
{
short read; // next buffer to read
short write; // next buffer to write
short count; // number of packets stored in q
short buffers; // number of buffers allocated in q
RADIO_PACKET_ALIGNED q[RADIO_RX_PACKET_BUFFERS];
} RADIO_PACKET_QUEUE;
Thanks to all the commenters!
Edit: A more portable version of the wrapper would use:
uint8_t padding[(sizeof(int) - 1) - (sizeof(RADIO_PACKET) + (sizeof(int) - 1)) % sizeof(int)];

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight