64-bit compiler struct padding with 32-bit arguments - c

I've been trying to understand how a 64-bit compiler enforces struct alignment, and I can't understand why there is no padding when a struct has only 32-bit members.
I would expect padding even in that case, since 64-bit CPUs access memory using 64-bit pointers, don't they?
typedef struct {
uint32_t a1;
uint32_t a2;
uint32_t a3;
}tHeader;
typedef struct{
tHeader header;
uint32_t data1;
uint32_t data2;
}tPacket1;
0 7 bytes
+-------+-------+
| a1 a2 |
+-------+-------+
+-------+-------+
| a3 data1 |
+-------+-------+
+-------+-------+
| data2 | <---- Why no padding here?
+-------+-------+
20 bytes.
Padding example when a 64-bit member is present:
typedef struct {
uint32_t a1;
uint32_t a2;
uint32_t a3;
uint64_t a4;
}tHeader;
typedef struct{
tHeader header;
uint32_t data;
}tPacket1;
0 7 bytes
+-------+-------+
| a1 a2 |
+-------+-------+
+-------+-------+
| a3 PADDING|
+-------+-------+
+-------+-------+
| a4 |
+-------+-------+
+-------+-------+
| data PADDING| <---- Why padding here?
+-------+-------+
Total: 8 * 4 = 32 bytes.
Tested with:
$ gcc --version
gcc (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0

No padding is needed in the first case because all the members have 4-byte alignment. So if you have two consecutive structures, it can be laid out like:
0 7 bytes
+-------+-------+
| a1 a2 |
+-------+-------+
+-------+-------+
| a3 data1 |
+-------+-------+
+-------+-------+
| data2 a1 |
+-------+-------+
+-------+-------+
| a2 a3 |
+-------+-------+
+-------+-------+
| data1 data2 |
+-------+-------+
But that won't work in the second example because a4 needs 8-byte alignment. If it omitted the padding at the end, you'd have this:
0 7 bytes
+-------+-------+
| a1 a2 |
+-------+-------+
+-------+-------+
| a3 PADDING|
+-------+-------+
+-------+-------+
| a4 |
+-------+-------+
+-------+-------+
| data a1|
+-------+-------+
+-------+-------+
| a2 a3 |
+-------+-------+
+-------+-------+
|PADDING a4 |
+-------+-------+
+-------+-------+
| a4 data |
+-------+-------+
But splitting the second a4 like that is not permitted.
With GCC you could use the packed attribute to force this layout. The compiler then has to assume that the 64-bit element may be misaligned, which on some targets means accessing it with several narrower instructions. Packing would also remove the padding between a3 and a4.
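The layouts discussed above can be checked directly with sizeof, offsetof and _Alignof. The following is only a sketch: the names tHeader32, tPacket32, tHeader64, tPacket64 and print_layouts are made up here so that both variants from the question fit in one file. With GCC on x86-64 the sizes should come out as the 20 and 32 bytes reported in the question; other ABIs may align uint64_t differently, so only the 32-bit case is asserted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* First variant from the question: only 32-bit members. */
typedef struct {
    uint32_t a1;
    uint32_t a2;
    uint32_t a3;
} tHeader32;

typedef struct {
    tHeader32 header;
    uint32_t data1;
    uint32_t data2;
} tPacket32;

/* Second variant: one 64-bit member raises the alignment of the whole struct. */
typedef struct {
    uint32_t a1;
    uint32_t a2;
    uint32_t a3;
    uint64_t a4;
} tHeader64;

typedef struct {
    tHeader64 header;
    uint32_t data;
} tPacket64;

/* Five 4-byte members with 4-byte alignment need no padding at all. */
static_assert(sizeof(tPacket32) == 20, "no padding expected in tPacket32");

/* Print size, alignment and the interesting member offsets of both variants. */
void print_layouts(void)
{
    printf("tPacket32: size=%zu align=%zu\n",
           sizeof(tPacket32), _Alignof(tPacket32));
    printf("tHeader64: size=%zu align=%zu offsetof(a4)=%zu\n",
           sizeof(tHeader64), _Alignof(tHeader64), offsetof(tHeader64, a4));
    printf("tPacket64: size=%zu align=%zu offsetof(data)=%zu\n",
           sizeof(tPacket64), _Alignof(tPacket64), offsetof(tPacket64, data));
}
```

Calling print_layouts() from main shows where the compiler actually placed a4 and data, and whether the total size was rounded up to a multiple of the struct's alignment.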

Padding is not arbitrary; its purpose is to keep memory accesses efficient. A 64-bit CPU can make 64-bit memory bus accesses, but there is no benefit in doing so when only a 32-bit quantity is being transferred. To transfer a 32-bit quantity, the CPU uses a 64-bit address to select the memory word and reads only the bytes that make up the quantity. The compiler pads a structure when a 64-bit field would otherwise not start at an address that is a multiple of 8. If such a field were misaligned, the CPU would need two 64-bit accesses to load or store it, and alignment exists to avoid that. If a structure contains only 32-bit fields, accessing them requires only 32-bit accesses, so 4-byte alignment is sufficient for the whole structure.
Let's go to the extreme... imagine you have a char field. Would you want a 64-bit bus access, and require the char to be aligned to a 64-bit boundary, just to transfer one byte of data?
To calculate the padding, you have to ask two questions:
At what offset does the free space after the last field begin?
What alignment does the next field require?
Assume the fields placed so far end at offset 13 (for example, a char[13]), and consider the next field. If it is a char, it needs one byte and has an alignment of 1, so it can follow directly at offset 13 with no padding. If it is a short (two bytes), the compiler places it at the next even offset, 14, after one byte of padding, and the whole structure then needs 2-byte alignment (otherwise the short would not be aligned). If it is a 32-bit int, the offset must be a multiple of 4, so 3 bytes of padding are inserted, the field lands at offset 16, and the whole structure needs 4-byte alignment. If it is a 64-bit integer, the offset must be a multiple of 8, so the compiler again inserts 3 bytes of padding to place it at offset 16, and the whole structure needs 8-byte alignment.
When no fields are left, the compiler may still add padding at the end so that the size of the structure is a multiple of its required alignment. That way alignment is preserved when structures are placed back to back, as in an array. For example, if the structure with a char[13] followed by a 32-bit int ends with a char[3], one byte of padding is added to round the size up to a multiple of 4.
In summary, the padding before each field is determined by that field's alignment requirement. For an array it is the alignment of the element type; for a simple type it is usually the size of the type itself; and for a structure it is the largest alignment requirement among its members.
In the case you showed, only 32-bit integers are included, so no padding is needed between them (if the structure is aligned to 32 bits, all fields are too), and none is needed at the end of the structure to keep the elements of an array of structures aligned.
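The procedure described in this answer can be written down as a small calculation. This is a minimal sketch of the usual rule, not the code of any particular compiler; field_t, align_up and struct_layout are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One field of a hypothetical struct: its size and required alignment. */
typedef struct {
    size_t size;
    size_t align;
} field_t;

/* Round offset up to the next multiple of align. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0);
    return (offset + align - 1) / align * align;
}

/* Compute the offset of every field, the alignment of the whole struct
   (the largest field alignment) and the padded total size (return value). */
size_t struct_layout(const field_t *fields, size_t n,
                     size_t *offsets, size_t *struct_align)
{
    size_t offset = 0;
    size_t max_align = 1;

    for (size_t i = 0; i < n; i++) {
        offset = align_up(offset, fields[i].align); /* padding before the field */
        offsets[i] = offset;
        offset += fields[i].size;
        if (fields[i].align > max_align)
            max_align = fields[i].align;
    }
    *struct_align = max_align;
    return align_up(offset, max_align);             /* padding at the end */
}
```

For the example in the text (a char[13], a 32-bit int, then a char[3]), passing the fields {13, 1}, {4, 4}, {3, 1} puts the int at offset 16 and the trailing array at offset 20, and gives a total size of 24 with 4-byte alignment.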

Related

Creating a drawing of a stack for C with proper addresses and data

I am trying to draw a stack where the bottom of the process is 0xffff. The program is simple:
int main() {
char c;
int i;
double d;
int iArr[4];
return 0;
}
When I created a program for a sample of what the addresses could look like, I got:
iArr[0]: 0x7ffce0c79970
iArr[1]: 0x7ffce0c79974
iArr[2]: 0x7ffce0c79978
iArr[3]: 0x7ffce0c7997c
d: 0x7ffce0c79980
i: 0x7ffce0c79988
c: 0x7ffce0c7998f
The sizeof operator says that integers are 4 bytes, so why does i appear to take up 88 through 8e? Also, if the bottom of the process is assumed to be 0xffff, would c start at 0xffff or 0xfffe?
You need to consider alignment (alignof) as well as size (sizeof), because many types must be placed at addresses that are multiples of their alignment; otherwise access is slower or, on some CPUs, faults.
For example, on a typical 64-bit platform an int must be at an address that's a multiple of 4 and a double at a multiple of 8, while a char, being just one byte, can go anywhere.
If you visualize the stack you see it as this:
| | | | | | | | |
+----+----+----+----+----+----+----+----+
...9970 | iArr[0] | iArr[1] |
+----+----+----+----+----+----+----+----+
...9978 | iArr[2] | iArr[3] |
+----+----+----+----+----+----+----+----+
...9980 | d |
+----+----+----+----+----+----+----+----+
...9988 | i | | | | c |
+----+----+----+----+----+----+----+----+
This makes sense because the stack tends to grow down, that is, earlier entries have higher memory addresses. So c gets the highest address, then i gets the next highest address that satisfies its alignment, and so on. The unused bytes between i and c are padding. The iArr array is allocated as one contiguous, suitably aligned chunk, but its indexes always count up in memory, opposite to the direction of stack growth, which looks strange but makes sense too. Keep in mind that the compiler is free to order local variables however it likes, so another compiler or optimization level may give a different picture.
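To see this on your own machine, a sketch like the following prints each local's address next to its alignment requirement. The helper names is_aligned and show_locals are made up for this example, and the addresses printed will differ between runs, compilers and platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 if addr is a multiple of align, 0 otherwise. */
int is_aligned(const void *addr, size_t align)
{
    assert(align != 0);
    return (uintptr_t)addr % align == 0;
}

/* Declares the same locals as the question, prints their addresses,
   and returns 1 if every one of them satisfies its type's alignment. */
int show_locals(void)
{
    char c = 0;
    int i = 0;
    double d = 0.0;
    int iArr[4] = {0};

    printf("iArr: %p (size %zu, alignment %zu)\n",
           (void *)iArr, sizeof iArr, _Alignof(int));
    printf("d:    %p (size %zu, alignment %zu)\n",
           (void *)&d, sizeof d, _Alignof(double));
    printf("i:    %p (size %zu, alignment %zu)\n",
           (void *)&i, sizeof i, _Alignof(int));
    printf("c:    %p (size %zu, alignment %zu)\n",
           (void *)&c, sizeof c, _Alignof(char));

    return is_aligned(iArr, _Alignof(int))
        && is_aligned(&d, _Alignof(double))
        && is_aligned(&i, _Alignof(int))
        && is_aligned(&c, _Alignof(char));
}
```

Any gap between one variable's last byte and the next variable's first byte is padding inserted to satisfy the alignment shown in the last column.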

Mapping a number to bit position in C

I'm developing a program running on an Atmel AT90CAN128. There are 40 devices connected to this controller, each with a status (on/off). As I need to report the status of each of these devices to a PC through serial communication, I have 40 bits that define whether each device is on or off. In addition, the PC can turn any of these devices on or off.
So, my first attempt was to create the following struct:
typedef struct {
unsigned char length; //!< Data Length
unsigned data_type; //!< Data type
unsigned char data[5]; //!< CAN data array 5 * 8 = 40 bits
} SERIAL_packet;
The problem with this was that the PC will send an unsigned char address telling me the device to turn on/off, so accessing the bit corresponding to that address number turned out to be rather complicated...
So I started looking for options, and I stumbled upon the C99 _Bool type. I thought, great, now I'll just create a _Bool data[40] and access the bit for an address just by indexing my data array. It turns out that in C (or C++) every object must be individually addressable, so each one occupies at least a whole byte. So even if I declare a _Bool, its size will be 8 bits, which is a problem (it needs to be as fast as possible, so the more bits I send the slower it gets, and the PC will be expecting only 40 bits) and not very efficient for the communication. So I started looking into bit fields, and tried the following:
typedef struct {
unsigned char aux : 1;
} arrayData;
typedef struct {
unsigned char length; //!< Data Length
unsigned data_type; //!< Data type
arrayData data[40]; //!< Data array 5 bytes == 40 bits
} SERIAL_packet;
And I wonder, is this going to map that data[40] into a contiguous memory block with a size of 40 bits (5 bytes)?
If not, is there any obvious solution I'm missing? This doesn't seem like a very complicated thing to do (it would be much simpler if there were fewer than 32 devices, so I could use an int and just access it through a bit mask).
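An array of one-bit bit-field structs will not pack down to 40 bits, because each array element still occupies at least one byte. Assuming the addresses you get back are in the range 0 - 39 and that a char has 8 bits, you can instead treat your original unsigned char data[5] array as an array of bits: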
| data[0] | data[1] ...
-----------------------------------------------------------------
| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10| 11| 12| 13| 14| 15|
-----------------------------------------------------------------
To set bit i:
packet.data[i/8] |= (1 << (i%8));
To clear bit i:
packet.data[i/8] &= ~(1 << (i%8));
To read bit i:
int flag = (packet.data[i/8] & (1 << (i%8))) != 0;
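The three operations above can be wrapped in small helpers so the index arithmetic lives in one place. This is a sketch that assumes 8-bit bytes and device numbers 0 to 39 as in the question; NUM_DEVICES, DATA_BYTES and the device_* function names are hypothetical.

```c
#include <assert.h>

#define NUM_DEVICES 40                      /* device count from the question */
#define DATA_BYTES  ((NUM_DEVICES + 7) / 8) /* 5 bytes, assuming 8-bit bytes  */

/* Mark device i as on. */
void device_set(unsigned char *data, unsigned i)
{
    assert(i < NUM_DEVICES);
    data[i / 8] |= (unsigned char)(1u << (i % 8));
}

/* Mark device i as off. */
void device_clear(unsigned char *data, unsigned i)
{
    assert(i < NUM_DEVICES);
    data[i / 8] &= (unsigned char)~(1u << (i % 8));
}

/* Return 1 if device i is on, 0 otherwise. */
int device_get(const unsigned char *data, unsigned i)
{
    assert(i < NUM_DEVICES);
    return (data[i / 8] >> (i % 8)) & 1;
}
```

With this, handling a command from the PC becomes device_set(packet.data, address) or device_clear(packet.data, address), and the 5-byte array can be sent over the serial link unchanged.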

Structs in a 32-bit architecture [duplicate]

Given the following code:
struct s1 {
void *a;
char b[2];
int c;
};
struct s2 {
void *a;
char b[2];
int c;
}__attribute__((packed));
If s1 has a size of 12 bytes and s2 has a size of 10 bytes, is this because data is read in 4-byte chunks, and does __attribute__((packed)) reduce the size of void *a to only 2 bytes?
I'm a little confused as to what __attribute__((packed)) does.
Many thanks
It is due to alignment: the compiler adds hidden padding ("junk" bytes) between the fields to make sure they start at addresses that are optimal for performance.
Using packed forces the compiler not to do that, which often means that accessing the structure becomes slower (or simply impossible, causing e.g. a bus error) if the hardware has problems doing e.g. 32-bit accesses at addresses that are not multiples of 4.
On Intel processors, fetching 32-bit aligned data has traditionally been faster than fetching unaligned data; on many other processors unaligned fetches may be illegal altogether, or need to be simulated using two instructions. Thus, on these 32-bit architectures, the first structure always has c aligned to a byte address divisible by 4. This, however, means that 2 bytes of storage are wasted.
struct s1 {
void *a;
char b[2];
int c;
};
// Byte layout in memory (32-bit little-endian):
// | a0 | a1 | a2 | a3 | b0 | b1 | NA | NA | c0 | c1 | c2 | c3 |
// addresses increasing ====>
On the other hand, sometimes you absolutely need to map unaligned data structures (like file formats or network packets) as is into C structures; there you can use __attribute__((packed)) to specify that you want everything without padding bytes:
struct s2 {
void *a;
char b[2];
int c;
} __attribute__((packed));
// Byte layout in memory (32-bit little-endian):
// | a0 | a1 | a2 | a3 | b0 | b1 | c0 | c1 | c2 | c3 |
// addresses increasing ====>
This is due to data structure alignment, a combination of two processes: data alignment and data padding. The first structure has its members aligned to the word size, as you said; the second is packed, which forces the compiler not to pad it.
The second structure is 10 bytes because the 2 bytes of padding after the character array are gone, not because the void pointer shrinks (it remains 4 bytes, as pointers are on this 32-bit target): 4 + 2 + 4 = 10. Packing can hinder performance, since saving 2 bytes is rarely worth the efficiency lost in hardware, and taking a pointer to a misaligned member and dereferencing it can lead to undefined behaviour.
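A quick way to confirm where the two bytes go is to print the offsets of both structures. The sketch below assumes GCC or Clang, since __attribute__((packed)) is a compiler extension; print_s1_s2 and read_c are names made up for this example. On a 32-bit target the sizes should be the 12 and 10 bytes from the question, and on a typical 64-bit target with 8-byte pointers they should be 16 and 14.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct s1 {
    void *a;
    char b[2];
    int c;
};

struct s2 {
    void *a;
    char b[2];
    int c;
} __attribute__((packed));  /* GCC/Clang extension: no padding anywhere */

/* In the packed struct, c starts right after the two bytes of b. */
static_assert(offsetof(struct s2, c) == sizeof(void *) + 2,
              "packed: c follows b directly");

/* Print size and the offset of c for both variants. */
void print_s1_s2(void)
{
    printf("s1: size=%zu offsetof(c)=%zu\n",
           sizeof(struct s1), offsetof(struct s1, c));
    printf("s2: size=%zu offsetof(c)=%zu\n",
           sizeof(struct s2), offsetof(struct s2, c));
}

/* Reading a packed member through the struct lets the compiler
   generate whatever access sequence is safe for a misaligned int. */
int read_c(const struct s2 *p)
{
    return p->c;
}
```

Accessing p->c through the struct is fine; what should be avoided is storing &p->c in an int * and dereferencing it later, because the compiler then no longer knows the pointer may be misaligned.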

How to write a byte to register with specific memory address?

I want to write a byte to a register at a specific memory address (0x1228A432).
But this register has the following structure:
Bits | Access | Name | Reset | Description |
[31:8] | Read only | -------- | ------ | Reserved |
[7:0] | Read-write | REG[7:0] | 0xXX | ----------- |
Please tell me how to write a byte to this register without "touching" the reserved bits.
EDIT1: My target is a Cortex-A9.
I could successfully read from and write to the onboard DDR2 memory using 8-bit values (such as 0xFF).
EDIT2: I used to work with DDR2 memory in the following way:
// First stage
static unsigned char *p = 0;
char * argv1="0x60000000";
unsigned long address=strtoul(argv1, 0, 0);
p = (unsigned char *) address; // point at the parsed address, not at the string
// Second stage
char * argv4="FF";
int value=strtol(argv4,0,16);
// Third stage
int offset = 9;
p[offset]=value;
EDIT3: I found out the following information:
All registers are 32 bits wide and do not support byte writes.
Write operations must be word-wide and bits marked as reserved must be preserved
using read-modify-write.
One way to preserve bits [31:8], assuming 32-bit wide access, is to read the value, zero-out bits [7:0], bitwise-or it with the value needed and then write it back to the register.
Something like (stealing from RedX a bit ;) ):
uint8_t your_8_bit_value = 0x42;
uint32_t volatile * const mem_map_register = (uint32_t volatile *) 0x1228a432;
*mem_map_register = (*mem_map_register & 0xFFFFFF00) | your_8_bit_value;
Yet I think there should be more info available about your hardware. I've seen several datasheets saying e.g. that you have to write all 1 to reserved bits (meaning that reserved bits are reserved for future use, and 1 is a safe default), etc. So it is not always obvious, that leaving reserved bits untouched is the right thing to do.
You should find more details about your hardware - are byte-wide writes supported, are writes to reserved bits ignored perhaps, or should be all 0/1, etc.
Look up the assembler instruction handbook for an 8-bit write instruction (not sure if it exists). If it does, and the register accepts byte writes, use a uint8_t for your assignment to that memory location (uint8_t volatile * const reg = (uint8_t volatile * const) 0x1228a432;).
Else do what Omkant said. Overwriting the bits with the same number should not produce any unwanted results, since they are not "zeroed" before being overwritten.
His code in C (this is the verbose version for better readability):
uint8_t your_8_bit_value = 0x42;
uint32_t volatile * const mem_map_register = (uint32_t volatile *) 0x1228a432;
*mem_map_register = (*mem_map_register & 0xFFFFFF00) | your_8_bit_value;
[register value] = ([register value] | [00 00 00 FF]) & [FF FF FF XX]
Here, XX is the byte you want to write. OR-ing with 0x000000FF sets the low 8 bits while leaving the upper 24 bits unchanged, and AND-ing with 0xFFFFFFXX then leaves exactly XX in the low byte.
I think this should work.
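The read-modify-write described in these answers can be put in a helper that is easy to try on an ordinary variable before pointing it at hardware. This is a sketch under the assumption from EDIT3 that only 32-bit accesses are allowed; write_low_byte is a hypothetical name. Note that 0x1228A432 as quoted in the question is not a multiple of 4, so it cannot be used as is for a 32-bit access; the datasheet should give the register's word address.

```c
#include <assert.h>
#include <stdint.h>

/* Replace bits [7:0] of *reg with value, preserving bits [31:8].
   Uses exactly one 32-bit read and one 32-bit write. */
void write_low_byte(volatile uint32_t *reg, uint8_t value)
{
    /* 32-bit accesses need a word-aligned address. */
    assert(((uintptr_t)reg & 3u) == 0);

    uint32_t tmp = *reg;          /* read the whole register        */
    tmp &= UINT32_C(0xFFFFFF00);  /* clear REG[7:0], keep [31:8]    */
    tmp |= value;                 /* insert the new byte            */
    *reg = tmp;                   /* write the whole register back  */
}
```

On the target, the call would look like write_low_byte((volatile uint32_t *)REGISTER_WORD_ADDRESS, 0x42), where REGISTER_WORD_ADDRESS is a placeholder for the aligned address from the datasheet. Whether reserved bits should really be written back unchanged still depends on the hardware documentation.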

How are the structure members stored on a little endian machine?

struct Dummy {
int x;
char y;
};
int main() {
struct Dummy dum;
dum.x = 10;
dum.y = 'a';
}
How would the structure members be laid out on a little endian machine?
Would it be something like this?
0x1000 +0 +1 +2 +3
___________________
x: | 10 | 0 | 0 | 0 |
-------------------
y: | 'a'| 0 | 0 | 0 |
-------------------
0x1000 +4 +5 +6 +7
I think you'll find this question useful. Endianness is usually relevant to a word in memory, not to the whole structure.
Structure layout is a compiler implementation detail, affected by the default packing. Endianness normally only affects the order of the bytes in a structure member value, not the layout. Check the small print in the compiler manual or use sizeof and the offsetof macro to experiment.
The layout you documented in your question is indeed very common for a 32-bit LE compiler.
The structure members will be in the order declared, with padding inserted as necessary so each field is properly aligned for its type and with padding inserted as necessary at the end such that in an array each subsequent structure is properly aligned and begins immediately after the end of the previous structure. It is also possible (but unlikely) that additional unnecessary padding will be inserted between any two elements or at the end.
Each field itself will be stored as appropriate for the type on the compiler and architecture, e.g. the int 10 would be stored as the bytes 0a 00 00 00 on a normal little-endian machine with 32-bit ints.
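As suggested above, sizeof and offsetof are enough to experiment with this. The following sketch copies the structure into a byte array and prints it; is_little_endian and dump_dummy are hypothetical helper names, and the values of the padding bytes after y are unspecified unless the structure was zeroed first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

struct Dummy {
    int x;
    char y;
};

/* The first member of a struct always sits at offset 0. */
static_assert(offsetof(struct Dummy, x) == 0, "x is the first member");

/* Returns 1 if the least significant byte of an int is stored first. */
int is_little_endian(void)
{
    unsigned int one = 1;
    unsigned char first;
    memcpy(&first, &one, 1);
    return first == 1;
}

/* Print the member offsets and every byte of *d in address order. */
void dump_dummy(const struct Dummy *d)
{
    unsigned char bytes[sizeof *d];
    memcpy(bytes, d, sizeof bytes);

    printf("x at offset %zu, y at offset %zu, total size %zu\n",
           offsetof(struct Dummy, x), offsetof(struct Dummy, y), sizeof *d);
    for (size_t i = 0; i < sizeof bytes; i++)
        printf("%02x ", bytes[i]);
    printf("\n");
}
```

Zeroing a struct Dummy with memset, setting x = 10 and y = 'a', and passing it to dump_dummy shows the byte order of x and the padding that follows y on the machine at hand.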
