Came across this question in one of the interview samples. A 16-byte aligned allocation has already been answered in How to allocate aligned memory only using the standard library?
But I have a specific question about the mask used to zero out the last 4 bits. The mask ~0x0F is used so that the resulting address is divisible by 16. What should be done to achieve the same for 32-byte alignment/divisibility?
First, the question you referred to is 16-byte alignment, not 16-bit alignment.
Regarding your actual question, you just want to mask off 5 bits instead of 4 to make the result 32-byte aligned. So it would be ~0x1F.
To clarify a bit:
To align a pointer to a 32 byte boundary, you want the last 5 bits of the address to be 0. (Since 100000 is 32 in binary, any multiple of 32 will end in 00000.)
0x1F is 11111 in binary. Since it's a pointer, it's actually some number of 0's followed by 11111 - for example, with 64-bit pointers, it would be 59 0's and 5 1's. The ~ means that these values are inverted - so ~0x1F is 59 1's followed by 5 0's.
When you take ptr & ~0x1F, the bitwise & causes all bits that are &'ed with 1 to stay the same, and all bits that are &'ed with 0 to be set to 0. So you end up with the original value of ptr, except that the last 5 bits have been set to 0. What this means is that we've subtracted some number between 0 and 31 in order to make ptr a multiple of 32, which was the goal.
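As a minimal sketch of the masking described above (the function names align_down and align_up are made up for illustration, and the alignment is assumed to be a power of two), the same idea can be written for any alignment. With alignment = 32 the mask is ~0x1F.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr down to the previous multiple of alignment.
   Assumption: alignment is a power of two (16, 32, 64, ...). */
uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* alignment - 1 is 0x1F for 32; inverting it gives the mask ~0x1F. */
    return addr & ~(alignment - 1);
}

/* Round addr up to the next multiple of alignment (unchanged if already aligned). */
uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

When aligning a pointer returned by malloc, you would normally round up (after over-allocating by alignment - 1 bytes), since rounding down could move the pointer before the start of the block.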
#include <stdio.h>
int main(void)
{
char c1 = '0';
char _Alignas(double) c2 = '0';
printf("char alignment: %zu\n", _Alignof(char));
printf("double alignment: %zu\n", _Alignof(double));
printf("&c1: %p\n", (void *)&c1);
printf("&c2: %p\n", (void *)&c2);
return 0;
}
Run in my environment, result is:
char alignment: 1
double alignment: 8
&c1: 000000000061FE1F
&c2: 000000000061FE18
Compiled by gcc version 8.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project).
I wonder why the number of bytes between &c2 and &c1 is 7, not 8 (sizeof(double) in my environment)?
why char _Alignas double size is not 8?
Alignment means the address is a multiple of some value. It has nothing to do with the size of a variable. _Alignas(8) means the address will be a multiple of 8, i.e. it ends with 0 or 8 in hexadecimal.
I wonder why the bytes between &c2 and &c1 is 7, not 8 (sizeof double in my environment)?
The positions of variables on the stack aren't specified. The compiler can freely put c1 before or after c2. The only requirement here is the alignment of c2, whose address must be a multiple of 8. If c1 is at a multiple of 8 and the compiler chooses to put c2 after it, then 7 bytes of padding will be used. But the compiler can just as well put c1 right before c2, and their addresses will then differ by only 1. You can easily see that different compilers put the variables in different positions, at different distances.
For more details read
Order of local variable allocation on the stack
What is the order of local variables on the stack?
why "cout" prints variables in wrong order
How the local variable stored in stack
Alignment determines where the addresses will be, not how much space they will take.
A char only takes up 1 byte, even if it is aligned on an 8 byte boundary. There are 6 unused bytes between c1 and c2.
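A small sketch of this point, assuming a C11 compiler (the function name is made up for illustration): the over-aligned char still has size 1, and only its address is constrained.

```c
#include <stdint.h>

/* Returns 1 if a char declared with _Alignas(double) still has size 1
   and its address is a multiple of _Alignof(double). */
int aligned_char_check(void)
{
    _Alignas(double) char c2 = '0';

    int size_is_one   = (sizeof c2 == 1);
    int addr_is_multi = ((uintptr_t)&c2 % _Alignof(double) == 0);

    return size_is_one && addr_is_multi;
}
```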
Often, the alignment and the size will be the same, or the alignment is rounded up to a power of 2. That's because on many architectures it may require multiple FETCH instructions to retrieve from a non-aligned memory address.
In your case, c1 is aligned on a 1-byte boundary, so it is placed at the first available position on the stack.
c2 is aligned on an 8-byte boundary, so it skips over however many bytes are needed until the address is aligned on an 8 byte boundary. This means (address % 8) == 0. If c2 were of type double, it would need to move to the next aligned address, 8 more bytes ahead, but since it is only a char it fits just fine at the first 8-byte boundary. (EDIT: as others have pointed out, the compiler could rearrange the variables, and probably would in that case)
The alignment can be greater or less than the size of the element.
On some architectures, you simply HAD to align on 2- or 4-byte boundaries. The computers simply could not address the "in between" bytes. The low address lines aren't needed, so you can address 2^16 bytes of memory using only 14 address lines if you align on 4-byte addresses (and always read 4 bytes). That can reduce the size needed for the memory bus.
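The address-line arithmetic above can be checked with a tiny helper (a sketch; addressable_bytes is a made-up name, and lines is assumed to be below 64):

```c
#include <stdint.h>

/* Bytes reachable with `lines` address lines when every address
   selects one aligned word of `word_bytes` bytes. */
uint64_t addressable_bytes(unsigned lines, unsigned word_bytes)
{
    return ((uint64_t)1 << lines) * word_bytes;
}
```

With 14 lines and 4-byte words this gives 65536 bytes, the same as 16 lines with byte addressing.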
Below is an excerpt from the red dragon book.
Example 7.3. Figure 7.9 is a simplification of the data layout used by C compilers for two machines that we call Machine 1 and Machine 2.
Machine 1 : The memory of Machine 1 is organized into bytes consisting of 8 bits each. Even though every byte has an address, the instruction set favors short integers being positioned at bytes whose addresses are even, and integers being positioned at addresses that are divisible by 4. The compiler places short integers at even addresses, even if it has to skip a byte as padding in the process. Thus, four bytes, consisting of 32 bits, may be allocated for a character followed by a short integer.
Machine 2: each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word. The strong word orientation of the instruction set of Machine 2 has led the compiler to allocate a complete word at a time, even when fewer bits would suffice to represent all possible values of that type; e.g., only 8 bits are needed to represent a character. Hence, under alignment, Fig. 7.9 shows 64 bits for each type. Within each word, the bits for each basic type are in specified positions. Two words consisting of 128 bits would be allocated for a character followed by a short integer, with the character using only 8 of the bits in the first word and the short integer using only 24 of the bits in the second word. □
I read about the concept of alignment here, here and here. What I could understand from them is as follows: in word-addressable CPUs (where the word size is more than a byte), padding is introduced into data objects so that the CPU can retrieve data from memory efficiently, with the minimum number of memory cycles.
Now, Machine 1 here is actually a byte-addressable one, and the conditions in the Machine 1 specification are probably more difficult than those of a simple word-addressable machine with a word size of, say, 4 bytes. In such a word-addressable machine, we only need to make sure that our data items are word aligned, with no further difficulty. But how do we find the alignment in systems like Machine 1 (as given in the table above), where the simple concept of word alignment does not work because the machine is byte addressable and has much more difficult specifications?
Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).
"each word consists of 64 bits, and 24 bits are allowed for the address of a word. There are 64 possibilities for the individual bits inside a word, so 6 additional bits are needed to distinguish between them. By design, a pointer to a character on Machine 2 takes 30 bits — 24 to find the word and 6 for the position of the character inside the word." - Moreover how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)
First of all, the machine 1 is not special at all - it is exactly like a x86-32 or 32-bit ARM.
Moreover I find it quite weird that in the row for double the size of the type is more than what is given in the alignment field. Shouldn't alignment(in bits) ≥ size (in bits) ? Because alignment refers to the memory actually allocated for the data object (?).
No, this isn't true. Alignment means that the address of the lowest addressable byte in the object must be divisible by the given number of bytes.
Additionally, with C, it is also true that within arrays sizeof (ElementType) will need to be greater than or equal to the alignment of each member and sizeof (ElementType) be divisible by alignment, thus the footnote a. Therefore on the latter computer:
struct { char a, b; }
might have sizeof 16 because the characters are in distinct addressable words, whereas
struct { char a[2]; }
could be squeezed into 8 bytes.
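On an ordinary byte-addressable machine the two structs will most likely both have size 2, but the rule that sizeof is a multiple of the alignment can still be checked. A sketch, assuming a C11 compiler (the struct and function names are made up for illustration):

```c
#include <stddef.h>

struct two_chars  { char a, b; };
struct char_array { char a[2]; };
struct mixed      { char c; double d; };

/* Returns 1 when sizeof(T) is a multiple of _Alignof(T) for each sample type.
   This is what lets array elements sit back to back while staying aligned. */
int size_is_multiple_of_alignment(void)
{
    return sizeof(struct two_chars)  % _Alignof(struct two_chars)  == 0
        && sizeof(struct char_array) % _Alignof(struct char_array) == 0
        && sizeof(struct mixed)      % _Alignof(struct mixed)      == 0
        && sizeof(double)            % _Alignof(double)            == 0;
}
```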
how should this statement about the concept of the pointers, based on alignment is to be visualized (2^6 = 64, it is fine but how is this 6 bits correlating with the alignment concept)
As for the character pointers, the 6 bits is bogus. Only 3 bits are needed to choose one of the 8 bytes within an 8-byte word, so this is an error in the book. An ordinary pointer would select just a word, using 24 bits, and a character (byte) pointer would select the word with 24 bits and one of the 8-bit bytes inside that word with 3 more bits.
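To visualize the 24 + 3 split, here is a purely hypothetical encoding of such a character pointer in an ordinary integer. Nothing in the book specifies this layout. The function names and the choice of putting the byte index in the low bits are assumptions made only for illustration.

```c
#include <stdint.h>

/* Hypothetical layout: bits 3..26 hold the 24-bit word address,
   bits 0..2 hold the index of the byte inside the 64-bit word. */
uint32_t make_char_pointer(uint32_t word_address, unsigned byte_index)
{
    return ((word_address & 0xFFFFFFu) << 3) | (byte_index & 0x7u);
}

/* Extract the word address (what an ordinary pointer would hold). */
uint32_t word_of(uint32_t char_pointer)
{
    return char_pointer >> 3;
}

/* Extract the byte position inside the word (0..7). */
unsigned byte_of(uint32_t char_pointer)
{
    return char_pointer & 0x7u;
}
```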
I'm reading the c code:
void **alignedData = (void **)(((size_t)temp + aligned - 1)&-aligned);
I don't know what it means, especially the &- part.
Can anyone explain it?
Thanks!
When using this, aligned should be an unsigned type (or the C implementation should be using two’s complement) and have a value that is a power of two. Then this code calculates an amount of memory to be allocated:
(size_t) temp converts temp to the unsigned type size_t, which is suitable for working with sizes. This will be a number of bytes to be allocated.
(size_t) temp + aligned - 1 adds enough bytes to guarantee a multiple of aligned falls somewhere between the numbers temp and temp + aligned - 1, inclusive. For example, if temp is 37 and aligned is 8, then between 37 and 44 (37+8−1), there is a multiple of 8 (40).
-aligned makes a bit mask with 1 in each bit position that is a multiple of aligned and 0 in the lower bits. For example, if aligned is 8, then the bits that represent -aligned are 111…111000, because the 000 bits at the end represent values of 1, 2, and 4, while the other bits represent values of 8, 16, 32, and so on.
The & (bitwise AND) of (size_t) temp + aligned - 1 with -aligned then clears the low bits, leaving only bits that are multiples of aligned. Thus, it produces the multiple of aligned that is in the interval. For example, with the values of 37 and 8 mentioned before, ((size_t) temp + aligned - 1) & -aligned produces 40.
Thus, this expression produces the value of temp rounded up to the next multiple of aligned. It says “Calculate the number of bytes we need to allocate that is at least temp bytes and is a multiple of aligned.”
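The steps above can be sketched as a small function (round_up is a made-up name, and the alignment is assumed to be a power of two, as the answer requires):

```c
#include <assert.h>
#include <stddef.h>

/* Round value up to the next multiple of alignment.
   Assumption: alignment is a power of two. */
size_t round_up(size_t value, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    /* For an unsigned type, -alignment is all ones above the low zero bits,
       e.g. ...111000 for alignment 8. */
    return (value + alignment - 1) & -alignment;
}
```

For example, round_up(37, 8) yields 40, matching the worked example, and a value that is already a multiple of the alignment is returned unchanged.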
After this, the code converts this number to the type void ** and uses it to initialize void **alignedData. That is bad C code. There is generally no good reason for it. A number of bytes like this should not be used as any kind of pointer. The code may be attempting to “smuggle” this value through a data type it is compelled to use by some other software, but there is likely a better way to do it, such as by allocating memory to hold the value and supplying a pointer to that memory instead of trying to convert the value directly. Finding a better solution requires knowing more context of the code.
I stumbled upon this answer regarding the use of the magic number 0x7EFEFEFF in strlen's optimization, and here is what the top answer says:
Look at the magic bits. Bits number 16, 24 and 31 are 1. 8th bit is 0.
8th bit represents the first byte. If the first byte is not zero, 8th bit becomes 1 at this point. Otherwise it's 0.
16th bit represents the second byte. Same logic.
24th bit represents the third byte.
31st bit represents the fourth byte.
However, if I calculate result = ((a + magic) ^ ~a) & ~magic with a = 0x100, I find that result = 0x81010100, meaning that according to the top answerer, the second byte of a equals 0, which is obviously false.
What am I missing?
Thanks!
The bits only tell you if a byte is zero if the lower bytes are non-zero -- so it can only tell you the FIRST 0 byte, but not about bytes after the first 0.
bit8=1 means first byte is zero. Other bytes, unknown
bit8=0 means first byte is non-zero
bit8=0 & bit16=1 means second byte is zero, higher bytes unknown
bit8=0 & bit16=0 means first two bytes are non-zero.
Also, the last bit (bit31) only tells you about 7 bits of the last byte (and only if the first 3 bytes are non-zero) -- if it is the only bit set then the last byte is 0 or 128 (and the rest are non-zero).
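Here is a sketch of the expression from the question, wrapped in a function so the cases above can be tried out (zero_byte_flags is a made-up name, and 32-bit unsigned arithmetic is assumed):

```c
#include <stdint.h>

/* Evaluates ((a + magic) ^ ~a) & ~magic with 32-bit wraparound.
   Only the lowest set flag among bits 8, 16, 24 and 31 is meaningful. */
uint32_t zero_byte_flags(uint32_t a)
{
    const uint32_t magic = 0x7EFEFEFFu;
    return ((a + magic) ^ ~a) & ~magic;
}
```

With a = 0x100 this gives 0x81010100: bit 8 is set because the first byte is zero, and the higher flags carry no information. With a = 0x01010001 (first byte non-zero, second byte zero) bit 8 is clear and bit 16 is set. With a = 0x01010101 (no zero bytes) the result is 0.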
I was wondering what the following code is doing exactly? I know it's something to do with memory alignment but when I ask for the sizeof(vehicle) it prints 20 but the struct's actual size is 22. I just need to understand how this works, thanks!
struct vehicle {
short wheels:8;
short fuelTank : 6;
short weight;
char license[16];
};
printf("\n%zu", sizeof(struct vehicle));
20
Memory will be allocated as (assuming memory word size is of 8 bits)
struct vehicle {
short wheels:8; // 1 byte
short fuelTank : 6;
// pad 2 bits to make fuelTank 1 byte.
short weight; // 2 bytes.
char license[16]; // 16 bytes.
};
1 + 1 + 2 + 16 = 20 bytes.
Consider a machine with a word size of 32 bits. The first two fields fit in a single 16-bit word, as they occupy 8 + 6 = 14 bits. The third field, while not a bit-field (it doesn't have the :<number> part that allocates space in bits), fits in another 16-bit word to complete a 32-bit word, so the first three fields can be packed into a 32-bit word (4 bytes) if the architecture allows memory to be accessed in 16-bit quantities. Finally, if you add 16 characters to that, you get the 20 bytes that the sizeof operator passes to printf.
Why do you assume the sizeof (struct vehicle) is 22 bytes? You allowed the compiler to print it and it said it's 20. Compilers are free to pad (or not) the structures to achieve better performance. That's an architecture dependency, and as you have not said architecture and compiler used, it is not possible to go further.
For example, the 32-bit Intel architecture allows 16-bit words to be placed at even boundaries without a performance penalty, so this is a good choice in order to save memory. On other architectures, 16-bit integers may not be usable and data must be padded to fit the third field (leading to 22 bytes for the whole structure).
The only guarantee you have when sizing data is that the compiler must allocate enough space to fit everything in an efficient way. So the only thing you can assume from that declaration is that it will occupy at least the minimum space needed to represent one field of 8 bits, another of 6, a complete short (I'll assume a short is 16 bits) and 16 characters (assuming 8 bits per char). That amounts to 8 + 6 + 16 + 16*8 = 158 bits minimum.
Suppose we are writing a compiler for D. Knuth's MIX machine. As stated in his book Fundamental Algorithms, this machine has an unspecified byte size of 64 to 100 distinct values per byte, and five bytes (plus a binary sign) make up one addressable word. If you had a byte-size-independent compiler (one that compiles for any MIX machine, without assumptions about byte size), you could use no more than 64 possible values per byte, i.e. 6 bits per byte. You would then assume the second field fills one complete byte (with the sign drawn from the word it belongs to) and the first field needs two complete bytes (using half of the values for negative values). The third field might go in the second word, filling three complete bytes (6*3 = 18 bits) plus the sign of that word. The 16 chars can begin on the next word and take four complete words, so the whole structure will have 1 + 1 + 4 = 6 words, or 30 bytes. But if you want to handle three signed fields effectively, you'll need three complete words for the three fields (as each word has only one sign field), leading to 7 words or 35 bytes.
I have suggested this example because of the particular characteristics of this architecture, which make one think about not-so-uncommon architectures that were in common use some time ago (the first machines ever built were not binary based, like some of these MIX machines).
Note
You can try to print the actual offsets of the fields, to see where in the structure they are located and where the compiler is padding.
#define OFFSET(Typ, field) ((int)&((Typ *)0)->field)
(Note, edited)
This macro will tell you the offset as an int. Use it as OFFSET(struct vehicle, weight) or OFFSET(struct vehicle, license[3])
Note
I had to edit the last macro definition because it triggers complaints on some architectures: the pointer -> int conversion is not always possible (on 64-bit architectures it loses some bits), so it's better to compute the difference of two pointers, which is a proper ptrdiff_t value, than to convert directly from a pointer.
#define OFFSET(Typ, field) ((char *)&((Typ *)0)->field - (char *)0)
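As an alternative sketch, the standard offsetof macro from <stddef.h> gives the same information without hand-written pointer arithmetic. The accessor functions below are made up for illustration. Note that offsetof cannot be applied to the bit-fields wheels and fuelTank, and that bit-fields of type short are implementation-defined, so the exact numbers depend on the compiler.

```c
#include <stddef.h>

struct vehicle {
    short wheels : 8;
    short fuelTank : 6;
    short weight;
    char license[16];
};

/* Offset of the first non-bit-field member; everything before it
   is the storage (and padding) used for the two bit-fields. */
size_t weight_offset(void)
{
    return offsetof(struct vehicle, weight);
}

/* Offset of the character array. */
size_t license_offset(void)
{
    return offsetof(struct vehicle, license);
}

/* Total size, including any trailing padding. */
size_t vehicle_size(void)
{
    return sizeof(struct vehicle);
}
```

Printing these three values with %zu shows where the compiler placed each member and how much padding, if any, was added.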