While browsing through some kernel code, I found a formula for memory alignment as
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
So I even wrote a program for this:
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv) {
    long long operand;
    long long alignment;

    if (argv[1]) {
        operand = atoll(argv[1]);               /* value to align (decimal) */
    } else {
        printf("Enter value to be aligned!\n");
        return -1;
    }
    if (argv[2]) {
        alignment = strtoll(argv[2], NULL, 16); /* alignment (hex) */
    } else {
        printf("\nDefaulting to 1MB alignment\n");
        alignment = 0x100000;
    }

    long long aligned = ((operand + (alignment - 1)) & ~(alignment - 1));
    printf("Aligned memory is: 0x%.8llx [Hex] <--> %lld\n", aligned, aligned);
    return 0;
}
But I don't get this logic at all. How does this work?
Basically, the formula rounds an integer operand (an address) up to the next address that is aligned to the given alignment.
The expression
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
is basically the same as this slightly easier-to-understand formula:
aligned = int((operand + (alignment - 1)) / alignment) * alignment
For example, having operand (address) 102 and alignment 10 we get:
aligned = int((102 + 9) / 10) * 10
aligned = int(111 / 10) * 10
aligned = 11 * 10
aligned = 110
First we add 9 to the address and get 111. Then, since our alignment is 10, we basically zero out the last digit, i.e. 111 / 10 * 10 = 110.
Note that for each power-of-10 alignment (i.e. 10, 100, 1000, etc.) we are basically zeroing out the last digits.
On most CPUs, division and multiplication operations take much more time than bitwise operations, so let us get back to the original formula:
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
The second part of the formula makes sense only when alignment is a power of 2. For example:
... & ~(2 - 1) will zero the last bit of the address.
... & ~(64 - 1) will zero the last 6 bits of the address.
etc.
Just like we zeroed out the last few digits of an address for power-of-10 alignments, we zero out the last few bits for power-of-2 alignments.
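To see that the two forms agree, here is a small standalone sketch (my own, not from the kernel) comparing the bitwise version with the divide-and-multiply version for a power-of-2 alignment:

#include <stdio.h>

/* Round addr up to the next multiple of align (align must be a power of 2). */
static unsigned long align_up(unsigned long addr, unsigned long align)
{
    return (addr + (align - 1)) & ~(align - 1);
}

int main(void)
{
    unsigned long align = 64;   /* power-of-2 alignment */
    for (unsigned long addr = 60; addr <= 130; addr += 35) {
        unsigned long bitwise = align_up(addr, align);
        unsigned long divmul  = (addr + (align - 1)) / align * align;
        printf("%lu -> %lu (bitwise), %lu (div/mul)\n", addr, bitwise, divmul);
    }
    return 0;
}

For 60, 95 and 130 this prints 64, 128 and 192 from both computations.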
Hope it makes sense to you now.
I'm trying to understand how to check whether a pointer is aligned and, if necessary, align it.
To understand it I take this function:
#define PJ_POOL_ALIGNMENT 8
PJ_DEF(pj_pool_t*) pj_pool_create_on_buf(const char *name,
                                         void *buf,
                                         pj_size_t size)
{
#if PJ_HAS_POOL_ALT_API == 0
    struct creation_param param;
    pj_size_t align_diff;

    PJ_ASSERT_RETURN(buf && size, NULL);

    if (!is_initialized) {
        if (pool_buf_initialize() != PJ_SUCCESS)
            return NULL;
        is_initialized = 1;
    }

    /* Check and align buffer */
    align_diff = (pj_size_t)buf;
    if (align_diff & (PJ_POOL_ALIGNMENT-1)) {
        align_diff &= (PJ_POOL_ALIGNMENT-1);
        buf = (void*) (((char*)buf) + align_diff);
        size -= align_diff;
    }

    param.stack_buf = buf;
    param.size = size;

    pj_thread_local_set(tls, &param);

    return pj_pool_create_int(&stack_based_factory, name, size, 0,
                              pj_pool_factory_default_policy.callback);
#else
    PJ_UNUSED_ARG(buf);

    return pj_pool_create(NULL, name, size, size, NULL);
#endif
}
Obviously the part that interests me is /* Check and align buffer */.
The only thing I think I understand is this:
Let's focus on the if.
This wants to verify whether the buffer is aligned to an address that is a multiple of 8 bytes. If the buffer is not aligned, the expression gives a number other than 0 and the alignment is carried out; otherwise the if is skipped. Even a single 1 bit is enough for the if to be taken. To obtain this result they take PJ_POOL_ALIGNMENT - 1, which is 7 (0111), and AND it with the address where the buffer was allocated. The operation is as follows, considering that I want to get a number other than 0 if the address is not aligned:
  0000...0111  AND
  xxxx...x100
= 0000...0100  -> not aligned
If there is a 1 (or more than one) in any of the last 3 bits, I know the buffer is not aligned to an 8-byte block (since 1 AND 1 = 1), the if will be true, and it will enter the correction block.
But the body of the if is obscure to me.
Can someone confirm whether my reasoning is correct and help me understand that block?
The current alignment code is incorrect. It determines the alignment difference from the lower alignment boundary and is incorrectly adding that to the pointer value to reach the upper alignment boundary:
xxxxx000 + 000 = xxxxx000 (OK - no change)
xxxxx001 + 001 = xxxxx010 (WRONG)
xxxxx010 + 010 = xxxxx100 (WRONG)
xxxxx011 + 011 = xxxxx110 (WRONG)
xxxxx100 + 100 = xxxxy000 (OK - rounded up)
xxxxx101 + 101 = xxxxy010 (WRONG)
xxxxx110 + 110 = xxxxy100 (WRONG)
xxxxx111 + 111 = xxxxy110 (WRONG)
The difference to the upper alignment boundary is the 2's complement of the difference to the lower alignment boundary, modulo the alignment size:
xxxxx000 + 000 = xxxxx000 (OK - no change)
xxxxx001 + 111 = xxxxy000 (OK - rounded up)
xxxxx010 + 110 = xxxxy000 (OK - rounded up)
xxxxx011 + 101 = xxxxy000 (OK - rounded up)
xxxxx100 + 100 = xxxxy000 (OK - rounded up)
xxxxx101 + 011 = xxxxy000 (OK - rounded up)
xxxxx110 + 010 = xxxxy000 (OK - rounded up)
xxxxx111 + 001 = xxxxy000 (OK - rounded up)
The current alignment code can be corrected with the addition of a single line to convert the lower alignment difference to an upper alignment difference:
/* Check and align buffer */
align_diff = (pj_size_t)buf;
if (align_diff & (PJ_POOL_ALIGNMENT-1)) {
    align_diff &= (PJ_POOL_ALIGNMENT-1);
    align_diff = PJ_POOL_ALIGNMENT - align_diff; /* upper alignment */
    buf = (void*) (((char*)buf) + align_diff);
    size -= align_diff;
}
Alternatively, the upper alignment difference could be determined directly before the if:
/* Check and align buffer */
align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);
if (align_diff != 0) {
    buf = (void*) (((char*)buf) + align_diff);
    size -= align_diff;
}
It could be argued (and has been!) that this is less readable than the original version.
In fact, the if could be omitted, since adding zero makes no difference:
/* Check and align buffer */
align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);
buf = (void*) (((char*)buf) + align_diff);
size -= align_diff;
Regarding align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);, the (pj_size_t)buf converts the pointer to an unsigned integer type, the - negates the value, and the initial (pj_size_t) converts the negated value to an unsigned integer type using 2's complement arithmetic. The & (PJ_POOL_ALIGNMENT-1) converts modulo PJ_POOL_ALIGNMENT equivalently to % PJ_POOL_ALIGNMENT (since PJ_POOL_ALIGNMENT is a power of 2).
Strictly, to avoid undefined behavior, the above pointer to integer conversion should be done using uintptr_t (defined by #include <stdint.h>) instead of pj_size_t:
align_diff = (uintptr_t)-(uintptr_t)buf & (PJ_POOL_ALIGNMENT-1);
Regarding buf = (void*) (((char*)buf) + align_diff);, pointer arithmetic is not allowed on void * values (at least in standard C), so (char*)buf converts it to a pointer to char. Since sizeof(char) is 1 byte by definition, the + align_diff advances the pointer by align_diff bytes as required. The (void*) then converts this back to a void * before assigning that back to buf. This (void*) can be omitted in C (but not in C++), so the statement could be rewritten as:
buf = (char*)buf + align_diff;
which is arguably more readable.
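For reference, here is a minimal standalone sketch (not from pjsip; the buffer, offset and names are made up) showing the same upper-alignment computation applied to an arbitrary pointer:

#include <stdint.h>
#include <stdio.h>

#define ALIGNMENT 8   /* stand-in for PJ_POOL_ALIGNMENT */

int main(void)
{
    char storage[64];
    void *buf = storage + 3;              /* very likely misaligned */
    size_t size = sizeof(storage) - 3;

    /* Distance up to the next ALIGNMENT boundary (0 if already aligned). */
    uintptr_t align_diff = (uintptr_t)-(uintptr_t)buf & (ALIGNMENT - 1);
    buf = (char*)buf + align_diff;
    size -= align_diff;

    printf("aligned pointer %p, remaining size %zu\n", buf, size);
    return 0;
}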
I am aware of the memory alignment in structure but I am stumped at this implementation I came across in a project I am working on.
struct default {
    uint8_t variable[((sizeof(struct dummyStructure) + 3) / 4) * 4]; // Align on 32 bit boundary
};
It is more like black-box testing for me at the moment because I don't have access to the functions, but can anyone explain the math used here to make this alignment happen?
The answer to your question is that, by adding 3 to the size of dummyStructure, dividing by 4 (keeping only the integer part), and multiplying by 4 again, you will get either:
The exact size of dummyStructure, if it is aligned to 32 bits (or any multiple of it, such as 64 bits).
Or the first multiple of 32 bits greater than the size of dummyStructure.
Therefore it will always yield a number divisible by 4 bytes (32-bit alignment).
Example:
If the size of dummyStructure is 8 bytes, the result would be ((8 + 3)/4)*4 = 8.
Now if the size of dummyStructure is, let's say, 11, the result would be ((11 + 3)/4)*4 = 12.
I'm just wondering why the developer decided on this, though, since dummyStructure should already be aligned according to the processor architecture.
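A quick way to convince yourself is to let the compiler check the rounded sizes; the 11-byte dummyStructure and the ROUND_UP_4 macro below are just illustrations, since the real struct isn't shown:

#include <stdint.h>

struct dummyStructure {      /* hypothetical 11-byte payload */
    uint8_t bytes[11];
};

#define ROUND_UP_4(n) ((((n) + 3) / 4) * 4)

_Static_assert(ROUND_UP_4(sizeof(struct dummyStructure)) == 12,
               "11 bytes rounds up to 12");
_Static_assert(ROUND_UP_4(8) == 8, "8 is already a multiple of 4");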
You can decompose it:
uint8_t variable[((sizeof(struct dummyStructure) + 3) /4)*4]
You have two cases: sizeof(struct dummyStructure) is either evenly divisible by 4 or not.
Example:
sizeof(struct dummyStructure) = 12
(12 + 3) / 4 = 15 / 4 = 3
3 * 4 = 12
so return original size

sizeof(struct dummyStructure) = 13
(13 + 3) / 4 = 16 / 4 = 4
4 * 4 = 16
so return next size evenly divisible by 4

sizeof(struct dummyStructure) = 15
(15 + 3) / 4 = 18 / 4 = 4
4 * 4 = 16
as above

sizeof(struct dummyStructure) = 16
(16 + 3) / 4 = 19 / 4 = 4
4 * 4 = 16
so return original size again

sizeof(struct dummyStructure) = 17
(17 + 3) / 4 = 20 / 4 = 5
5 * 4 = 20
so return next size evenly divisible by 4
In reality this code does not align variable at a 32-bit address! It only allocates enough space in the array so that dummyStructure can be placed there, manually aligned.
This solution is really bad.
IMHO better solutions (it depends, of course, on what happens in the code):
1) since C11
#include <stdalign.h>   /* for alignas (C11) */

struct defaultx
{
    alignas(4) uint8_t variable[sizeof(struct dummyStructure)];
};
2) gcc or clang specific
struct defaultx
{
    uint8_t variable[sizeof(struct dummyStructure)];
} __attribute__((aligned(4)));

will make sure variable is aligned to 4 bytes.
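Here is a small compilable sketch of the C11 variant (again with a made-up dummyStructure) showing that the struct ends up both aligned and padded to a multiple of the alignment:

#include <stdalign.h>
#include <stdint.h>

struct dummyStructure { uint8_t bytes[11]; };   /* hypothetical */

struct defaultx {
    alignas(4) uint8_t variable[sizeof(struct dummyStructure)];
};

_Static_assert(alignof(struct defaultx) == 4, "struct is 4-byte aligned");
_Static_assert(sizeof(struct defaultx) % 4 == 0,
               "size is padded to a multiple of the alignment");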
I'm trying to figure out what this code does. I'm analyzing my professor's malloc code and he has this function in it. I don't get why he does this, though. To me it just returns the same amount of allocated space.
static inline size_t word_align(size_t size) {
    return size + (sizeof(size_t) - 1) & ~(sizeof(size_t) - 1);
}
You could have seen this for yourself: a simple example shows the word alignment.
#include <stdio.h>

size_t word_align(size_t size)
{
    return size + (sizeof(size_t) - 1) & ~(sizeof(size_t) - 1);
}

int main(void)
{
    size_t i;

    for (i = 1; i < 10; i++)
        printf("%zu %zu\n", i, word_align(i));
    return 0;
}
Program output:
1 4
2 4
3 4
4 4
5 8
6 8
7 8
8 8
9 12
The code is doing 8-byte alignment for the requested memory. This is a common practice in systems programming, and a classic technique.
Why it is done? From Wikipedia:
Data alignment means putting the data at a memory address equal to
some multiple of the word size, which increases the system's
performance due to the way the CPU handles memory.
To understand the code better, add parentheses to make the operator precedence of '+' and '&' explicit:
(size + (sizeof(size_t) - 1)) & ~(sizeof(size_t) - 1)
Given that sizeof(size_t) = 8, and size = 170, what the code does is:
(170 + 7) & ~(0x7), which gives 177 & ~0x7 = 176
Thus, ~(sizeof(size_t) - 1) acts as a mask that clears the low 3 bits, rounding the size up to a multiple of 8.
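As a quick check (the values are arbitrary), the parenthesized form can be verified like this:

#include <stdio.h>

int main(void)
{
    size_t size = 170;
    size_t word = 8;                               /* pretend sizeof(size_t) == 8 */
    size_t aligned = (size + (word - 1)) & ~(word - 1);
    printf("%zu -> %zu\n", size, aligned);         /* prints 170 -> 176 */
    return 0;
}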
Could anyone explain this code?
page_idx = page_to_pfn(page) & ((1 << MAX_ORDER) - 1);
page_to_pfn() already returns the page_idx, so what is the '&' used for? Or does page_to_pfn() return something else?
You need to know that x & ((1 << n) - 1) is a trick meaning x % ((int) pow(2, n)) for non-negative x. It is often faster (but it's better to leave these kinds of optimizations to the compiler).
So in this case it takes the value modulo pow(2, MAX_ORDER). This causes a wrap-around: if page_idx is larger than pow(2, MAX_ORDER), it wraps back around to start from 0 again. Here is equivalent, but more readable, code:
const unsigned long MAX_ORDER_N = 1UL << MAX_ORDER;  /* i.e. pow(2, MAX_ORDER) */

page_idx = page_to_pfn(page);
/* wraparound */
while (page_idx >= MAX_ORDER_N) {
    page_idx -= MAX_ORDER_N;
}
It's a bit mask that ensures page_idx stays below a certain value (2^MAX_ORDER).
#define MAX_ORDER (8)
(1 << MAX_ORDER)  /* 100000000 */
            - 1   /* subtracting 1 sets all the lower bits instead: 011111111 */
So you only have the eight least significant bits left:
1010010101001
& 0000011111111
= 0000010101001
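As a tiny standalone check (the value and MAX_ORDER are arbitrary here), the mask behaves exactly like a modulo by 2^MAX_ORDER:

#include <stdio.h>

#define MAX_ORDER 8

int main(void)
{
    unsigned long pfn = 0x1529;                                   /* arbitrary page frame number */
    unsigned long masked = pfn & ((1UL << MAX_ORDER) - 1);
    unsigned long modulo = pfn % (1UL << MAX_ORDER);
    printf("masked = 0x%lx, modulo = 0x%lx\n", masked, modulo);   /* both 0x29 */
    return 0;
}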
Check this function and it will become clear:
static inline struct page *
__page_find_buddy(struct page *page, unsigned long page_idx, unsigned int order)
{
    unsigned long buddy_idx = page_idx ^ (1 << order);

    return page + (buddy_idx - page_idx);
}
It just limits page_idx to a range of 8MB, maybe because the maximum block size is 4MB (1024 pages) and cannot be merged further; only 2MB blocks can merge into 4MB, and the buddy block can be before or after the page, so the whole range is [page_idx - 2MB, page_idx + 2MB]?
Its absolute value is not important; what matters is the offset (buddy_idx - page_idx), which you add to page to get the real buddy address.
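For a concrete, made-up example of the XOR trick used in __page_find_buddy: with order 2 (a block of 4 pages), the buddy of index 8 is 12, and applying the XOR again gets you back:

#include <stdio.h>

int main(void)
{
    unsigned int order = 2;                         /* block of 1 << 2 = 4 pages */
    unsigned long idx = 8;
    unsigned long buddy = idx ^ (1UL << order);     /* 8 ^ 4 = 12 */
    unsigned long again = buddy ^ (1UL << order);   /* 12 ^ 4 = 8; XOR is its own inverse */
    printf("buddy of %lu is %lu, and back again: %lu\n", idx, buddy, again);
    return 0;
}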
I have a binary string that I am encoding in Base64. Now, I need to know beforehand what the size of the final Base64-encoded string will be.
Is there any way to calculate that?
Something like:
BinaryStringSize is 64Kb
EncodedBinaryStringSize will be 127Kb after encoding.
Oh, the code is in C.
Thanks.
If you do Base64 exactly right, and that includes padding the end with = characters, and you break it up with a CR LF every 72 characters, the answer can be found with:
code_size = ((input_size * 4) / 3);
padding_size = (input_size % 3) ? (3 - (input_size % 3)) : 0;
crlfs_size = 2 + (2 * (code_size + padding_size) / 72);
total_size = code_size + padding_size + crlfs_size;
In C, you may also terminate with a \0-byte, so there'll be an extra byte there, and you may want to length-check at the end of every code as you write them, so if you're just looking for what you pass to malloc(), you might actually prefer a version that wastes a few bytes, in order to make the coding simpler:
output_size = ((input_size * 4) / 3) + (input_size / 96) + 6;
geocar's answer was close, but could sometimes be off slightly.
There are 4 bytes of output for every 3 bytes of input. If the input size is not a multiple of three, we must add enough to make it one; otherwise we leave it alone.
input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0)
Divide this by 3, then multiply by 4. That is our total output size, including padding.
code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4
As I said in my comment, the total size must be divided by the line width before doubling to properly account for the last line. Otherwise the number of CRLF characters will be overestimated. I am also assuming there will only be a CRLF pair if the line is 72 characters. This includes the last line, but not if it is under 72 characters.
newline_size = ((code_padded_size) / 72) * 2
So put it all together:
unsigned int code_padded_size = ((input_size + ( (input_size % 3) ? (3 - (input_size % 3)) : 0) ) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;
unsigned int total_size = code_padded_size + newline_size;
Or to make it a bit more readable:
unsigned int adjustment = ( (input_size % 3) ? (3 - (input_size % 3)) : 0);
unsigned int code_padded_size = ( (input_size + adjustment) / 3) * 4;
unsigned int newline_size = ((code_padded_size) / 72) * 2;
unsigned int total_size = code_padded_size + newline_size;
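Wrapped up as a small helper function (the function name is mine, and the 72-character CRLF convention follows this answer rather than a specific RFC), this could look like:

#include <stdio.h>

/* Size of the base64 encoding of input_size bytes, including '=' padding
 * and a CRLF pair after each full 72-character line. */
static unsigned int base64_size(unsigned int input_size)
{
    unsigned int adjustment = (input_size % 3) ? (3 - (input_size % 3)) : 0;
    unsigned int code_padded_size = ((input_size + adjustment) / 3) * 4;
    unsigned int newline_size = (code_padded_size / 72) * 2;
    return code_padded_size + newline_size;
}

int main(void)
{
    printf("%u\n", base64_size(3));    /* 4 */
    printf("%u\n", base64_size(4));    /* 8 */
    printf("%u\n", base64_size(54));   /* 72 + 2 = 74 */
    return 0;
}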
Here is a simple C implementation (without modulus and ternary operators) for the raw base64-encoded size (with standard '=' padding):
int output_size;
output_size = ((input_size - 1) / 3) * 4 + 4;
To that you will need to add any additional overhead for CRLF if required. The standard base64 encoding (RFC 3548 or RFC 4648) allows CRLF line breaks (at either 64 or 76 characters) but does not require it. The MIME variant (RFC 2045) requires line breaks after every 76 characters.
For example, the total encoded length using 76 character lines building on the above:
int final_size;
final_size = output_size + (output_size / 76) * 2;
See the base64 wikipedia entry for more variants.
Check out the b64 library. The function b64_encode2() can give a maximum estimate of the required size if you pass NULL, so you can allocate memory with certainty, and then call again passing the buffer and have it do the conversion.
I ran into a similar situation in python, and using codecs.iterencode(text, "base64") the correct calculation was:
adjustment = 3 - (input_size % 3) if (input_size % 3) else 0
code_padded_size = ( (input_size + adjustment) / 3) * 4
newline_size = ((code_padded_size) / 76) * 1
return code_padded_size + newline_size
Base 64 transforms 3 bytes into 4.
If your set of bits does not happen to be a multiple of 24 bits (3 bytes), you must pad it out so that it is.
I think this formula should work:
b64len = (size * 8 + 5) / 6
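A quick sanity check of that formula (note that it counts base64 characters without '=' padding or line breaks):

#include <stdio.h>

int main(void)
{
    for (unsigned int size = 1; size <= 6; size++) {
        unsigned int b64len = (size * 8 + 5) / 6;   /* ceil(size * 8 / 6) */
        printf("%u bytes -> %u base64 chars (unpadded)\n", size, b64len);
    }
    return 0;
}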
if (inputSize == 0) return 0;
int size = ((inputSize - 1) / 3) * 4 + 4;
int nlines = (size - 1) / maxLine + 1;
return size + nlines * 2;
This formula adds a terminating CRLF (MIME, RFC 2045) if and only if the last line does not fit exactly in the max line length.
The actual length of MIME-compliant base64-encoded binary data is usually about 137% of the original data length, though for very short messages the relative cost can be a lot higher because of the headers. Very roughly, the final size of base64-encoded binary data is 1.37 times the original data size plus 814 bytes (for headers).
In other words, you can approximate the sizes of the encoded and decoded data with these formulas:
BytesNeededForEncoding = (string_length(base_string) * 1.37) + 814;
BytesNeededForDecoding = (string_length(encoded_string) - 814) / 1.37;
Source: http://en.wikipedia.org/wiki/Base64