Check and align buffer - c

I'm trying to understand how to check whether a pointer is aligned and, if not, how to align it.
To work through it, let's take this function:
#define PJ_POOL_ALIGNMENT 8

PJ_DEF(pj_pool_t*) pj_pool_create_on_buf(const char *name,
                                         void *buf,
                                         pj_size_t size)
{
#if PJ_HAS_POOL_ALT_API == 0
    struct creation_param param;
    pj_size_t align_diff;

    PJ_ASSERT_RETURN(buf && size, NULL);

    if (!is_initialized) {
        if (pool_buf_initialize() != PJ_SUCCESS)
            return NULL;
        is_initialized = 1;
    }

    /* Check and align buffer */
    align_diff = (pj_size_t)buf;
    if (align_diff & (PJ_POOL_ALIGNMENT-1)) {
        align_diff &= (PJ_POOL_ALIGNMENT-1);
        buf = (void*) (((char*)buf) + align_diff);
        size -= align_diff;
    }

    param.stack_buf = buf;
    param.size = size;
    pj_thread_local_set(tls, &param);

    return pj_pool_create_int(&stack_based_factory, name, size, 0,
                              pj_pool_factory_default_policy.callback);
#else
    PJ_UNUSED_ARG(buf);
    return pj_pool_create(NULL, name, size, size, NULL);
#endif
}
Obviously the part that interests me is the block under /* Check and align buffer */.
The only thing I think I understand is this: let's focus on the if.
It wants to verify whether the buffer is aligned to an address that is a multiple of 8 bytes. If the buffer is not aligned, the condition yields a number other than 0 and the alignment is carried out; it is enough for even a single bit to be 1 for the if body to be entered. To obtain this result they take PJ_POOL_ALIGNMENT - 1, which is 7 (0111), and AND it with the address where the buffer was allocated. The operation is as follows, considering that I want to get a number other than 0 if the address is not aligned:
  0000...0111  AND
  xxxx...x100
= 0000...0100  (non-zero, so not aligned)
If there is a 1 (or more than one) in any of the last 3 bits, I know the address is not aligned to an 8-byte block, the if will be true, and the correction block is entered.
But the if block itself is obscure to me.
Can someone confirm whether my reasoning is correct and help me understand the block?

The current alignment code is incorrect. It determines the alignment difference from the lower alignment boundary and is incorrectly adding that to the pointer value to reach the upper alignment boundary:
xxxxx000 + 000 = xxxxx000 (OK - no change)
xxxxx001 + 001 = xxxxx010 (WRONG)
xxxxx010 + 010 = xxxxx100 (WRONG)
xxxxx011 + 011 = xxxxx110 (WRONG)
xxxxx100 + 100 = xxxxy000 (OK - rounded up)
xxxxx101 + 101 = xxxxy010 (WRONG)
xxxxx110 + 110 = xxxxy100 (WRONG)
xxxxx111 + 111 = xxxxy110 (WRONG)
The difference to the upper alignment boundary is the 2's complement of the difference to the lower alignment boundary, modulo the alignment size:
xxxxx000 + 000 = xxxxx000 (OK - no change)
xxxxx001 + 111 = xxxxy000 (OK - rounded up)
xxxxx010 + 110 = xxxxy000 (OK - rounded up)
xxxxx011 + 101 = xxxxy000 (OK - rounded up)
xxxxx100 + 100 = xxxxy000 (OK - rounded up)
xxxxx101 + 011 = xxxxy000 (OK - rounded up)
xxxxx110 + 010 = xxxxy000 (OK - rounded up)
xxxxx111 + 001 = xxxxy000 (OK - rounded up)
The current alignment code can be corrected with the addition of a single line to convert the lower alignment difference to an upper alignment difference:
/* Check and align buffer */
align_diff = (pj_size_t)buf;
if (align_diff & (PJ_POOL_ALIGNMENT-1)) {
    align_diff &= (PJ_POOL_ALIGNMENT-1);
    align_diff = PJ_POOL_ALIGNMENT - align_diff; /* upper alignment */
    buf = (void*) (((char*)buf) + align_diff);
    size -= align_diff;
}
Alternatively, the upper alignment difference could be determined directly before the if:
/* Check and align buffer */
align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);
if (align_diff != 0) {
    buf = (void*) (((char*)buf) + align_diff);
    size -= align_diff;
}
It could be argued (and has been!) that this is less readable than the original version.
In fact, the if could be omitted, since adding zero makes no difference:
/* Check and align buffer */
align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);
buf = (void*) (((char*)buf) + align_diff);
size -= align_diff;
Regarding align_diff = (pj_size_t)-(pj_size_t)buf & (PJ_POOL_ALIGNMENT-1);, the (pj_size_t)buf converts the pointer to an unsigned integer type, the - negates the value, and the initial (pj_size_t) converts the negated value to an unsigned integer type using 2's complement arithmetic. The & (PJ_POOL_ALIGNMENT-1) then reduces the value modulo PJ_POOL_ALIGNMENT, equivalent to % PJ_POOL_ALIGNMENT (since PJ_POOL_ALIGNMENT is a power of 2).
Strictly, to avoid undefined behavior, the above pointer to integer conversion should be done using uintptr_t (defined by #include <stdint.h>) instead of pj_size_t:
align_diff = (uintptr_t)-(uintptr_t)buf & (PJ_POOL_ALIGNMENT-1);
Regarding buf = (void*) (((char*)buf) + align_diff);, pointer arithmetic is not allowed on void * values (at least in standard C), so (char*)buf converts it to a pointer to char. Since sizeof(char) is 1 byte by definition, the + align_diff advances the pointer by align_diff bytes as required. The (void*) then converts this back to a void * before assigning that back to buf. This (void*) can be omitted in C (but not in C++), so the statement could be rewritten as:
buf = (char*)buf + align_diff;
which is arguably more readable.

Related

What adjust_size in musl libc malloc does?

I was studying the musl libc malloc implementation and I am having a hard time understanding the adjust_size function.
static int adjust_size(size_t *n)
{
    /* Result of pointer difference must fit in ptrdiff_t. */
    if (*n-1 > PTRDIFF_MAX - SIZE_ALIGN - PAGE_SIZE) {
        if (*n) {
            errno = ENOMEM;
            return -1;
        } else {
            *n = SIZE_ALIGN;
            return 0;
        }
    }
    *n = (*n + OVERHEAD + SIZE_ALIGN - 1) & SIZE_MASK;
    return 0;
}
For example, in the first comparison, why are they not just comparing against PTRDIFF_MAX? That is what the comment above seems to intend anyway. And why are they subtracting 1 from *n? I think (*n-1) is being compared as unsigned instead of signed, so they are handling the case where *n is 0, but I do not know why it is compared as unsigned, since it seems both sides would evaluate to signed numbers in the end.
Also, why do they set *n to SIZE_ALIGN if it is 0? My understanding is that malloc should return NULL, or a pointer that can be passed to free without causing an issue, if size is 0.
why they are not just comparing against PTRDIFF_MAX
Most malloc implementations allocate large chunks separately using mmap. Because mmap allocates memory in pages, n needs to be rounded up to a page boundary (PAGE_SIZE), plus it should include the chunk header (which is aligned to SIZE_ALIGN).
This is why the comparison is performed against PTRDIFF_MAX - SIZE_ALIGN - PAGE_SIZE instead of PTRDIFF_MAX: to make sure all possible future alignment adjustments won't cause the chunk size to exceed PTRDIFF_MAX.
why are they subtracting 1 from *n
Because n might be aligned later like this:
n = (n + SIZE_ALIGN + PAGE_SIZE - 1) & -PAGE_SIZE;
And the resulting value should be less than or equal to PTRDIFF_MAX. The value PTRDIFF_MAX - SIZE_ALIGN - PAGE_SIZE + 1 is still okay, so 1 is subtracted.
Also, why do they set *n to SIZE_ALIGN if it is 0?
Because the adjusted chunk size should be at least SIZE_ALIGN bytes: enough for OVERHEAD bytes of heap overhead plus a data area that can hold the 2 pointers used later by free. This minimum is assumed later in the code.
I think that (*n-1) was being compared as unsigned instead of signed, so they are handling the case where *n is 0. But I do not know why this is being compared as unsigned in that case as it seems both positions would evaluate to signed numbers at the end.
I think it could be written more simply (although this might be incorrect; I probably need some sleep):
static int adjust_size(size_t *n)
{
    if (*n > PTRDIFF_MAX - SIZE_ALIGN - PAGE_SIZE + 1) {
        errno = ENOMEM;
        return -1;
    }
    *n = (*n + OVERHEAD + SIZE_ALIGN - 1) & SIZE_MASK;
    return 0;
}

Circular buffer increment using alternate method

I am not able to understand how the last statement increments the pointer. Can somebody explain it to me with a few examples?
The code, as shown:
aptr = (aptr + 1) & (void *)(BUFFERSIZE - 1);
// |________| incremented here
Since it is a circular buffer AND the buffer size is a power of 2, then the & is an easy and fast way to roll over by simply masking. Assuming that the BUFFERSIZE is 256, then:
num & (256 - 1) == num % 256
num & (0x100 - 1) == num % 0x100
num & (0x0ff) == num % 0x100
When the number is not a power of 2, then you can't use the masking technique:
num & (257 - 1) != num % 257
num & (0x101 - 1) != num % 0x101
num & 0x100 != num % 0x101
The (void *) allows the compiler to choose an appropriate width for the BUFFERSIZE constant based on your pointer width... although it is generally best to know - and use! - the width before a statement like this.
I added the hex notation to make it clearer why the & results in an emulated rollover. Note that 0xff is binary 11111111, so the AND operation simply masks off the upper bits.
There are 2 problems with this approach.
A) Using a pointer with a bit-wise operation is not portable code. (Credit: @Ilja Everilä)
char *aptr;
// error: invalid operands to binary & (have 'char *' and 'void *')
// The following increments the index: (not really)
// aptr = (aptr + 1) & (void *)(BUFFERSIZE-1);
B) With compilers that support the non-standard math on a void * akin to a char *, the math is wrong if aptr points to an object wider than char and BUFFERSIZE is the number of elements in the buffer rather than its byte size. Of course this depends on how the non-standard compiler implements some_type * & void *. Why bother to unnecessarily code to implementation-specific behavior?
Instead use i % BUFFERSIZE. This portable approach works when BUFFERSIZE is a power of 2 as well as when it is not. When a compiler sees i % power-of-2 and i is some unsigned type, the same code is certainly emitted as for i & (power-of-2 - 1).
For compilers that do not recognize this optimization, one should consider a better compiler.
#define BUFFERSIZE 256

int main(void) {
    char buf[BUFFERSIZE];

    // pointer solution
    char *aptr = buf;
    aptr = &buf[(aptr - buf + 1) % BUFFERSIZE];

    // index solution
    size_t index = 0;
    index = (index + 1) % BUFFERSIZE;
}

Formula for memory alignment

While browsing through some kernel code, I found a formula for memory alignment as
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
So I even wrote a program for this:
#include <stdio.h>
#include <stdlib.h>   /* for atol and strtol */

int main(int argc, char** argv) {
    long long operand;
    long long alignment;

    if (argc > 1) {
        operand = atol(argv[1]);
    } else {
        printf("Enter value to be aligned!\n");
        return -1;
    }

    if (argc > 2) {
        alignment = strtol(argv[2], NULL, 16);
    } else {
        printf("\nDefaulting to 1MB alignment\n");
        alignment = 0x100000;
    }

    long long aligned = ((operand + (alignment - 1)) & ~(alignment - 1));
    printf("Aligned memory is: 0x%.8llx [Hex] <--> %lld\n", aligned, aligned);
    return 0;
}
But I don't get this logic at all. How does this work?
Basically, the formula increases an integer operand (an address) to the next address aligned to alignment.
The expression
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
is basically the same as a bit easier to understand formula:
aligned = int((operand + (alignment - 1)) / alignment) * alignment
For example, having operand (address) 102 and alignment 10 we get:
aligned = int((102 + 9) / 10) * 10
aligned = int(111 / 10) * 10
aligned = 11 * 10
aligned = 110
First we add 9 to the address and get 111. Then, since our alignment is 10, we basically zero out the last digit: 111 / 10 * 10 = 110.
Note that for each power-of-10 alignment (10, 100, 1000, etc.) we are zeroing out the last digit(s).
On most CPUs, division and multiplication operations take much more time than bitwise operations, so let us get back to the original formula:
aligned = ((operand + (alignment - 1)) & ~(alignment - 1))
The second part of the formula makes sense only when alignment is a power of 2. For example:
... & ~(2 - 1) will zero the last bit of the address.
... & ~(64 - 1) will zero the last 6 bits of the address.
etc.
Just as we zeroed out the last few digits of an address for power-of-10 alignments, we zero out the last few bits for power-of-2 alignments.
Hope it makes sense to you now.

represent memory in c

I am trying to write an Instruction Set Simulator in C to simulate a machine running ARM.
I need to be able to represent 4GB of memory efficiently, and after some digging I have come to the solution of creating an array of 1024 pointers, each pointing to a block of 4MB which is dynamically allocated on first use:
#define MEMSIZE 1024     //1024 * 4MB = 4GB
#define PAGESIZE 4194304 //4 MB
#define PAGEEXP 22       //2^PAGEEXP = PAGESIZE

uint32_t* mem[MEMSIZE];
My question is how do I access a certain address of memory?
What I have tried is breaking the address into an index and an offset as below, but this seems to only return 0 for both. (memAdd is the address I am trying to access.)
memIdx = memAdd >> PAGEEXP;
memOfs = memAdd & PAGESIZE;
the functions I use to read/write once I have the address are below:
void memWrite(uint32_t idx, uint32_t ofs, uint32_t val)
{
    if (mem[idx] == 0)
        mem[idx] = malloc(PAGESIZE);
    *(mem[idx] + ofs) = *(mem[idx] + ofs) & val;
}

uint32_t memRead(uint32_t idx, uint32_t ofs)
{
    if (mem[idx] == 0)
        return 0;
    else
        return *(mem[idx] + ofs);
}
These seem right in my head, however I am still not 100% comfortable with pointers, so this may be wrong.
Sorry if this has already been discussed somewhere, but I couldn't find anything relevant to what I need (my keywords are pretty broad).
Start out looking at it logically instead of at the bit level.
You have pages of 4,194,304 bytes each.
Arithmetically, then, to turn a linear address into a (page, offset) pair, you divide by 4,194,304 to get the page number, and take the remainder to get the offset into the page.
page = address / PAGESIZE;
offset = address % PAGESIZE;
Since you want to do this efficiently and these are powers of 2, you can replace division by PAGESIZE with right-shift by the base-2 logarithm of PAGESIZE, which is 22:
page = address >> PAGEEXP;
So that part of your code is correct. However, what you want to do to get the offset is to mask out all but the bits you just shifted out of the page number. To do that, you have to AND with PAGESIZE - 1.
offset = address & (PAGESIZE - 1);
This is because in binary, what you're starting with is a number that looks like this (where p is a page number bit and o is an offset bit):
address = ppppppppppoooooooooooooooooooooo
You want to get the page number and the offset number by themselves. You clearly want to shift right by 22 bits to get the page number:
page = address >> 22 = 0000000000000000000000pppppppppp
But if you AND with the page size (00000000010000000000000000000000 in binary), you'll have at most one 1-bit in the answer, and it will just tell you whether the page number is odd or even. Not useful.
What you want to AND with is instead one bit less than that, which is binary 00000000001111111111111111111111, thus:
ppppppppppoooooooooooooooooooooo
& 00000000001111111111111111111111
-----------------------------------
= 0000000000oooooooooooooooooooooo
which is how you get the offset.
This is a general rule: if N is an integer power of 2, then division by N is the same as right-shifting by log(N)/log(2), and the remainder of such a division is given by ANDing with (N-1).
If PAGESIZE is a power of 2, it has only 1 bit set. Hence ANDing it with another value can leave at most one bit set in the result: two possible values. But you're using that result as an offset into the page.
Also, your memWrite(uint32_t idx, uint32_t ofs, uint32_t val) function always ANDs in the value of val instead of storing it. For example, if val is UINT32_MAX, any call to this function will have no effect.
Last, not only do you not check the result of malloc() for failure, you don't initialise the memory block which is returned.
Try an approach like this (unfortunately I have been unable to test it; I have no compiler handy just now):
enum { SIM_PAGE_BITS = 22 };    // 2^22 = 4MiB
enum { SIM_MEM_PAGES = 1024 };  // 1024 * 4MiB = 4GiB
enum { SIM_PAGE_SIZE = (1 << SIM_PAGE_BITS) };
enum { SIM_PAGE_MASK = SIM_PAGE_SIZE - 1 };
enum { UNINITIALISED_MEMORY_CONTENT = 0 };
enum { WORD_BYTES = sizeof(uint32_t) };

#define PAGE_OFFSET(addr) (SIM_PAGE_MASK & (uint32_t)(addr))
// cast to unsigned type to avoid sign extension surprises if addr < 0
#define PAGE_NUM(addr) (((uint32_t)(addr)) >> SIM_PAGE_BITS)
#define IS_UNALIGNED(addr) ((addr) & (WORD_BYTES - 1))

unsigned char *mem[SIM_MEM_PAGES];

uint32_t memRead(uint32_t addr) {
    if (IS_UNALIGNED(addr)) return handle_unaligned_read(addr);
    const uint32_t page = PAGE_NUM(addr);
    if (mem[page]) {
        const unsigned char *p = mem[page] + PAGE_OFFSET(addr);
        return *(const uint32_t*)p;
    } else {
        return UNINITIALISED_MEMORY_CONTENT;
    }
}

void memWrite(uint32_t addr, uint32_t val) {
    if (IS_UNALIGNED(addr)) { handle_unaligned_write(addr, val); return; }
    const uint32_t page = PAGE_NUM(addr);
    if (!mem[page]) {
        if (val == UNINITIALISED_MEMORY_CONTENT) {
            return;
        }
        mem[page] = malloc(SIM_PAGE_SIZE);
        if (!mem[page]) {
            handle_out_of_memory();
        }
        // If UNINITIALISED_MEMORY_CONTENT is always 0 we could
        // use calloc instead of malloc then memset.
        memset(mem[page], UNINITIALISED_MEMORY_CONTENT, SIM_PAGE_SIZE);
    }
    unsigned char *p = mem[page] + PAGE_OFFSET(addr);
    *(uint32_t*)p = val;
}
This will do what you want. I've used smaller sizes and left out the error checking for clarity. It uses your scheme of an index array.
#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>

#define NUMPAGE 1024
#define NUMINTSPERPAGE 4

uint32_t *buf;
uint32_t *idx[NUMPAGE];

void InitBuf(void)
{
    buf = calloc((size_t)NUMPAGE * NUMINTSPERPAGE, sizeof(uint32_t));
    for (size_t i = 0; i < NUMPAGE; i++)
    {
        idx[i] = &buf[i * NUMINTSPERPAGE];
    }
}

void memWrite(size_t i, size_t ofs, uint32_t val)
{
    idx[i][ofs] = val;
}

uint32_t memRead(size_t i, size_t ofs)
{
    return idx[i][ofs];
}

int main(void)
{
    InitBuf();
    uint32_t val = 1243;
    memWrite(1, 2, val);
    printf("difference = %u\n", val - memRead(1, 2));
    getchar();
    return 0;
}
I don't believe the value of memOfs is being calculated correctly. For instance, the decimal value 4194304 represented by PAGESIZE is 0x400000 in hexadecimal, which means that after the bitwise-AND operation, you're only getting bit 22 of the original address, not the lower 22 bits. Adding that value to the 4MB page-array-pointer actually sends you beyond the end of the allocated array on the heap. Change your mask for the offset calculation to 0x3FFFFF, and then bitwise-AND that with the original memory address in order to calculate the proper offset into the page. So for instance:
memIdx = memAdd >> PAGEEXP;
memOfs = memAdd & 0x3FFFFF; //value of memOfs will be between 0 and 4194303

UTF-16 decoder not working as expected

I have a part of my Unicode library that decodes UTF-16 into raw Unicode code points. However, it isn't working as expected.
Here's the relevant part of the code (omitting UTF-8 and string manipulation stuff):
typedef struct string {
    unsigned long length;
    unsigned *data;
} string;

string *upush(string *s, unsigned c) {
    if (!s->length) s->data = (unsigned *) malloc((s->length = 1) * sizeof(unsigned));
    else s->data = (unsigned *) realloc(s->data, ++s->length * sizeof(unsigned));
    s->data[s->length - 1] = c;
    return s;
}

typedef struct string16 {
    unsigned long length;
    unsigned short *data;
} string16;

string u16tou(string16 old) {
    unsigned long i, cur = 0, need = 0;
    string new;
    new.length = 0;
    for (i = 0; i < old.length; i++)
        if (old.data[i] < 0xd800 || old.data[i] > 0xdfff) upush(&new, old.data[i]);
        else
            if (old.data[i] > 0xdbff && !need) {
                cur = 0; continue;
            } else if (old.data[i] < 0xdc00) {
                need = 1;
                cur = (old.data[i] & 0x3ff) << 10;
                printf("cur 1: %lx\n", cur);
            } else if (old.data[i] > 0xdbff) {
                cur |= old.data[i] & 0x3ff;
                upush(&new, cur);
                printf("cur 2: %lx\n", cur);
                cur = need = 0;
            }
    return new;
}
How does it work?
string is a struct that holds 32-bit values, and string16 is for 16-bit values like UTF-16. All upush does is add a full Unicode code point to a string, reallocating memory as needed.
u16tou is the part that I'm focusing on. It loops through the string16, passing non-surrogate values through as normal, and converting surrogate pairs into full code points. Misplaced surrogates are ignored.
The first surrogate in a pair has its lowest 10 bits shifted 10 bits to the left, so it forms the high 10 bits of the final code point. The second surrogate has its lowest 10 bits ORed into the final value, which is then appended to the string.
The problem?
Let's try the highest code point, shall we?
U+10FFFD, the last valid Unicode code point, is encoded as 0xDBFF 0xDFFD in UTF-16. Let's try decoding that.
string16 b;
b.length = 2;
b.data = (unsigned short *) malloc(2 * sizeof(unsigned short));
b.data[0] = 0xdbff;
b.data[1] = 0xdffd;
string a = u16tou(b);
puts(utoc(a));
Using the utoc function (not shown; I know it works, see below) to convert back to a UTF-8 char * for printing, I can see in my terminal that I'm getting U+0FFFFD, not U+10FFFD.
In the calculator
Doing all the conversions manually in gcalctool gives the same wrong answer, so it isn't my syntax that's wrong but the algorithm. Yet the algorithm seems right to me, and it still ends in the wrong answer.
What am I doing wrong?
You need to add 0x10000 when decoding the surrogate pair; to quote RFC 2781, the step you're missing is number 5:
1) If W1 < 0xD800 or W1 > 0xDFFF, the character value U is the value
of W1. Terminate.
2) Determine if W1 is between 0xD800 and 0xDBFF. If not, the sequence
is in error and no valid character can be obtained using W1.
Terminate.
3) If there is no W2 (that is, the sequence ends with W1), or if W2
is not between 0xDC00 and 0xDFFF, the sequence is in error.
Terminate.
4) Construct a 20-bit unsigned integer U', taking the 10 low-order
bits of W1 as its 10 high-order bits and the 10 low-order bits of
W2 as its 10 low-order bits.
5) Add 0x10000 to U' to obtain the character value U. Terminate.
ie. one fix would be to add an extra line after your first read:
cur = (old.data[i] & 0x3ff) << 10;
cur += 0x10000;
You seem to be missing an offset of 0x10000.
According to this wiki page, UTF-16 surrogate pairs are constructed like this:
UTF-16 represents non-BMP characters (U+10000 through U+10FFFF) using two code units, known as a surrogate pair. First 0x10000 is subtracted from the code point to give a 20-bit value. This is then split into two 10-bit values, each of which is represented as a surrogate, with the most significant half placed in the first surrogate.
