Direct I/O with C: array vs. pointer [duplicate] - arrays
I just finished a test as part of a job interview, and one question stumped me, even using Google for reference. I'd like to see what the StackOverflow crew can do with it:
The memset_16aligned function requires a 16-byte aligned pointer passed to it, or it will crash.
a) How would you allocate 1024 bytes of memory, and align it to a 16 byte boundary?
b) Free the memory after the memset_16aligned has executed.
{
void *mem;
void *ptr;
// answer a) here
memset_16aligned(ptr, 0, 1024);
// answer b) here
}
Original answer
{
void *mem = malloc(1024+16);
void *ptr = ((char *)mem+16) & ~ 0x0F;
memset_16aligned(ptr, 0, 1024);
free(mem);
}
Fixed answer
{
void *mem = malloc(1024+15);
void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
memset_16aligned(ptr, 0, 1024);
free(mem);
}
Explanation as requested
The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types — long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system — hence questions and answers like this.)
The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary — so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works.
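To check the arithmetic over every offset, a small sketch along these lines can help. It works on plain integers standing in for addresses, so nothing depends on a real allocation. The names round_down_16 and check_all_offsets are made up for this illustration, and the base value is arbitrary.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: clear the low four bits, i.e. round down to a
   multiple of 16. */
static uintptr_t round_down_16(uintptr_t addr)
{
    return addr & ~(uintptr_t)0x0F;
}

/* Walk all 16 possible misalignments of a made-up base address and check
   that "add 16, then mask" always lands on a 16-byte boundary that is
   past the start and no more than 16 bytes beyond it. */
static int check_all_offsets(void)
{
    const uintptr_t base = 0x800000;   /* illustrative value, not a real pointer */
    for (uintptr_t off = 0; off < 16; off++) {
        uintptr_t start = base + off;
        uintptr_t aligned = round_down_16(start + 16);
        if (aligned % 16 != 0 || aligned <= start || aligned > start + 16)
            return 0;
    }
    return 1;
}
```

With the add-16 form, a start address that is already aligned is still bumped to the next boundary, which is why 16 spare bytes are needed for that variant.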
The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you — anything else is a disaster. You correctly provided mem to hold that value — thank you. The free releases it.
Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable — other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable.
Someone else mentioned posix_memalign() as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier.
One more comment — this code does not check that the allocation succeeded.
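A minimal sketch of what the missing check could look like, assuming the caller is happy to receive both pointers from one helper. The function name alloc_1024_aligned16 and its out-parameter are hypothetical; the add-15 rounding is the variant from the fixed answer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: returns the aligned pointer, or NULL if malloc
   failed. The raw pointer that must later go to free() is handed back
   through *raw_out. */
static void *alloc_1024_aligned16(void **raw_out)
{
    assert(raw_out != NULL);
    void *mem = malloc(1024 + 15);
    *raw_out = mem;
    if (mem == NULL)
        return NULL;   /* no address arithmetic on a failed allocation */
    return (void *)(((uintptr_t)mem + 15) & ~(uintptr_t)0x0F);
}
```

The caller would test the return value before calling memset_16aligned() and pass the raw pointer, not the aligned one, to free().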
Amendment
Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.]
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void memset_16aligned(void *space, char byte, size_t nbytes)
{
assert((nbytes & 0x0F) == 0);
assert(((uintptr_t)space & 0x0F) == 0);
memset(space, byte, nbytes); // Not a custom implementation of memset()
}
int main(void)
{
void *mem = malloc(1024+15);
void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
memset_16aligned(ptr, 0, 1024);
free(mem);
return(0);
}
And here is a marginally more generalized version, which will work for alignments that are a power of 2:
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
static void memset_16aligned(void *space, char byte, size_t nbytes)
{
assert((nbytes & 0x0F) == 0);
assert(((uintptr_t)space & 0x0F) == 0);
memset(space, byte, nbytes); // Not a custom implementation of memset()
}
static void test_mask(size_t align)
{
uintptr_t mask = ~(uintptr_t)(align - 1);
void *mem = malloc(1024+align-1);
void *ptr = (void *)(((uintptr_t)mem+align-1) & mask);
assert((align & (align - 1)) == 0);
printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr);
memset_16aligned(ptr, 0, 1024);
free(mem);
}
int main(void)
{
test_mask(16);
test_mask(32);
test_mask(64);
test_mask(128);
return(0);
}
To convert test_mask() into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers.
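One way this could look, as a sketch rather than a finished allocator: stash the pointer returned by malloc() in the slot just below the aligned address, so a single return value is enough. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical, the alignment is assumed to be a power of two, and overflow of the size computation is not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical allocator; align must be a power of two. */
static void *aligned_malloc_sketch(size_t size, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    /* Room for the payload, worst-case padding, and one stashed pointer. */
    void *raw = malloc(size + align - 1 + sizeof(void *));
    if (raw == NULL)
        return NULL;
    /* Skip the stash slot first, then round up to the requested alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (start + align - 1) & ~(uintptr_t)(align - 1);
    ((void **)aligned)[-1] = raw;   /* remember what free() needs */
    return (void *)aligned;
}

/* Matching release function: recover the stashed pointer and free it. */
static void aligned_free_sketch(void *ptr)
{
    if (ptr != NULL)
        free(((void **)ptr)[-1]);
}
```

Memory from this sketch must only be released with the matching free function; passing the aligned pointer straight to free() would be the disaster described earlier.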
Problems with interviewers
Uri commented: Maybe I am having [a] reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer?
My response won't fit into a 300-character comment...
It depends, I suppose. I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited.
Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful.
Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator.
However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.)
The world moves on
The title of the question has changed recently. It was Solve the memory alignment in C interview question that stumped me. The revised title (How to allocate aligned memory only using the standard library?) demands a slightly revised answer — this addendum provides it.
C11 (ISO/IEC 9899:2011) added function aligned_alloc():
7.22.3.1 The aligned_alloc function
Synopsis
#include <stdlib.h>
void *aligned_alloc(size_t alignment, size_t size);
Description
The aligned_alloc function allocates space for an object whose alignment is
specified by alignment, whose size is specified by size, and whose value is
indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment.
Returns
The aligned_alloc function returns either a null pointer or a pointer to the allocated space.
And POSIX defines posix_memalign():
#include <stdlib.h>
int posix_memalign(void **memptr, size_t alignment, size_t size);
DESCRIPTION
The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *).
Upon successful completion, the value pointed to by memptr shall be a multiple of alignment.
If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer.
The free() function shall deallocate memory that has previously been allocated by posix_memalign().
RETURN VALUE
Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error.
Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered.
Behind the scenes, the new aligned memory functions do much the same job as outlined in the question, except that they can force the alignment more easily and they keep track of the start of the aligned memory internally, so the code doesn't have to deal with it specially: it just frees the memory returned by the allocation function that was used.
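As a rough illustration of the two interfaces quoted above, the question's 1024-byte, 16-byte-aligned buffer might be obtained like this. The wrapper names alloc_c11 and alloc_posix are hypothetical, and the sketch assumes a C11 library that also provides the POSIX function.

```c
#define _POSIX_C_SOURCE 200112L   /* asks for posix_memalign() on some systems */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* C11 route: the size (1024) is an integral multiple of the alignment (16),
   as the standard text requires. */
static void *alloc_c11(void)
{
    return aligned_alloc(16, 1024);
}

/* POSIX route: the alignment must be a power of two multiple of
   sizeof(void *); the result comes back through the first argument. */
static void *alloc_posix(void)
{
    void *p = NULL;
    if (posix_memalign(&p, 16, 1024) != 0)
        return NULL;
    return p;
}
```

In both cases the returned pointer is the one passed to free(), so there is no second pointer to carry around.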
Three slightly different answers depending how you look at the question:
1) Good enough for the exact question asked is Jonathan Leffler's solution, except that to round up to 16-aligned, you only need 15 extra bytes, not 16.
A:
/* allocate a buffer with room to add 0-15 bytes to ensure 16-alignment */
void *mem = malloc(1024+15);
ASSERT(mem); // some kind of error-handling code
/* round up to multiple of 16: add 15 and then round down by masking */
void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F);
B:
free(mem);
2) For a more generic memory allocation function, the caller doesn't want to have to keep track of two pointers (one to use and one to free). So you store a pointer to the 'real' buffer below the aligned buffer.
A:
void *mem = malloc(1024+15+sizeof(void*));
if (!mem) return mem;
void *ptr = (void *)(((uintptr_t)mem+sizeof(void*)+15) & ~ (uintptr_t)0x0F);
((void**)ptr)[-1] = mem;
return ptr;
B:
if (ptr) free(((void**)ptr)[-1]);
Note that unlike (1), where only 15 bytes were added to mem, this code could actually reduce the alignment if your implementation happens to guarantee 32-byte alignment from malloc (unlikely, but in theory a C implementation could have a 32-byte aligned type). That doesn't matter if all you do is call memset_16aligned, but if you use the memory for a struct then it could matter.
I'm not sure off-hand what a good fix is for this (other than to warn the user that the buffer returned is not necessarily suitable for arbitrary structs) since there's no way to determine programmatically what the implementation-specific alignment guarantee is. I guess at startup you could allocate two or more 1-byte buffers, and assume that the worst alignment you see is the guaranteed alignment. If you're wrong, you waste memory. Anyone with a better idea, please say so...
[Added:
The 'standard' trick is to create a union of 'likely to be maximally aligned types' to determine the requisite alignment. The maximally aligned types are likely to be (in C99) 'long long', 'long double', 'void *', or 'void (*)(void)'; if you include <stdint.h>, you could presumably use 'intmax_t' in place of long long (and, on Power 6 (AIX) machines, intmax_t would give you a 128-bit integer type). The alignment requirements for that union can be determined by embedding it into a struct with a single char followed by the union:
struct alignment
{
char c;
union
{
intmax_t imax;
long double ldbl;
void *vptr;
void (*fptr)(void);
} u;
} align_data;
size_t align = (char *)&align_data.u.imax - &align_data.c;
You would then use the larger of the requested alignment (in the example, 16) and the align value calculated above.
On (64-bit) Solaris 10, it appears that the basic alignment for the result from malloc() is a multiple of 32 bytes.
]
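For comparison, here is a sketch of the same probe written with offsetof, next to the C11 spelling of the question it answers. The struct and function names are made up for this example, and the C11 part assumes a compiler that supports _Alignof and max_align_t.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same idea as the struct above, but reading the padding with offsetof. */
struct align_probe {
    char c;
    union {
        intmax_t imax;
        long double ldbl;
        void *vptr;
        void (*fptr)(void);
    } u;
};

/* Alignment implied by the probe struct (the pre-C11 technique). */
static size_t probed_alignment(void)
{
    return offsetof(struct align_probe, u);
}

/* C11 answer for comparison; max_align_t is declared in <stddef.h>. */
static size_t c11_alignment(void)
{
    return _Alignof(max_align_t);
}

/* Larger of the requested alignment and the probed one. */
static size_t effective_alignment(size_t requested)
{
    size_t probed = probed_alignment();
    return requested > probed ? requested : probed;
}
```

The probe only reflects the types listed in the union, so it is a lower bound on what malloc() actually guarantees on a given platform.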
In practice, aligned allocators often take a parameter for the alignment rather than it being hardwired. So the user will pass in the size of the struct they care about (or the least power of 2 greater than or equal to that) and all will be well.
3) Use what your platform provides: posix_memalign for POSIX, _aligned_malloc on Windows.
4) If you use C11, then the cleanest - portable and concise - option is to use the standard library function aligned_alloc that was introduced in this version of the language specification.
You could also try posix_memalign() (on POSIX platforms, of course).
Here's an alternate approach to the 'round up' part. Not the most brilliantly coded solution but it gets the job done, and this type of syntax is a bit easier to remember (plus would work for alignment values that aren't a power of 2). The uintptr_t cast was necessary to appease the compiler; pointer arithmetic isn't very fond of division or multiplication.
void *mem = malloc(1024 + 15);
void *ptr = (void*) (((uintptr_t) mem + 15) / 16 * 16);
memset_16aligned(ptr, 0, 1024);
free(mem);
Unfortunately, in C99 it seems pretty tough to guarantee alignment of any sort in a way which would be portable across any C implementation conforming to C99. Why? Because a pointer is not guaranteed to be the "byte address" one might imagine with a flat memory model. Neither is the representation of uintptr_t so guaranteed, which itself is an optional type anyway.
We might know of some implementations which use a representation for void * (and by definition, also char *) which is a simple byte address, but by C99 it is opaque to us, the programmers. An implementation might represent a pointer by a set {segment, offset} where offset could have who-knows-what alignment "in reality." Why, a pointer could even be some form of hash table lookup value, or even a linked-list lookup value. It could encode bounds information.
In a recent C1X draft for a C Standard, we see the _Alignas keyword. That might help a bit.
The only guarantee C99 gives us is that the memory allocation functions will return a pointer suitable for assignment to a pointer pointing at any object type. Since we cannot specify the alignment of objects, we cannot implement our own allocation functions with responsibility for alignment in a well-defined, portable manner.
It would be good to be wrong about this claim.
On the 16 vs 15 byte-count padding front, the actual number you need to add to get an alignment of N is max(0,N-M) where M is the natural alignment of the memory allocator (and both are powers of 2).
Since the minimal memory alignment of any allocator is 1 byte, 15=max(0,16-1) is a conservative answer. However, if you know your memory allocator is going to give you 32-bit int aligned addresses (which is fairly common), you could have used 12 as a pad.
This isn't important for this example but it might be important on an embedded system with 12K of RAM where every single int saved counts.
The best way to implement it if you're actually going to try to save every byte possible is as a macro so you can feed it your native memory alignment. Again, this is probably only useful for embedded systems where you need to save every byte.
In the example below, on most systems, the value 1 is just fine for MEMORY_ALLOCATOR_NATIVE_ALIGNMENT, however for our theoretical embedded system with 32-bit aligned allocations, the following could save a tiny bit of precious memory:
#define MEMORY_ALLOCATOR_NATIVE_ALIGNMENT 4
#define ALIGN_PAD2(N,M) (((N)>(M)) ? ((N)-(M)) : 0)
#define ALIGN_PAD(N) ALIGN_PAD2((N), MEMORY_ALLOCATOR_NATIVE_ALIGNMENT)
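The same rule written as a function, in case a worked check of the numbers is useful. This is only a sketch: the name align_pad is hypothetical, and both arguments are assumed to be powers of two.

```c
#include <assert.h>
#include <stddef.h>

/* Function form of max(0, N - M): extra bytes needed to reach an alignment
   of `wanted` when the allocator already guarantees `native`. */
static size_t align_pad(size_t wanted, size_t native)
{
    return wanted > native ? wanted - native : 0;
}
```

For a 16-byte target this gives 15 bytes of padding with a 1-byte guarantee, 12 bytes with a 4-byte guarantee, and none when the allocator already aligns to 16 or more.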
Perhaps they would have been satisfied with a knowledge of memalign? And as Jonathan Leffler points out, there are two newer preferable functions to know about.
Oops, florin beat me to it. However, if you read the man page I linked to, you'll most likely understand the example supplied by an earlier poster.
We do this sort of thing all the time for Accelerate.framework, a heavily vectorized OS X / iOS library, where we have to pay attention to alignment all the time. There are quite a few options, one or two of which I didn't see mentioned above.
The fastest method for a small array like this is just stick it on the stack. With GCC / clang:
void my_func( void )
{
uint8_t array[1024] __attribute__ ((aligned(16)));
...
}
No free() required. This is typically two instructions: subtract 1024 from the stack pointer, then AND the stack pointer with -alignment. Presumably the requester needed the data on the heap because the lifespan of the array exceeded that of the stack frame, or recursion is at work, or stack space is at a serious premium.
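Where C11 is available, the same stack buffer can be requested without the compiler-specific attribute. The following is a sketch under that assumption; the function name is invented for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Returns 1 if the automatic buffer came out 16-byte aligned and zeroed. */
static int stack_buffer_demo(void)
{
    _Alignas(16) unsigned char array[1024];   /* C11 spelling of aligned(16) */
    memset(array, 0, sizeof array);
    return ((uintptr_t)array & 0x0F) == 0 && array[1023] == 0;
}
```

Including <stdalign.h> additionally allows the lowercase spelling alignas(16).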
On OS X / iOS all calls to malloc/calloc/etc. are always 16 byte aligned. If you needed 32 byte aligned for AVX, for example, then you can use posix_memalign:
void *buf = NULL;
int err = posix_memalign( &buf, 32 /*alignment*/, 1024 /*size*/);
if( err )
RunInCirclesWaivingArmsWildly();
...
free(buf);
Some folks have mentioned the C++ interface that works similarly.
It should not be forgotten that pages are aligned to large powers of two, so page-aligned buffers are also 16 byte aligned. Thus, mmap() and valloc() and other similar interfaces are also options. mmap() has the advantage that the buffer can be allocated preinitialized with something non-zero in it, if you want. Since these have page aligned size, you will not get the minimum allocation from these, and it will likely be subject to a VM fault the first time you touch it.
Cheesy: Turn on guard malloc or similar. Buffers that are n*16 bytes in size such as this one will be n*16 bytes aligned, because VM is used to catch overruns and its boundaries are at page boundaries.
Some Accelerate.framework functions take in a user supplied temp buffer to use as scratch space. Here we have to assume that the buffer passed to us is wildly misaligned and the user is actively trying to make our life hard out of spite. (Our test cases stick a guard page right before and after the temp buffer to underline the spite.) Here, we return the minimum size we need to guarantee a 16-byte aligned segment somewhere in it, and then manually align the buffer afterward. This size is desired_size + alignment - 1. So, in this case that is 1024 + 16 - 1 = 1039 bytes. Then align as so:
#include <stdint.h>
void My_func( uint8_t *tempBuf, ... )
{
uint8_t *alignedBuf = (uint8_t*)
(((uintptr_t) tempBuf + ((uintptr_t)alignment-1))
& -((uintptr_t) alignment));
...
}
Adding alignment-1 will move the pointer past the first aligned address and then ANDing with -alignment (e.g. 0xfff...ff0 for alignment=16) brings it back to the aligned address.
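Put together, the size calculation and the alignment step might be packaged as below. This is a sketch of the idea only, not the framework's actual code; the helper names are hypothetical and the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes the caller must supply so that an aligned block of `desired`
   bytes is guaranteed to fit somewhere inside the buffer. */
static size_t temp_buffer_size(size_t desired, size_t alignment)
{
    return desired + alignment - 1;
}

/* First address at or after tempBuf that is a multiple of alignment. */
static uint8_t *align_temp_buffer(uint8_t *tempBuf, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    uintptr_t mask = -(uintptr_t)alignment;   /* same bits as ~(alignment - 1) */
    return (uint8_t *)(((uintptr_t)tempBuf + (alignment - 1)) & mask);
}
```

Because the mask is computed in an unsigned type, negating the alignment is well defined and yields all ones above the low bits.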
As described by other posts, on other operating systems without 16-byte alignment guarantees, you can call malloc with the larger size, set aside the pointer for free() later, then align as described immediately above and use the aligned pointer, much as described for our temp buffer case.
As for aligned_memset, this is rather silly. You only have to loop in up to 15 bytes to reach an aligned address, and then proceed with aligned stores after that with some possible cleanup code at the end. You can even do the cleanup bits in vector code, either as unaligned stores that overlap the aligned region (providing the length is at least the length of a vector) or using something like movmaskdqu. Someone is just being lazy. However, it is probably a reasonable interview question if the interviewer wants to know whether you are comfortable with stdint.h, bitwise operators and memory fundamentals, so the contrived example can be forgiven.
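The head, body and tail structure described here could be sketched as follows, using scalar stores in place of vector code. The name memset_any is hypothetical, and memset_16aligned is a stand-in with the same contract as the routine in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the aligned-only routine from the question. */
static void memset_16aligned(void *space, int byte, size_t nbytes)
{
    assert(((uintptr_t)space & 0x0F) == 0);
    assert((nbytes & 0x0F) == 0);
    memset(space, byte, nbytes);
}

/* Hypothetical wrapper that accepts any pointer and any length. */
static void memset_any(void *space, int byte, size_t nbytes)
{
    unsigned char *p = space;
    /* Head: up to 15 bytes, one at a time, until p is 16-byte aligned. */
    while (nbytes > 0 && ((uintptr_t)p & 0x0F) != 0) {
        *p++ = (unsigned char)byte;
        nbytes--;
    }
    /* Body: the largest multiple of 16 that is left. */
    size_t body = nbytes & ~(size_t)0x0F;
    if (body > 0)
        memset_16aligned(p, byte, body);
    /* Tail: whatever remains, fewer than 16 bytes. */
    p += body;
    for (size_t i = 0; i < nbytes - body; i++)
        p[i] = (unsigned char)byte;
}
```

With a wrapper like this the caller never has to over-allocate or keep a second pointer.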
I'm surprised no one's voted up Shao's answer that, as I understand it, it is impossible to do what's asked in standard C99, since converting a pointer to an integral type formally is undefined behavior. (Apart from the standard allowing conversion of uintptr_t <-> void*, but the standard does not seem to allow doing any manipulations of the uintptr_t value and then converting it back.)
Using memalign (see Aligned-Memory-Blocks) might be a good solution for the problem.
The first thing that popped into my head when reading this question was to define an aligned struct, instantiate it, and then point to it.
Is there a fundamental reason I'm missing since no one else suggested this?
As a sidenote, since I used an array of char (assuming the system's char is 8 bits (i.e. 1 byte)), I don't see the need for the __attribute__((packed)) necessarily (correct me if I'm wrong), but I put it in anyway.
This works on two systems I tried it on, but it's possible that there is a compiler optimization that I'm unaware of giving me false positives vis-a-vis the efficacy of the code. I used gcc 4.9.2 on OSX and gcc 5.2.1 on Ubuntu.
#include <stdio.h>
#include <stdlib.h>
int main ()
{
void *mem;
void *ptr;
// answer a) here
struct __attribute__((packed)) s_CozyMem {
char acSpace[16];
};
mem = malloc(sizeof(struct s_CozyMem));
ptr = mem;
// memset_16aligned(ptr, 0, 1024);
// Check if it's aligned
if(((unsigned long)ptr & 15) == 0) printf("Aligned to 16 bytes.\n");
else printf("Rubbish.\n");
// answer b) here
free(mem);
return 1;
}
MacOS X specific:
All pointers allocated with malloc are 16 bytes aligned.
C11 is supported, so you can just call aligned_alloc (16, size).
MacOS X picks code that is optimised for individual processors at boot time for memset, memcpy and memmove and that code uses tricks that you've never heard of to make it fast. 99% chance that memset runs faster than any hand-written memset16 which makes the whole question pointless.
If you want a 100% portable solution, before C11 there is none. Because there is no portable way to test alignment of a pointer. If it doesn't have to be 100% portable, you can use
char* p = malloc (size + 15);
p += (- (unsigned int) p) % 16;
This assumes that the alignment of a pointer is stored in the lowest bits when converting a pointer to unsigned int. Converting to unsigned int loses information and is implementation defined, but that doesn't matter because we don't convert the result back to a pointer.
The horrible part is of course that the original pointer must be saved somewhere to call free () with it. So all in all I would really doubt the wisdom of this design.
You can also add 16 bytes and then push the original pointer to a 16-byte aligned address by adding (16 - mod) to it, as below:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
void *mem1 = malloc(1024+16);
void *mem = ((char*)mem1)+1; // force misalign ( my computer always aligns)
printf ( " ptr = %p \n ", mem );
void *ptr = (void *)(((uintptr_t)mem+16) & ~ (uintptr_t)0x0F);
printf ( " aligned ptr = %p \n ", ptr );
printf (" ptr after adding diff mod %p (same as above ) ", (void *)((uintptr_t)mem1 + (16 -((uintptr_t)mem1%16))) );
free(mem1);
return 0;
}
If the constraint is that you cannot waste a single byte, then this solution works:
Note: There is a case where this may be executed infinitely :D
void *mem;
void *ptr;
try:
mem = malloc(1024);
if ((uintptr_t)mem % 16 != 0) {
free(mem);
goto try;
}
ptr = mem;
memset_16aligned(ptr, 0, 1024);
For this solution I used the concept of padding, which aligns the memory and does not waste a single byte of memory.
size =1024;
alignment = 16;
aligned_size = size +(alignment -(size % alignment));
mem = malloc(aligned_size);
memset_16aligned(mem, 0, 1024);
free(mem);
Hope this one is the simplest implementation, let me know your comments.
uintptr_t add;
mem = malloc(1024 + 15);
add = (uintptr_t)mem + 15; // step past the next 16-byte boundary
add = add - (add % 16); // round down, which lands on that boundary
ptr = (void *)add;
Related
What alignment issues limit the use of a block of memory created by malloc?
I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.
My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:
void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);
The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.
In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.
So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:
void *allocateScratch(...)
{
return malloc(3 * sizeof(double) + 5 * sizeof(int));
}
void doCalculation(..., void *scratch)
{
double *dblArray = scratch;
int *intArray = ((unsigned char*)scratch) + 3 * sizeof(double);
}
Is this legal? The alignment probably works out OK in this example, but what if I switch it around and take the int block first and the double block second, that will shift the alignment of the double's (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this? Or a more standard approach I should consider?
My biggest goals are as follows:
I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I don't want to do something like creating a small struct that has pointers to each block and then allocating each block separately, which would require a special destroy function; I'm willing to do that if that's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.
Maybe this issue is addressed in the C standard, but I haven't been able to find it.
The memory of a single malloc can be partitioned for use in multiple arrays as shown below.
Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:
size_t Offset = 0;
ptrdiff_t OffsetA = Offset; // Put array at current offset.
Offset += NA * sizeof(A); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(B)); // Align sufficiently for type.
ptrdiff_t OffsetB = Offset; // Put array at current offset.
Offset += NB * sizeof(B); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(C)); // Align sufficiently for type.
ptrdiff_t OffsetC = Offset; // Put array at current offset.
Offset += NC * sizeof(C); // Move offset to end of array.
unsigned char *Memory = malloc(Offset); // Allocate memory.
// Set pointers for arrays (the casts are needed to convert from unsigned char *).
A *pA = (A *)(Memory + OffsetA);
B *pB = (B *)(Memory + OffsetB);
C *pC = (C *)(Memory + OffsetC);
where RoundUp is:
// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
size_t x = Offset + Size - 1;
return x - x % Size;
}
This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.
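Applied to the question's case of ints followed by doubles, the recipe might look like the sketch below. The struct and function names are invented for illustration, and it uses the C11 _Alignof variant mentioned at the end of the answer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Round offset up to a multiple of align (align > 0). */
static size_t round_up(size_t offset, size_t align)
{
    size_t x = offset + align - 1;
    return x - x % align;
}

/* Hypothetical layout record: where each array starts and the total size. */
struct scratch_layout {
    size_t int_offset;
    size_t dbl_offset;
    size_t total;
};

/* Plan a block holding n_ints ints first, then n_dbls doubles. */
static struct scratch_layout plan_scratch(size_t n_ints, size_t n_dbls)
{
    struct scratch_layout l;
    size_t offset = 0;
    l.int_offset = offset;                        /* ints start the block */
    offset += n_ints * sizeof(int);
    offset = round_up(offset, _Alignof(double));  /* align for the doubles */
    l.dbl_offset = offset;
    offset += n_dbls * sizeof(double);
    l.total = offset;
    return l;
}
```

The caller would malloc l.total bytes once, derive both array pointers from the offsets, and release the whole block with a single free().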
If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design. So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to. If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those... Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?
The latest C11 standard has the max_align_t type (and _Alignas specifier and _Alignof operator and <stdalign.h> header). The GCC compiler has a __BIGGEST_ALIGNMENT__ macro (giving the maximal size alignment). It also provides some extensions related to alignment. Often, using 2*sizeof(void*) (as the biggest relevant alignment) is in practice quite safe (at least on most of the systems I heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSP-s). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. x86-64 ABI and x86 calling conventions... And the system malloc is guaranteed to return a sufficiently aligned pointer (for all purposes). On some systems and targets and some processors giving a larger alignment might give performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. on GCC using variable attributes... Don't forget that according to Fulton there is no such thing as portable software, only software that has been ported, but intptr_t and max_align_t are here to help you...
Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).
sizeof sideeffect and allocation location [duplicate]
Possible Duplicate: Why isn’t sizeof for a struct equal to the sum of sizeof of each member?
I cannot understand why it is like this:
#include <stdio.h>
#include <stdlib.h>
typedef struct { char b; int a; } A;
typedef struct { char b; } B;
int main()
{
A object;
printf("sizeof char is: %zu\n",sizeof(char));
printf("sizeof int is: %zu\n",sizeof(int));
printf("==> the sizeof both are: %zu\n",sizeof(int)+sizeof(char));
printf("and yet the sizeof struct A is: %zu\n",sizeof(object));
printf("why?\n");
B secondObject;
printf("pay attention that the sizeof struct B is: %zu which is equal to the "
"sizeof char\n",sizeof(secondObject));
return 0;
}
I think I explained my question in the code and there is no need to explain further.
Besides, I have another question: I know there is allocation on the heap/static heap/stack, but what does it mean that the allocation location is unknown? How could that be? I am talking about this example:
#include <stdlib.h>
#include <string.h>
typedef struct { char *_name; int _id; } Entry;
int main()
{
Entry ** vec = (Entry**) malloc(sizeof(Entry*)*2);
vec[0] = (Entry *) malloc(sizeof (Entry));
vec[0]->_name = (char*)malloc(6);
strcpy (vec[0]->_name, "name");
vec[0]->_id = 0;
return 0;
}
I know that:
vec is on the stack.
*vec is on the heap.
*vec[0] is on the heap.
vec[0]->_id is on the heap.
But vec[0]->_name is unknown. Why?
There is an unspecified amount of padding between the members of a structure and at the end of a structure. In C the size of a structure object is greater than or equal to the sum of the size of its members.
Take a look at this question as well as this one and many others if you search for CPU and memory alignment. In short, CPUs are happier if they access the memory aligned to the size of the data they are reading. For example, if you are reading a uint16_t, then it would be more efficient (on most CPUs) if you read at an address that is a multiple of 2. The details of why CPUs are designed in such a way are a whole other story. This is why compilers come to the rescue and pad the fields of the structures in such a way that would be most comfortable for the CPU to access them, at the cost of extra storage space. In your case, you are probably given 3 bytes of padding between your char and int, assuming int is 4 bytes. If you look at the C standard (which I don't have nearby right now), or the man page of malloc, you will see such a phrase: The malloc() and calloc() functions return a pointer to the allocated memory that is suitably aligned for any kind of variable. This behavior is exactly due to the same reason I mentioned above. So in short, memory alignment is something to care about, and that's what compilers do for you in struct layout and other places, such as layout of local variables etc.
You're running into structure padding here. The compiler is likely inserting three bytes' worth of padding after the b field in struct A, so that the a field is 4-byte aligned. You can control this padding to some degree using compiler-specific bits; for example, the pack pragma on MSVC, or the packed and aligned attributes on GCC, but I would not recommend this. Structure padding is there to satisfy member alignment restrictions, and some architectures will fault on unaligned accesses. (Others might fix up the alignment themselves, but typically do this rather slowly.) See also: http://en.wikipedia.org/wiki/Data_structure_alignment#Data_structure_padding As to your second question, I'm unsure what you mean by the name is "unknown". Care to elaborate?
The compiler is free to add padding in structures to ensure that datatypes are aligned properly. For example, an int will be aligned to sizeof(int) bytes. So I expect the output for the size of your A struct is 8. The compiler does this, because fetching an int from an unaligned address is at best inefficient, and at worst doesn't work at all - that depends on the processor that the computer uses. x86 will fetch happily from unaligned addresses for most data types, but will take about twice as long for the fetch operation. In your second code-snippet, you haven't declared i. So vec[0]->_name is not unknown - it is on the heap, just like anything else you get from "malloc" (and malloc's siblings).
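The padding described in these answers can be measured rather than guessed. The following is a minimal sketch using offsetof from <stddef.h>; the function names are invented for illustration, and the actual numbers depend on the compiler and target (with a 4-byte, 4-byte-aligned int one would typically expect 3 bytes before a and none at the end).

```c
#include <assert.h>
#include <stddef.h>

struct A { char b; int a; };   /* same members as the struct A in the question */

/* Bytes of padding the compiler placed between b and a. */
size_t padding_before_a(void)
{
    return offsetof(struct A, a) - sizeof(char);
}

/* Bytes of padding after the last member. */
size_t padding_at_end(void)
{
    return sizeof(struct A) - (offsetof(struct A, a) + sizeof(int));
}
```

Whatever the two values turn out to be, the member sizes plus both paddings always add up to sizeof(struct A).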
Aligned memory management?
I have a few related questions about managing aligned memory blocks. Cross-platform answers would be ideal. However, as I'm pretty sure a cross-platform solution does not exist, I'm mainly interested in Windows and Linux and to a (much) lesser extent Mac OS and FreeBSD. What's the best way of getting a chunk of memory aligned on 16-byte boundaries? (I'm aware of the trivial method of using malloc(), allocating a little extra space and then bumping the pointer up to a properly aligned value. I'm hoping for something a little less kludge-y, though. Also, see below for additional issues.) If I use plain old malloc(), allocate extra space, and then move the pointer up to where it would be correctly aligned, is it necessary to keep the pointer to the beginning of the block around for freeing? (Calling free() on pointers to the middle of the block seems to work in practice on Windows, but I'm wondering what the standard says and, even if the standard says you can't, whether it works in practice on all major OS's. I don't care about obscure DS9K-like OS's.) This is the hard/interesting part. What's the best way to reallocate a memory block while preserving alignment? Ideally this would be something more intelligent than calling malloc(), copying, and then calling free() on the old block. I'd like to do it in place where possible.
If your implementation has a standard data type that needs 16-byte alignment (long long for example), malloc already guarantees that your returned blocks will be aligned correctly. Section 7.20.3 of C99 states The pointer returned if the allocation succeeds is suitably aligned so that it may be assigned to a pointer to any type of object. You have to pass back the exact same address into free as you were given by malloc. No exceptions. So yes, you need to keep the original copy. See (1) above if you already have a 16-byte-alignment-required type. Beyond that, you may well find that your malloc implementation gives you 16-byte-aligned addresses anyway for efficiency although it's not guaranteed by the standard. If you require it, you can always implement your own allocator. Myself, I'd implement a malloc16 layer on top of malloc that would use the following structure: some padding for alignment (0-15 bytes) size of padding (1 byte) 16-byte-aligned area Then have your malloc16() function call malloc to get a block 16 bytes larger than requested, figure out where the aligned area should be, put the padding length just before that and return the address of the aligned area. For free16, you would simply look at the byte before the address given to get the padding length, work out the actual address of the malloc'ed block from that, and pass that to free. 
This is untested but should be a good start: #include <stdint.h> #include <stdlib.h> void *malloc16 (size_t s) { unsigned char *p; unsigned char *porig = malloc (s + 0x10); /* allocate extra */ if (porig == NULL) return NULL; /* catch out of memory */ p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf); /* insert padding */ *(p-1) = (unsigned char)(p - porig); /* store padding size (1 to 16) */ return p; } void free16(void *p) { unsigned char *porig = p; /* work out original */ porig = porig - *(porig-1); /* by subtracting padding */ free (porig); /* then free that */ } The magic line in malloc16 is p = (unsigned char *)(((uintptr_t)porig + 16) & ~(uintptr_t)0xf); which converts the address to an integer (bit masks cannot be applied to a pointer directly), adds 16, then sets the lower 4 bits to 0, in effect bringing it back down to the nearest alignment point (the +16 guarantees it is past the actual start of the malloc'ed block, which leaves at least one byte for the padding size). Now, I don't claim that the code above is anything but kludgey. You would have to test it in the platforms of interest to see if it's workable. Its main advantage is that it abstracts away the ugly bit so that you never have to worry about it.
I'm not aware of any way of requesting malloc return memory with stricter alignment than usual. As for "usual" on Linux, from man posix_memalign (which you can use instead of malloc() to get more strictly aligned memory if you like): GNU libc malloc() always returns 8-byte aligned memory addresses, so these routines are only needed if you require larger alignment values. You must free() memory using the same pointer returned by malloc(), posix_memalign() or realloc(). Use realloc() as usual, including sufficient extra space so if a new address is returned that isn't already aligned you can memmove() it slightly to align it. Nasty, but best I can think of.
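The realloc()-then-memmove() idea can be sketched roughly as follows. The aligned_buf type and aligned_buf_resize function are hypothetical names, the alignment is assumed to be a power of 2, and overflow of new_size + align - 1 is not checked.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handle: raw is what malloc/realloc returned (pass this to
   free), data is the aligned address inside that block. */
typedef struct {
    void *raw;
    unsigned char *data;
} aligned_buf;

/* Resize b to hold new_size bytes aligned to align, keeping the first
   min(old_size, new_size) bytes. Returns 0 on success, -1 on failure
   (in which case b is left unchanged). */
int aligned_buf_resize(aligned_buf *b, size_t old_size, size_t new_size, size_t align)
{
    size_t old_off, new_off, keep;
    unsigned char *raw;
    uintptr_t addr;

    assert(align != 0 && (align & (align - 1)) == 0);

    /* Record the old offset before realloc invalidates the old pointers. */
    old_off = b->raw ? (size_t)(b->data - (unsigned char *)b->raw) : 0;

    raw = realloc(b->raw, new_size + align - 1);
    if (raw == NULL)
        return -1;

    addr = (uintptr_t)raw;
    new_off = (size_t)(((addr + align - 1) & ~(uintptr_t)(align - 1)) - addr);

    /* realloc kept the bytes at their old offset; slide them if the new
       block needs a different offset to be aligned. */
    keep = old_size < new_size ? old_size : new_size;
    if (new_off != old_off && keep != 0)
        memmove(raw + new_off, raw + old_off, keep);

    b->raw = raw;
    b->data = raw + new_off;
    return 0;
}
```

Starting from a handle with both members set to NULL, the first call behaves like an aligned malloc; the block is released with free(b.raw), never free(b.data).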
You could write your own slab allocator to handle your objects, it could allocate pages at a time using mmap, maintain a cache of recently-freed addresses for fast allocations, handle all your alignment for you, and give you the flexibility to move/grow objects exactly as you need. malloc is quite good for general-purpose allocations, but if you know your data layout and allocation needs, you can design a system to hit those requirements exactly.
The trickiest requirement is obviously the third one, since any malloc() / realloc() based solution is hostage to realloc() moving the block to a different alignment. On Linux, you could use anonymous mappings created with mmap() instead of malloc(). Addresses returned by mmap() are by necessity page-aligned, and the mapping can be extended with mremap().
Starting with C11, you have the void *aligned_alloc( size_t alignment, size_t size ); primitive, where the parameters are: alignment - specifies the alignment. Must be a valid alignment supported by the implementation. size - number of bytes to allocate. An integral multiple of alignment. Return value: On success, returns the pointer to the beginning of newly allocated memory. The returned pointer must be deallocated with free() or realloc(). On failure, returns a null pointer. Example: #include <stdio.h> #include <stdlib.h> int main(void) { int *p1 = malloc(10*sizeof *p1); printf("default-aligned addr: %p\n", (void*)p1); free(p1); int *p2 = aligned_alloc(1024, 1024*sizeof *p2); printf("1024-byte aligned addr: %p\n", (void*)p2); free(p2); } Possible output: default-aligned addr: 0x1e40c20 1024-byte aligned addr: 0x1e41000
Experiment on your system. On many systems (especially 64-bit ones), you get 16-byte aligned memory out of malloc() anyway. If not, you will have to allocate the extra space and move the pointer (by at most 8 bytes on almost every machine). For example, 64-bit Linux on x86/64 has a 16-byte long double, which is 16-byte aligned - so all memory allocations are 16-byte aligned anyway. However, with a 32-bit program, sizeof(long double) is 8 and memory allocations are only 8-byte aligned. Yes - you can only free() the pointer returned by malloc(). Anything else is a recipe for disaster. If your system does 16-byte aligned allocations, there isn't a problem. If it doesn't, then you'll need your own reallocator, which does a 16-byte aligned allocation and then copies the data - or that uses the system realloc() and adjusts the realigned data when necessary. Double check the manual page for your malloc(); there may be options and mechanisms to tweak it so it behaves as you want. On MacOS X, there is posix_memalign() and valloc() (which gives a page-aligned allocation), and there is a whole series of 'zoned malloc' functions identified by man malloc_zoned_malloc and the header is <malloc/malloc.h>.
You might be able to jimmy (in Microsoft VC++ and maybe other compilers): #pragma pack(16) such that malloc( ) is forced to return a 16-byte-aligned pointer. Something along the lines of: ptr_16byte = malloc( 10 * sizeof( my_16byte_aligned_struct )); If it worked at all for malloc( ), I'd think it would work for realloc( ) just as well. Just a thought. -- pete
What portability issues are associated with byte-level access to pointers in C?
Purpose I am writing a small library for a larger project which supplies malloc/realloc/free wrapper-functions as well as a function which can tell you whether or not its parameter (of type void *) corresponds to live (not yet freed) memory allocated and managed by the library's wrapper-functions. Let's refer to this function as isgood_memory. Internally, the library maintains a hash-table to ensure that the search performed by isgood_memory is reasonably fast. The hash-table maintains pointer values (elements of type void *) to make the search possible. Clearly, values are added and removed from the hash-table to keep it up-to-date with what has been allocated and what has been freed, respectively. The portability of the library is my biggest concern. It has been designed to assume only a mostly-compliant C90 (ISO/IEC 9899:1990) environment... nothing more. Question Since portability is my biggest concern, I couldn't assume that sizeof(void *) == sizeof(X) for the hash-function. Therefore, I have resorted to treating the value byte-by-byte as if it were a string. To accomplish this, the hash function looks a little like: static size_t hashit(void *ptrval) { size_t i = 0, h = 0; union { void *ptrval; unsigned char string[sizeof(void *)]; } ptrstr; ptrstr.ptrval = ptrval; for (; i < sizeof(void *); ++i) { size_t byte = ptrstr.string[i]; /* Crazy operations here... */ } return (h); } What portability concerns do any of you have with this particular fragment? Will I encounter any funky alignment issues by accessing ptrval byte-by-byte?
You are allowed to access a data type as an array of unsigned char, as you do here. The major portability issue that I see could occur on platforms where the bit-pattern identifying a particular location is not unique - in that case, you might get pointers that compare equal hashing to different locations because the bit patterns were different. Why could they be different? Well, for one thing, most C data types are allowed to contain padding bits that don't participate in the value. A platform where pointers contained such padding bits could have two pointers that differed only in the padding bits point to the same location. (For example, the OS might use some pointer bits to indicate capabilities of the pointer, not just physical address.) Another example is the far memory model from the early days of DOS, where far pointers consisted of segment:offset, and the adjacent segments overlapped, so that segment:offset could point to the same location as segment+1:offset-x. All that said, on most platforms in common use today, the bit pattern pointing to a given location is indeed unique. So your code will be widely portable, even though it is unlikely to be strictly conforming.
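For concreteness, here is one way the "crazy operations" in the question's loop could be filled in. It is only a sketch: the function name is invented, and FNV-1a is an arbitrary choice of byte-wise hash rather than anything the question prescribes. It stays within C90 and copies the pointer's object representation with memcpy, which is equivalent to reading it through the unsigned char member of the union.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hash the object representation of a pointer one byte at a time
   (32-bit FNV-1a). */
unsigned long hash_pointer_bytes(void *ptrval)
{
    unsigned char bytes[sizeof(void *)];
    unsigned long h = 2166136261UL;            /* FNV-1a 32-bit offset basis */
    size_t i;

    memcpy(bytes, &ptrval, sizeof bytes);      /* copy the representation */
    for (i = 0; i < sizeof bytes; ++i) {
        h ^= bytes[i];
        h = (h * 16777619UL) & 0xFFFFFFFFUL;   /* FNV prime, kept to 32 bits */
    }
    return h;
}
```

The caveat from the answer above still applies: two pointers that compare equal but have different bit patterns would hash differently.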
Looks pretty clean. If you can rely on the <inttypes.h> header from C99 (it is often available elsewhere), then consider using uintptr_t - but if you want to hash the value byte-wise, you end up breaking things down to bytes and there is no real advantage to it.
Mostly correct. There's one potential problem, though. you assign size_t byte = ptrstr.string[i]; *string is defined as char, not unsigned char. On the platform that has signed chars and unsigned size_t, it will give you result that you may or may not expect. Just change your char to unsigned char, that will be cleaner.
If you don't need the pointer values for some other reason besides keeping track of allocated memory, why not get rid of the hash table altogether and just store a magic number along with the memory allocated, as in the example below. The magic number being present alongside the memory allocated indicates that it is still "alive". When freeing the memory you clear the stored magic number before freeing the memory. #include <stdlib.h> typedef unsigned char byte; #pragma pack(1) struct sMemHdl { unsigned int magic; byte firstByte; }; #pragma pack() #define MAGIC 0xDEADDEADu #define MAGIC_SIZE sizeof(((struct sMemHdl *)0)->magic) int isgood_memory ( void *mem ); void *get_memory( size_t request ) { struct sMemHdl *pMemHdl = (struct sMemHdl *)malloc(MAGIC_SIZE + request); if ( pMemHdl == NULL ) { return NULL; /* out of memory */ } pMemHdl->magic = MAGIC; return (void *)&pMemHdl->firstByte; } void free_memory ( void *mem ) { if ( isgood_memory(mem) != 0 ) { struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE); pMemHdl->magic = 0; free(pMemHdl); } } int isgood_memory ( void *mem ) { struct sMemHdl *pMemHdl = (struct sMemHdl *)((byte *)mem - MAGIC_SIZE); if ( pMemHdl->magic == MAGIC ) { return 1; /* mem is good */ } else { return 0; /* mem already freed */ } } This may be a bit hackish, but I guess I'm in a hackish mood...
Accessing variables such as integers or pointers as chars or unsigned chars is not a problem from a portability point of view. But the reverse is not true, because it is hardware dependent. I have one question: why are you hashing a pointer as a string instead of using the pointer itself as a hash value (using uintptr_t)?
How to allocate aligned memory only using the standard library?
I just finished a test as part of a job interview, and one question stumped me, even using Google for reference. I'd like to see what the StackOverflow crew can do with it: The memset_16aligned function requires a 16-byte aligned pointer passed to it, or it will crash. a) How would you allocate 1024 bytes of memory, and align it to a 16 byte boundary? b) Free the memory after the memset_16aligned has executed. { void *mem; void *ptr; // answer a) here memset_16aligned(ptr, 0, 1024); // answer b) here }
Original answer { void *mem = malloc(1024+16); void *ptr = ((char *)mem+16) & ~ 0x0F; memset_16aligned(ptr, 0, 1024); free(mem); } Fixed answer { void *mem = malloc(1024+15); void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F); memset_16aligned(ptr, 0, 1024); free(mem); } Explanation as requested The first step is to allocate enough spare space, just in case. Since the memory must be 16-byte aligned (meaning that the leading byte address needs to be a multiple of 16), adding 16 extra bytes guarantees that we have enough space. Somewhere in the first 16 bytes, there is a 16-byte aligned pointer. (Note that malloc() is supposed to return a pointer that is sufficiently well aligned for any purpose. However, the meaning of 'any' is primarily for things like basic types: long, double, long double, long long, and pointers to objects and pointers to functions. When you are doing more specialized things, like playing with graphics systems, they can need more stringent alignment than the rest of the system; hence questions and answers like this.) The next step is to convert the void pointer to a char pointer; GCC notwithstanding, you are not supposed to do pointer arithmetic on void pointers (and GCC has warning options to tell you when you abuse it). Then add 16 to the start pointer. Suppose malloc() returned you an impossibly badly aligned pointer: 0x800001. Adding the 16 gives 0x800011. Now I want to round down to the 16-byte boundary, so I want to reset the last 4 bits to 0. 0x0F has the last 4 bits set to one; therefore, ~0x0F has all bits set to one except the last four. Anding that with 0x800011 gives 0x800010. You can iterate over the other offsets and see that the same arithmetic works. The last step, free(), is easy: you always, and only, return to free() a value that one of malloc(), calloc() or realloc() returned to you; anything else is a disaster. You correctly provided mem to hold that value, thank you. The free releases it.
Finally, if you know about the internals of your system's malloc package, you could guess that it might well return 16-byte aligned data (or it might be 8-byte aligned). If it was 16-byte aligned, then you'd not need to dink with the values. However, this is dodgy and non-portable — other malloc packages have different minimum alignments, and therefore assuming one thing when it does something different would lead to core dumps. Within broad limits, this solution is portable. Someone else mentioned posix_memalign() as another way to get the aligned memory; that isn't available everywhere, but could often be implemented using this as a basis. Note that it was convenient that the alignment was a power of 2; other alignments are messier. One more comment — this code does not check that the allocation succeeded. Amendment Windows Programmer pointed out that you can't do bit mask operations on pointers, and, indeed, GCC (3.4.6 and 4.3.1 tested) does complain like that. So, an amended version of the basic code — converted into a main program, follows. I've also taken the liberty of adding just 15 instead of 16, as has been pointed out. I'm using uintptr_t since C99 has been around long enough to be accessible on most platforms. If it wasn't for the use of PRIXPTR in the printf() statements, it would be sufficient to #include <stdint.h> instead of using #include <inttypes.h>. [This code includes the fix pointed out by C.R., which was reiterating a point first made by Bill K a number of years ago, which I managed to overlook until now.] 
#include <assert.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <string.h> static void memset_16aligned(void *space, char byte, size_t nbytes) { assert((nbytes & 0x0F) == 0); assert(((uintptr_t)space & 0x0F) == 0); memset(space, byte, nbytes); // Not a custom implementation of memset() } int main(void) { void *mem = malloc(1024+15); void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F); printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr); memset_16aligned(ptr, 0, 1024); free(mem); return(0); } And here is a marginally more generalized version, which will work for sizes which are a power of 2: #include <assert.h> #include <inttypes.h> #include <stdio.h> #include <stdlib.h> #include <string.h> static void memset_16aligned(void *space, char byte, size_t nbytes) { assert((nbytes & 0x0F) == 0); assert(((uintptr_t)space & 0x0F) == 0); memset(space, byte, nbytes); // Not a custom implementation of memset() } static void test_mask(size_t align) { uintptr_t mask = ~(uintptr_t)(align - 1); void *mem = malloc(1024+align-1); void *ptr = (void *)(((uintptr_t)mem+align-1) & mask); assert((align & (align - 1)) == 0); printf("0x%08" PRIXPTR ", 0x%08" PRIXPTR "\n", (uintptr_t)mem, (uintptr_t)ptr); memset_16aligned(ptr, 0, 1024); free(mem); } int main(void) { test_mask(16); test_mask(32); test_mask(64); test_mask(128); return(0); } To convert test_mask() into a general purpose allocation function, the single return value from the allocator would have to encode the release address, as several people have indicated in their answers. Problems with interviewers Uri commented: Maybe I am having [a] reading comprehension problem this morning, but if the interview question specifically says: "How would you allocate 1024 bytes of memory" and you clearly allocate more than that. Wouldn't that be an automatic failure from the interviewer? My response won't fit into a 300-character comment... It depends, I suppose. 
I think most people (including me) took the question to mean "How would you allocate a space in which 1024 bytes of data can be stored, and where the base address is a multiple of 16 bytes". If the interviewer really meant how can you allocate 1024 bytes (only) and have it 16-byte aligned, then the options are more limited. Clearly, one possibility is to allocate 1024 bytes and then give that address the 'alignment treatment'; the problem with that approach is that the actual available space is not properly determinate (the usable space is between 1008 and 1024 bytes, but there wasn't a mechanism available to specify which size), which renders it less than useful. Another possibility is that you are expected to write a full memory allocator and ensure that the 1024-byte block you return is appropriately aligned. If that is the case, you probably end up doing an operation fairly similar to what the proposed solution did, but you hide it inside the allocator. However, if the interviewer expected either of those responses, I'd expect them to recognize that this solution answers a closely related question, and then to reframe their question to point the conversation in the correct direction. (Further, if the interviewer got really stroppy, then I wouldn't want the job; if the answer to an insufficiently precise requirement is shot down in flames without correction, then the interviewer is not someone for whom it is safe to work.) The world moves on The title of the question has changed recently. It was Solve the memory alignment in C interview question that stumped me. The revised title (How to allocate aligned memory only using the standard library?) demands a slightly revised answer — this addendum provides it. 
C11 (ISO/IEC 9899:2011) added the function aligned_alloc(): 7.22.3.1 The aligned_alloc function Synopsis #include <stdlib.h> void *aligned_alloc(size_t alignment, size_t size); Description The aligned_alloc function allocates space for an object whose alignment is specified by alignment, whose size is specified by size, and whose value is indeterminate. The value of alignment shall be a valid alignment supported by the implementation and the value of size shall be an integral multiple of alignment. Returns The aligned_alloc function returns either a null pointer or a pointer to the allocated space. And POSIX defines posix_memalign(): #include <stdlib.h> int posix_memalign(void **memptr, size_t alignment, size_t size); DESCRIPTION The posix_memalign() function shall allocate size bytes aligned on a boundary specified by alignment, and shall return a pointer to the allocated memory in memptr. The value of alignment shall be a power of two multiple of sizeof(void *). Upon successful completion, the value pointed to by memptr shall be a multiple of alignment. If the size of the space requested is 0, the behavior is implementation-defined; the value returned in memptr shall be either a null pointer or a unique pointer. The free() function shall deallocate memory that has previously been allocated by posix_memalign(). RETURN VALUE Upon successful completion, posix_memalign() shall return zero; otherwise, an error number shall be returned to indicate the error. Either or both of these could be used to answer the question now, but only the POSIX function was an option when the question was originally answered. Behind the scenes, the new aligned memory functions do much the same job as outlined in the question, except they have the ability to force the alignment more easily, and keep track of the start of the aligned memory internally so that the code doesn't have to deal with it specially; it just frees the memory returned by the allocation function that was used.
Three slightly different answers depending how you look at the question: 1) Good enough for the exact question asked is Jonathan Leffler's solution, except that to round up to 16-aligned, you only need 15 extra bytes, not 16. A: /* allocate a buffer with room to add 0-15 bytes to ensure 16-alignment */ void *mem = malloc(1024+15); ASSERT(mem); /* some kind of error-handling code */ /* round up to multiple of 16: add 15 and then round down by masking (uintptr_t is from <stdint.h>; a pointer cannot be masked directly) */ void *ptr = (void *)(((uintptr_t)mem+15) & ~ (uintptr_t)0x0F); B: free(mem); 2) For a more generic memory allocation function, the caller doesn't want to have to keep track of two pointers (one to use and one to free). So you store a pointer to the 'real' buffer below the aligned buffer. A: void *mem = malloc(1024+15+sizeof(void*)); if (!mem) return mem; void *ptr = (void *)(((uintptr_t)mem+sizeof(void*)+15) & ~ (uintptr_t)0x0F); ((void**)ptr)[-1] = mem; return ptr; B: if (ptr) free(((void**)ptr)[-1]); Note that unlike (1), where only 15 bytes were added to mem, this code could actually reduce the alignment if your implementation happens to guarantee 32-byte alignment from malloc (unlikely, but in theory a C implementation could have a 32-byte aligned type). That doesn't matter if all you do is call memset_16aligned, but if you use the memory for a struct then it could matter. I'm not sure off-hand what a good fix is for this (other than to warn the user that the buffer returned is not necessarily suitable for arbitrary structs) since there's no way to determine programmatically what the implementation-specific alignment guarantee is. I guess at startup you could allocate two or more 1-byte buffers, and assume that the worst alignment you see is the guaranteed alignment. If you're wrong, you waste memory. Anyone with a better idea, please say so... [Added: The 'standard' trick is to create a union of 'likely to be maximally aligned types' to determine the requisite alignment.
The maximally aligned types are likely to be (in C99) 'long long', 'long double', 'void *', or 'void (*)(void)'; if you include <stdint.h>, you could presumably use 'intmax_t' in place of long long (and, on Power 6 (AIX) machines, intmax_t would give you a 128-bit integer type). The alignment requirements for that union can be determined by embedding it into a struct with a single char followed by the union: struct alignment { char c; union { intmax_t imax; long double ldbl; void *vptr; void (*fptr)(void); } u; } align_data; size_t align = (char *)&align_data.u.imax - &align_data.c; You would then use the larger of the requested alignment (in the example, 16) and the align value calculated above. On (64-bit) Solaris 10, it appears that the basic alignment for the result from malloc() is a multiple of 32 bytes. ] In practice, aligned allocators often take a parameter for the alignment rather than it being hardwired. So the user will pass in the size of the struct they care about (or the least power of 2 greater than or equal to that) and all will be well. 3) Use what your platform provides: posix_memalign for POSIX, _aligned_malloc on Windows. 4) If you use C11, then the cleanest - portable and concise - option is to use the standard library function aligned_alloc that was introduced in this version of the language specification.
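Putting approach (2) together with an alignment parameter gives something like the sketch below. The names aligned_malloc_sketch and aligned_free_sketch are hypothetical, the alignment is assumed to be a power of 2, and the saved pointer is written with memcpy so that nothing is assumed about how the slot just below the aligned address is itself aligned.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Allocate size bytes aligned to align and remember the address that
   malloc returned in the bytes just below the aligned block. */
void *aligned_malloc_sketch(size_t size, size_t align)
{
    unsigned char *raw, *ptr;
    void *saved;
    uintptr_t addr;

    assert(align != 0 && (align & (align - 1)) == 0);

    raw = malloc(size + (align - 1) + sizeof(void *));
    if (raw == NULL)
        return NULL;

    /* Leave room for the saved pointer, then round up to the boundary. */
    addr = ((uintptr_t)raw + sizeof(void *) + (align - 1)) & ~(uintptr_t)(align - 1);
    ptr = raw + (addr - (uintptr_t)raw);

    saved = raw;
    memcpy(ptr - sizeof(void *), &saved, sizeof saved);
    return ptr;
}

/* Release a block obtained from aligned_malloc_sketch(). */
void aligned_free_sketch(void *ptr)
{
    void *saved;

    if (ptr == NULL)
        return;
    memcpy(&saved, (unsigned char *)ptr - sizeof(void *), sizeof saved);
    free(saved);
}
```

For the interview question this would be used as ptr = aligned_malloc_sketch(1024, 16), followed by memset_16aligned(ptr, 0, 1024) and aligned_free_sketch(ptr). Blocks from this allocator must never be handed to plain free().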
You could also try posix_memalign() (on POSIX platforms, of course).
Here's an alternate approach to the 'round up' part. Not the most brilliantly coded solution but it gets the job done, and this type of syntax is a bit easier to remember (plus it would work for alignment values that aren't a power of 2). The uintptr_t cast is necessary because pointer arithmetic does not allow division or multiplication, and the whole integer expression has to be parenthesized before it is cast back to a pointer. void *mem = malloc(1024 + 15); void *ptr = (void *)(((uintptr_t)mem + 15) / 16 * 16); memset_16aligned(ptr, 0, 1024); free(mem);
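The same divide-and-multiply rounding, pulled out into a small function so the alignment is a parameter. The name round_up_multiple is made up for illustration; the only requirement on align is that it is non-zero.

```c
#include <assert.h>
#include <stdint.h>

/* Round addr up to the next multiple of align. Works for any non-zero
   align, not just powers of 2, because it divides instead of masking. */
uintptr_t round_up_multiple(uintptr_t addr, uintptr_t align)
{
    assert(align != 0);
    return (addr + (align - 1)) / align * align;
}
```

With the badly aligned example address 0x800001 and an alignment of 16 this yields 0x800010; an address that is already a multiple of align comes back unchanged.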
Unfortunately, in C99 it seems pretty tough to guarantee alignment of any sort in a way which would be portable across any C implementation conforming to C99. Why? Because a pointer is not guaranteed to be the "byte address" one might imagine with a flat memory model. Neither is the representation of uintptr_t so guaranteed, which itself is an optional type anyway. We might know of some implementations which use a representation for void * (and by definition, also char *) which is a simple byte address, but by C99 it is opaque to us, the programmers. An implementation might represent a pointer by a set {segment, offset} where offset could have who-knows-what alignment "in reality." Why, a pointer could even be some form of hash table lookup value, or even a linked-list lookup value. It could encode bounds information. In a recent C1X draft for a C Standard, we see the _Alignas keyword. That might help a bit. The only guarantee C99 gives us is that the memory allocation functions will return a pointer suitable for assignment to a pointer pointing at any object type. Since we cannot specify the alignment of objects, we cannot implement our own allocation functions with responsibility for alignment in a well-defined, portable manner. It would be good to be wrong about this claim.
On the 16 vs 15 byte-count padding front, the actual number you need to add to get an alignment of N is max(0,N-M) where M is the natural alignment of the memory allocator (and both are powers of 2). Since the minimal memory alignment of any allocator is 1 byte, 15=max(0,16-1) is a conservative answer. However, if you know your memory allocator is going to give you 32-bit int aligned addresses (which is fairly common), you could have used 12 as a pad. This isn't important for this example but it might be important on an embedded system with 12K of RAM where every single int saved counts. The best way to implement it if you're actually going to try to save every byte possible is as a macro so you can feed it your native memory alignment. Again, this is probably only useful for embedded systems where you need to save every byte. In the example below, on most systems, the value 1 is just fine for MEMORY_ALLOCATOR_NATIVE_ALIGNMENT, however for our theoretical embedded system with 32-bit aligned allocations, the following could save a tiny bit of precious memory: #define MEMORY_ALLOCATOR_NATIVE_ALIGNMENT 4 #define ALIGN_PAD2(N,M) (((N)>(M)) ? ((N)-(M)) : 0) #define ALIGN_PAD(N) ALIGN_PAD2((N), MEMORY_ALLOCATOR_NATIVE_ALIGNMENT)
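The max(0, N-M) rule can be checked by brute force. The sketch below (worst_case_pad is a hypothetical name, and both arguments are assumed to be powers of 2) walks every M-aligned address within one N-sized window and records the largest distance to the next N-aligned address.

```c
#include <assert.h>
#include <stddef.h>

/* Largest number of bytes that must be skipped to get from an m-aligned
   address to the next n-aligned address. */
size_t worst_case_pad(size_t n, size_t m)
{
    size_t addr, worst = 0;

    assert(n != 0 && m != 0);
    for (addr = 0; addr < n; addr += m) {
        size_t pad = (n - addr % n) % n;   /* distance up to the next multiple of n */
        if (pad > worst)
            worst = pad;
    }
    return worst;
}
```

For N = 16 this agrees with the macro: M = 1 needs 15 bytes of padding, M = 4 needs 12, and M = 16 or larger needs none.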
Perhaps they would have been satisfied with a knowledge of memalign? And as Jonathan Leffler points out, there are two newer preferable functions to know about. Oops, florin beat me to it. However, if you read the man page I linked to, you'll most likely understand the example supplied by an earlier poster.
We do this sort of thing all the time for Accelerate.framework, a heavily vectorized OS X / iOS library, where we have to pay attention to alignment all the time. There are quite a few options, one or two of which I didn't see mentioned above. The fastest method for a small array like this is just stick it on the stack. With GCC / clang: void my_func( void ) { uint8_t array[1024] __attribute__ ((aligned(16))); ... } No free() required. This is typically two instructions: subtract 1024 from the stack pointer, then AND the stack pointer with -alignment. Presumably the requester needed the data on the heap because the lifespan of the array exceeded that of the stack frame, or recursion is at work, or stack space is at a serious premium. On OS X / iOS all calls to malloc/calloc/etc. are always 16 byte aligned. If you needed 32 byte aligned for AVX, for example, then you can use posix_memalign: void *buf = NULL; int err = posix_memalign( &buf, 32 /*alignment*/, 1024 /*size*/); if( err ) RunInCirclesWaivingArmsWildly(); ... free(buf); Some folks have mentioned the C++ interface that works similarly. It should not be forgotten that pages are aligned to large powers of two, so page-aligned buffers are also 16 byte aligned. Thus, mmap() and valloc() and other similar interfaces are also options. mmap() has the advantage that the buffer can be allocated preinitialized with something non-zero in it, if you want. Since these have page aligned size, you will not get the minimum allocation from these, and it will likely be subject to a VM fault the first time you touch it. Cheesy: Turn on guard malloc or similar. Buffers that are n*16 bytes in size such as this one will be n*16 bytes aligned, because VM is used to catch overruns and its boundaries are at page boundaries. Some Accelerate.framework functions take in a user supplied temp buffer to use as scratch space.
Here we have to assume that the buffer passed to us is wildly misaligned and the user is actively trying to make our life hard out of spite. (Our test cases stick a guard page right before and after the temp buffer to underline the spite.) Here, we return the minimum size we need to guarantee a 16-byte aligned segment somewhere in it, and then manually align the buffer afterward. This size is desired_size + alignment - 1. So, in this case that is 1024 + 16 - 1 = 1039 bytes. Then align as so:
#include <stdint.h>
void My_func( uint8_t *tempBuf, ... )
{
    uint8_t *alignedBuf = (uint8_t*)
        (((uintptr_t) tempBuf + ((uintptr_t)alignment-1)) & -((uintptr_t) alignment));
    ...
}
Adding alignment-1 will move the pointer past the first aligned address, and then ANDing with -alignment (e.g. 0xfff...ff0 for alignment=16) brings it back to the aligned address.
As described by other posts, on other operating systems without 16-byte alignment guarantees, you can call malloc with the larger size, set aside the pointer for free() later, then align as described immediately above and use the aligned pointer, much as described for our temp buffer case.
As for aligned_memset, this is rather silly. You only have to loop over up to 15 bytes to reach an aligned address, and then proceed with aligned stores after that, with some possible cleanup code at the end. You can even do the cleanup bits in vector code, either as unaligned stores that overlap the aligned region (providing the length is at least the length of a vector) or using something like maskmovdqu. Someone is just being lazy. However, it is probably a reasonable interview question if the interviewer wants to know whether you are comfortable with stdint.h, bitwise operators and memory fundamentals, so the contrived example can be forgiven.
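To make the "loop over up to 15 bytes, then do aligned stores, then clean up" remark concrete, here is a minimal sketch in plain C. The names memset_any and memset_16aligned_stub are made up for illustration; the real memset_16aligned from the question is not shown anywhere in the thread, so the stub only checks the alignment precondition and forwards to memset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the question's memset_16aligned (hypothetical body):
 * it insists on a 16-byte aligned pointer, then just calls memset. */
static void memset_16aligned_stub(void *p, int c, size_t n)
{
    assert(((uintptr_t)p & 15u) == 0);
    memset(p, c, n);
}

/* Hypothetical wrapper that accepts any pointer and any length. */
static void memset_any(void *dst, int c, size_t n)
{
    unsigned char *p = dst;

    /* Head: byte stores until p reaches a 16-byte boundary (at most 15). */
    while (n > 0 && ((uintptr_t)p & 15u) != 0) {
        *p++ = (unsigned char)c;
        n--;
    }

    /* Body: whole 16-byte blocks go through the aligned routine. */
    size_t body = n & ~(size_t)15;
    if (body > 0)
        memset_16aligned_stub(p, c, body);
    p += body;
    n -= body;

    /* Tail: the remaining 0..15 bytes. */
    while (n > 0) {
        *p++ = (unsigned char)c;
        n--;
    }
}
```

A vectorized version would replace the head and tail loops with overlapping unaligned stores, as described above; the byte loops here are only the simplest correct form.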
I'm surprised no one has voted up Shao's answer that, as I understand it, it is impossible to do what's asked in standard C99, since converting a pointer to an integral type is formally undefined behavior. (The standard does allow converting between uintptr_t and void*, but it does not seem to allow doing any manipulations of the uintptr_t value and then converting it back.)
Using memalign (see Aligned-Memory-Blocks) might be a good solution for the problem.
The first thing that popped into my head when reading this question was to define an aligned struct, instantiate it, and then point to it. Is there a fundamental reason I'm missing, since no one else suggested this?
As a sidenote, since I used an array of char (assuming the system's char is 8 bits, i.e. 1 byte), I don't see the need for the __attribute__((packed)) necessarily (correct me if I'm wrong), but I put it in anyway.
This works on two systems I tried it on, but it's possible that there is a compiler optimization that I'm unaware of giving me false positives vis-a-vis the efficacy of the code. I used gcc 4.9.2 on OSX and gcc 5.2.1 on Ubuntu.
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main ()
{
    void *mem;
    void *ptr;
    // answer a) here
    struct __attribute__((packed)) s_CozyMem {
        char acSpace[16];
    };
    mem = malloc(sizeof(struct s_CozyMem));
    ptr = mem;
    // memset_16aligned(ptr, 0, 1024);
    // Check if it's aligned
    if(((uintptr_t)ptr & 15) == 0)
        printf("Aligned to 16 bytes.\n");
    else
        printf("Rubbish.\n");
    // answer b) here
    free(mem);
    return 0;
}
MacOS X specific:
All pointers allocated with malloc are 16 bytes aligned.
C11 is supported, so you can just call aligned_alloc (16, size).
MacOS X picks code that is optimised for individual processors at boot time for memset, memcpy and memmove, and that code uses tricks that you've never heard of to make it fast. There is a 99% chance that memset runs faster than any hand-written memset16, which makes the whole question pointless.
If you want a 100% portable solution, before C11 there is none, because there is no portable way to test the alignment of a pointer. If it doesn't have to be 100% portable, you can use
char* p = malloc (size + 15);
p += (- (unsigned int) p) % 16;
This assumes that the alignment of a pointer is stored in the lowest bits when converting a pointer to unsigned int. Converting to unsigned int loses information and is implementation defined, but that doesn't matter because we don't convert the result back to a pointer.
The horrible part is of course that the original pointer must be saved somewhere to call free () with it. So all in all I would really doubt the wisdom of this design.
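One common way around the "original pointer must be saved somewhere" problem is to stash it just below the aligned block, so the caller only has to keep one pointer. The sketch below is an illustration under stated assumptions, not anything from the thread: the names aligned_malloc_sketch and aligned_free_sketch are made up, the alignment is assumed to be a power of two, and it relies on the usual (implementation-defined) behaviour of converting pointers to and from uintptr_t.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: over-allocate, align, and hide the malloc()
 * pointer in the bytes immediately before the aligned block.
 * No overflow check on size; this is only a sketch. */
static void *aligned_malloc_sketch(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);

    /* Room for the payload, worst-case padding, and one stashed pointer. */
    void *raw = malloc(size + alignment - 1 + sizeof raw);
    if (raw == NULL)
        return NULL;

    /* Skip past the stash slot, then round up to the alignment. */
    uintptr_t start = (uintptr_t)raw + sizeof raw;
    uintptr_t aligned = (start + alignment - 1) & ~(uintptr_t)(alignment - 1);

    /* memcpy avoids assuming the stash slot is aligned for void *. */
    memcpy((unsigned char *)aligned - sizeof raw, &raw, sizeof raw);
    return (void *)aligned;
}

/* Hypothetical counterpart: recover the stashed pointer and free it. */
static void aligned_free_sketch(void *p)
{
    if (p != NULL) {
        void *raw;
        memcpy(&raw, (unsigned char *)p - sizeof raw, sizeof raw);
        free(raw);
    }
}
```

Usage would look like ptr = aligned_malloc_sketch(1024, 16); memset_16aligned(ptr, 0, 1024); aligned_free_sketch(ptr);. The cost is one extra pointer's worth of memory per allocation.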
You can also allocate 16 extra bytes and then push the original pointer up to a 16-byte boundary by adding (16 - mod) to it, as below:
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    void *mem1 = malloc(1024+16);
    void *mem = ((char*)mem1)+1; // force misalign (my computer always aligns)
    printf("ptr = %p\n", mem);
    void *ptr = (void *)(((uintptr_t)mem+16) & ~(uintptr_t)0x0F);
    printf("aligned ptr = %p\n", ptr);
    printf("ptr after adding diff mod %p (same as above)\n",
           (void *)((uintptr_t)mem1 + (16 - ((uintptr_t)mem1 % 16))));
    free(mem1);
    return 0;
}
If there is a constraint that you cannot waste a single byte, then this solution works.
Note: there is a case where this may loop forever :D
void *mem;
void *ptr;
try:
mem = malloc(1024);
if ((uintptr_t)mem % 16 != 0) { // uintptr_t comes from <stdint.h>
    free(mem);
    goto try;
}
ptr = mem;
memset_16aligned(ptr, 0, 1024);
For this solution I used padding to align the memory, in case there is a constraint that you cannot waste a single byte. On platforms such as Mac OS X, all pointers allocated with malloc are 16 bytes aligned, and where C11 is supported you can just call aligned_alloc (16, size).
void *mem = malloc(1024+16);
void *ptr = (void *)(((uintptr_t)mem+16) & ~(uintptr_t)0x0F);
memset_16aligned(ptr, 0, 1024);
free(mem);
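For comparison, the C11 aligned_alloc route mentioned above might look like the sketch below. The function name zero_1024_aligned is made up, and plain memset stands in for memset_16aligned, whose body is not shown in the thread. This assumes a C11 library that actually provides aligned_alloc (some platforms do not), and it keeps the size a multiple of the alignment, as C11 requires.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical demo: allocate 1024 bytes on a 16-byte boundary,
 * clear them, and release them. Returns 0 on success, -1 on failure. */
static int zero_1024_aligned(void)
{
    /* answer a): 1024 is a multiple of 16, as C11 requires. */
    void *ptr = aligned_alloc(16, 1024);
    if (ptr == NULL)
        return -1;

    assert(((uintptr_t)ptr & 15u) == 0);

    /* Stand-in for memset_16aligned(ptr, 0, 1024). */
    memset(ptr, 0, 1024);

    /* answer b): the aligned pointer itself goes straight to free(). */
    free(ptr);
    return 0;
}
```

Unlike the malloc-plus-padding versions, there is no second pointer to keep around for free().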
size = 1024;
alignment = 16;
aligned_size = size + (alignment - (size % alignment));
mem = malloc(aligned_size);
memset_16aligned(mem, 0, 1024);
free(mem);
Hope this one is the simplest implementation; let me know your comments.
uintptr_t add; // from <stdint.h>
mem = malloc(1024 + 15);
add = (uintptr_t)mem + 15;
add = add - (add % 16); // round up to the next 16 byte boundary inside the block
ptr = (whatever*)(add);
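The rounding arithmetic that these answers rely on can be checked in isolation. The sketch below uses made-up helper names (align_up, align_down) and assumes a power-of-two alignment; the sample address 0x800001 is the "badly aligned" value from the explanation at the top of the thread. Note that rounding a malloc() result down lands before the start of the allocation, which is why the padding goes in front and the address is rounded up.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: smallest multiple of alignment that is >= addr. */
static uintptr_t align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}

/* Hypothetical helper: largest multiple of alignment that is <= addr. */
static uintptr_t align_down(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & ~(alignment - 1);
}
```

For example, align_up(0x800001, 16) is 0x800010, which lies inside a block starting at 0x800001, while align_down(0x800001, 16) is 0x800000, which lies one byte before it.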