Full Page Malloc - c

I am trying to optimize the memory allocation of my program by using entire pages at a time.
I am grabbing the page size like this: sysconf(_SC_PAGESIZE); then calculating the total number of elements that will fit in a page like this: elements=pageSize/sizeof(Node);
I was thinking that when I actually go to malloc my memory I would use malloc(elements*sizeof(Node)); It seems like the multiplication and division by sizeof(Node) would cancel out, but with integer division, I do not believe that is the case.
Is this the best way to malloc an entire page at a time?
Thanks

The malloc function doesn't have any concept of pagesize. Unless you are allocating pages that are ALSO aligned to a page-boundary, you will not get ANY benefit from calling malloc in this way. Just malloc as many elements as you need, and stop worrying about micro-optimising something that almost certainly won't give you any benefit at all.
Yes, the Linux kernel does things like this all the time. There are two reasons for that:
You don't want to allocate blocks LARGER than a page, since that significantly increases the risk of allocation failure.
The kernel allocation is made on a per-page basis, rather than like the C library, which allocates a large amount of memory in one go, and then splits it into small components.
If you really want to allocate page-size amount of memory, then use the result from sysconf(_SC_PAGESIZE) as your size argument. But it is almost certain that your allocation straddles two pages.
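Whether a given block really straddles two pages can be checked from its address. The following is only a sketch, assuming a POSIX system; the helper names is_page_aligned and pages_touched are hypothetical and not part of any library:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if p sits exactly on a page boundary. */
static int is_page_aligned(const void *p, size_t page_size)
{
    assert(page_size != 0);
    return ((uintptr_t)p % page_size) == 0;
}

/* Hypothetical helper: number of pages the block [p, p + len) touches. */
static size_t pages_touched(const void *p, size_t len, size_t page_size)
{
    assert(page_size != 0 && len != 0);
    uintptr_t first = (uintptr_t)p / page_size;
    uintptr_t last  = ((uintptr_t)p + len - 1) / page_size;
    return (size_t)(last - first + 1);
}
```

Calling pages_touched(p, page_size, page_size) on a pointer returned by malloc(page_size) will usually report 2, because the returned address is rarely page-aligned.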

Your computation elements=pageSize/sizeof(Node); doesn't take account of the malloc() metadata that are added to any block/chunk of memory returned by malloc(). In many cases, malloc() will return a memory block likely aligned at least on min(sizeof(double),2 * sizeof(void *)) boundary (32 bytes is becoming quite common btw ...). If malloc() gets a memory block aligned on a page, adds its chunk (with padding), and you write a full page size of data, the last bytes are off the first page: so you're ending up using 2 pages.
Want a whole page, just for you, without concerns about wasting memory, and without using mmap() / VirtualAlloc() as suggested in the comments?
Here you are:
int ret;
void *ptr = NULL;
size_t page_size = sysconf(_SC_PAGESIZE);
ret = posix_memalign(&ptr, page_size, page_size);
if (ret != 0 || ptr == NULL) {
fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
}
By the way, this is probably about micro-optimization.
You probably still haven't checked whether the size of your Node is a multiple of a cache line, how to improve cache locality, or how to reduce memory fragmentation. So you're probably going the wrong way: make it work first, profile, optimize your algorithms, profile again, and micro-optimize only as a last resort.

The C11 standard added the aligned_alloc call, so you can do something like:
#include <stdlib.h>
#include <unistd.h>
void *alloc_page( void )
{
long page_size = sysconf( _SC_PAGESIZE ); /* arguably could be a constant, #define, etc. */
return ( page_size > 0 ? aligned_alloc( page_size, page_size ) : NULL );
}
The problem with this approach, as others pointed out, is that the implementation of the standard allocation calls typically adds some bookkeeping overhead that is stored just before the allocated memory. So, this allocation will usually straddle two pages: the returned page for you to use, and the very end of another page used by the allocator's bookkeeping.
That means when you free or realloc this memory, it may need to touch two pages rather than just the one. Also, if you allocate all or most of your memory this way, then you can "waste" a lot of virtual memory as roughly half of the pages allocated to your process at the OS level will only be used a tiny bit for the allocator's bookkeeping.
How important these issues are is hard to say generally, but preferably they would be avoided somehow. Unfortunately, I haven't figured out a clean, easy, and portable way to do that yet.
==============================
Addendum: If you could dynamically figure out malloc's memory overhead and assume it is always constant, then would asking for that much less usually give us what we want?
#include <stdlib.h>
#include <unistd.h>
/* decent default guesses (e.g. - Linux x64) */
static size_t Page_Size = 4096;
static size_t Malloc_Overhead = 32;
/* call once at beginning of program (i.e. - single thread, no allocs yet) */
int alloc_page_init( void )
{
int ret = -1;
long page_size = sysconf( _SC_PAGESIZE );
char *p1 = malloc( 1 );
char *p2 = malloc( 1 );
size_t malloc_overhead;
if ( page_size <= 0 || p1 == NULL || p2 == NULL )
goto FAIL;
malloc_overhead = ( size_t ) ( p2 > p1 ? p2 - p1 : p1 - p2 ); /* non-standard pointer math */
if ( malloc_overhead > 64 || malloc_overhead >= page_size )
goto FAIL;
Page_Size = page_size;
Malloc_Overhead = malloc_overhead;
ret = 0;
FAIL:
if ( p1 )
free( p1 );
if ( p2 )
free( p2 );
return ret;
}
void *alloc_page( void )
{
return aligned_alloc( Page_Size - Malloc_Overhead, Page_Size - Malloc_Overhead );
}
Answer: probably not, because, for example, "As an example of the "supported by the implementation" requirement, POSIX function posix_memalign accepts any alignment that is a power of two and a multiple of sizeof(void *), and POSIX-based implementations of aligned_alloc inherit these requirements."
The above code would likely not request an alignment that is a power of 2 and will therefore likely fail on most platforms.
It seems this is an unavoidable problem with the typical implementations of standard allocation functions. So, it is probably best to just align and alloc based on the page size and likely pay the penalty of the allocator's bookkeeping residing on another page, or use an OS specific call like mmap to avoid this issue.

The standards provide no guarantee that malloc even has a concept of page size. However, it's not uncommon for malloc implementations to dole out entire pages when the allocation size requested is on the order of the page size (or larger).
There's certainly no harm in asking for an allocation that happens to be equal to the page size (or a multiple of the page size) and subdividing it yourself, though it is a little extra work. You might indeed get the behavior you desire, at least on some machines/compiler/library combinations. But you might not either. If you absolutely require page-sized allocations and/or page-aligned memory, you'll have to call an OS-specific API to get it.
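Subdividing a page-sized allocation yourself might look roughly like the sketch below. The Node layout and the NodeBlock type with its node_block_* functions are hypothetical (the question does not show Node), and nothing here guarantees that malloc() returns page-aligned memory:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical element type; the original post does not show Node. */
typedef struct Node {
    struct Node *next;
    int value;
} Node;

/* A block of Nodes occupying at most one page worth of bytes. */
typedef struct NodeBlock {
    Node  *nodes;     /* storage for `capacity` nodes           */
    size_t capacity;  /* page_size / sizeof(Node), rounded down */
    size_t used;      /* how many nodes have been handed out    */
} NodeBlock;

/* Returns 0 on success, -1 on failure. */
static int node_block_init(NodeBlock *b, size_t page_size)
{
    assert(b != NULL);
    b->capacity = page_size / sizeof(Node);
    b->used = 0;
    b->nodes = NULL;
    if (b->capacity == 0)
        return -1;
    b->nodes = malloc(b->capacity * sizeof(Node));
    return b->nodes ? 0 : -1;
}

/* Hands out the next unused node, or NULL when the block is full. */
static Node *node_block_take(NodeBlock *b)
{
    assert(b != NULL);
    if (b->used == b->capacity)
        return NULL;
    return &b->nodes[b->used++];
}

/* Releases every node in the block at once. */
static void node_block_destroy(NodeBlock *b)
{
    assert(b != NULL);
    free(b->nodes);
    b->nodes = NULL;
    b->capacity = b->used = 0;
}
```

The design trades per-node free() for one malloc() and one free() per block, which is the "little extra work" mentioned above.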

If your question is about how to alloc whole memory pages: Use mmap(), not malloc().
Reason:
malloc() typically stores some metadata alongside every allocation, so if you do malloc(4096) it will usually consume more than a single page. mmap(), on the other hand, is the kernel's API to map pages into your address space. It's what malloc() uses under the hood for large requests.
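A minimal sketch of the mmap() route, assuming a POSIX system where MAP_ANONYMOUS is available (on glibc this may need _DEFAULT_SOURCE). The names page_alloc and page_free are hypothetical:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* assumption: exposes MAP_ANONYMOUS on glibc */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Maps one anonymous, zero-filled, page-aligned page; NULL on failure. */
static void *page_alloc(size_t *page_size_out)
{
    assert(page_size_out != NULL);
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    void *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *page_size_out = (size_t)ps;
    return p;
}

/* Unmaps a page obtained from page_alloc(); returns 0 on success. */
static int page_free(void *p, size_t page_size)
{
    return munmap(p, page_size);
}
```

Unlike malloc(), the mapping carries no in-band bookkeeping, so the caller has to remember the size in order to unmap it.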
If your question is about correct rounding: The usual trick to round a up to a multiple of N is to say rounded = (a + N-1)/N*N;. By adding N-1 first, you ensure that the division will round up in all cases. In the case that a is already a multiple of N, the added N-1 will be without effect; in all other cases, you get one more than with rounded = a/N*N;.
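The two rounding expressions can be written as small helpers. This is just a sketch; round_down and round_up are hypothetical names, and round_up assumes a + n - 1 does not overflow:

```c
#include <assert.h>
#include <stddef.h>

/* Rounds a down to a multiple of n (n must be nonzero). */
static size_t round_down(size_t a, size_t n)
{
    assert(n != 0);
    return a / n * n;
}

/* Rounds a up to a multiple of n; assumes a + n - 1 does not overflow. */
static size_t round_up(size_t a, size_t n)
{
    assert(n != 0);
    return (a + n - 1) / n * n;
}
```

For the question's case, with a 4096-byte page and a hypothetical 24-byte Node, round_down(4096, 24) gives 4080, so 170 nodes fit and 16 bytes of the page are left over.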

Related

Is this implementation of malloc a bump allocator?

I recently wrote a small malloc and was wondering if it was a bump allocator. I wonder this because (correct me if I am wrong) I believe the actual malloc (while using mmap instead of sbrk) uses the same technique (sort of), but a bump allocator just increments the heap location. Here is my code:
#include <cstdio>
#include <cstddef>
#include <cstdint> // intptr_t
#include <unistd.h>
#define word_size sizeof(intptr_t)
#define align(n) ((n + word_size - 1) & ~(word_size - 1))
void* my_malloc(size_t size) {
void* p = sbrk(0);
if (sbrk(align(size)) == (void*) -1)
return NULL; // failed
return p;
}
int main() {
int* foo = (int*) my_malloc(1);
*foo = 100;
printf("%d\n", *foo);
}
So I had never heard the term "bump allocator" before, but the concept is straightforward.
It's a really naive allocator that can be very fast due to the tiny amount of housekeeping involved, but you have to live with a pretty heavy constraint: there's no "free" operation for individual request - you just destroy the whole thing.
In your case, you're calling sbrk(0) to get the first address at the end of the program's whole data segment - this will be the return value - then "bump" the end of memory with sbrk(nbytes) after suitably rounding it up.
This means that the program's data space just grows up for each request, and trying to free something doesn't make any sense because you can't just put a hole in the address space (well, there's probably some funky VM stuff that would work, but that gets complicated).
static void *bump_arena_start = 0;
void* my_malloc(size_t size) {
void* p = sbrk(0);
if (bump_arena_start == 0) bump_arena_start = p;
if (sbrk(align(size)) == (void*) -1)
return NULL; // failed
return p;
}
void destroy_bump_arena(void)
{
if (bump_arena_start) brk(bump_arena_start);
}
This would probably be a bump allocator, but it would be a terrible one for a bunch of reasons.
First: it assumes that nobody else is allocating memory, it would have to override all other operations: malloc, C++ new, everything in the C runtime, etc.
Imagine what would happen if you're doing your own thing with the break, and then some other function calls sbrk() to allocate some memory. Now they are in the middle of your arena but you mostly don't know it. So far no problem, but as soon as you go to destroy your arena, it kills everything else.
The way you'd actually use such a thing is when you have a lot of tiny allocations that you don't want to keep track of and can release all at once, so you might use the system allocator (malloc()) and ask for a large-enough chunk to handle your needs - let's call it 32kbytes - and stuff this into some object representing this bump arena.
You allocate lots of little bits here and there, do whatever task you need to do, then destroy all of it by freeing that initial 32-kbyte chunk.
The thing is: you have to be super careful that you don't let one of these pointers escape to other parts of the system, because they aren't allowed to live beyond the lifetime of your arena.
This is just a really specialized use case that's probably not generally useful, and unless you're doing embedded work (where you are essentially controlling your own runtime), you couldn't really do one with the system break this way.
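The malloc()-backed arena described above could be sketched as follows. The BumpArena type and the arena_* functions are hypothetical names, and the alignment handling assumes the requested alignment is a power of two no stricter than what malloc() itself provides:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct BumpArena {
    unsigned char *base;  /* start of the chunk obtained from malloc() */
    size_t size;          /* total bytes in the chunk                   */
    size_t offset;        /* bytes handed out so far                    */
} BumpArena;

/* Grabs one large chunk from the system allocator; 0 on success. */
static int arena_init(BumpArena *a, size_t size)
{
    assert(a != NULL);
    a->base = malloc(size);
    a->size = a->base ? size : 0;
    a->offset = 0;
    return a->base ? 0 : -1;
}

/* Bumps the offset; align must be a power of two. NULL when exhausted. */
static void *arena_alloc(BumpArena *a, size_t nbytes, size_t align)
{
    assert(a != NULL && align != 0 && (align & (align - 1)) == 0);
    size_t start = (a->offset + align - 1) & ~(align - 1);
    if (start > a->size || nbytes > a->size - start)
        return NULL;
    a->offset = start + nbytes;
    return a->base + start;
}

/* Frees every allocation at once by releasing the initial chunk. */
static void arena_destroy(BumpArena *a)
{
    assert(a != NULL);
    free(a->base);
    a->base = NULL;
    a->size = a->offset = 0;
}
```

No pointer returned by arena_alloc() may be used after arena_destroy(), which is the lifetime constraint mentioned above.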
Side note: you can get into trouble with alignment if you have objects larger than the size of an integer pointer. What if you did this?
int *foo1 = my_malloc(sizeof(int)); // 4 bytes (usually), rounded up to 8
__int128 *foo2 = my_malloc(sizeof(__int128)); // 16 bytes
The naive alignment would put the int on an 8-byte boundary, but so would the 128-bit value (which is 16 bytes), not aligned to its own size; on some platforms that's probably an error, and it's almost always inefficient.
To do it right you'd query the current next-value via sbrk(0) and ensure it was aligned properly for the size, possibly bumping up the break a bit.
EDIT: I have looked into this a bit more, and it's pretty clear that your example doesn't quite count as a bump allocator. Here's why.
The memory system not only keeps track of the "next" pointer, but how many allocations have been performed, and it supports a pseudo-free operation that ignores the pointer value but just decrements the allocation counter.
If the allocation counter ever reaches zero, this means nobody else has any of that memory, so it can free everything by rewinding the break to the initial value, essentially starting over with a clean slate.
To use this properly you'd have to really be careful about your deallocations, and a double-free could be very painful.
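A counting variant along those lines might look like this. It is a sketch over a fixed static buffer rather than the program break, it assumes C11 (for max_align_t), and the counted_* names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

#define COUNTED_ARENA_SIZE 4096

/* The union forces the buffer to start at a strictly aligned address. */
static union {
    max_align_t align;
    unsigned char bytes[COUNTED_ARENA_SIZE];
} counted_arena;

static size_t counted_next = 0;   /* offset of the next free byte */
static size_t counted_live = 0;   /* number of outstanding blocks */

/* Rounds every request up to 16 bytes, then bumps the offset. */
static void *counted_alloc(size_t nbytes)
{
    size_t rounded = (nbytes + 15) & ~(size_t)15;
    if (rounded < nbytes || rounded > COUNTED_ARENA_SIZE - counted_next)
        return NULL;
    void *p = counted_arena.bytes + counted_next;
    counted_next += rounded;
    counted_live++;
    return p;
}

/* Pseudo-free: the pointer value is ignored apart from a sanity check. */
static void counted_free(void *p)
{
    assert(p != NULL && counted_live > 0);
    if (--counted_live == 0)
        counted_next = 0;   /* last block gone: rewind and start over */
}
```

A double free decrements the counter once too often and can rewind the arena while blocks are still in use, which is why deallocations need so much care here.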
Really useful reference: https://os.phil-opp.com/allocator-designs/
EDIT2 - a bit more about alignment per request.
You have to have at least some awareness of alignment no matter what platform you're on, because even if the platform allows unaligned access to memory, it's almost always slower.
The super easy way to always get it right is to figure out the largest possible scalar object supported on the platform, and use that as your alignment modulo, perhaps __int128. If you're always rounding up to the nearest 16 bytes, you'll pretty much never run into an alignment issue (plus it's easy).
But it's also space-inefficient: if you're allocating space for a two-byte short, it will waste the 14 bytes after it. That might be no big deal in your application, or it might be a thing.
I have never written a memory allocator myself, so I'm doing a lot of handwaving here, but anybody using a bump allocator has some specialized requirements and is probably OK with making specialized requests.
So: I could imagine an allocator that takes not just the size, but also the alignment required, and it would take the sbrk(0) pointer and round that up the required alignment, save that as the return value, then call sbrk(size) to bump the end marker.
Note that you're not aligning to the size of the allocation, but only the size of the low-level item: asking for an array of 20 short values means you're asking for 40 bytes but with an alignment of 2, and 100 bytes for a string means you just take the next 100 bytes w/o any alignment.
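static size_t num_allocations = 0;
void *my_malloc(size_t nbytes, size_t align = 8) // align must be a power of two
{
uintptr_t cur = (uintptr_t) sbrk(0); // current break; uintptr_t needs <cstdint>
uintptr_t p = (cur + align - 1) & ~(uintptr_t)(align - 1); // round the break up to the alignment
if (sbrk((p - cur) + nbytes) == (void*) -1) // bump past the padding plus the request
return NULL; // failed
num_allocations++;
return (void*) p;
}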
This way, if you don't give an alignment size, it makes the same safe assumption you did, but you could always ask if you wanted to be special.
Again: I'm mostly just making this up, I've never had to think about it this way, but I do know that if I'm working on a memory-constrained platform such as an Arduino with RAM measured in kilobytes, you have to think about this.

Can realloc fail when shrinking buffer? [duplicate]

If I do the following:
int* array = malloc(10 * sizeof(int));
and then I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?
Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block.
I just used the following program to get size of actually allocated memory with glibc:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
size_t* a2 = (size_t*) array;
printf("%d -> %zu\n", n, a2[-1]);
}
}
and for n <= 6, it allocates 32 bytes, and for 7-10 it is 48.
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, effectively giving 16 free bytes. Since (as it just has been noted) it won't allocate anything less than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes will be freed, and something could actually be put in there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function):
Returns
4 The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the key-word here. It doesn't mention any specific circumstances when that can happen, so you can't rely on any of them, even if they sound obvious at a first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you'd take a look at C++, the newer memory allocation interfaces (new / delete and allocators) don't even support such a thing. They always expect you to allocate a new block. But that's just a loose comment.
The other answers have already nailed the question, but assuming you know the realloc call is a "trimming", you can wrap it with:
void *safe_trim(void *p, size_t n) {
void *p2 = realloc(p, n);
return p2 ? p2 : p;
}
and the return value will always point to an object of size n.
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.
The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by doing an unconditional malloc for a new memory block, copying the data and free-ing the old block. Obviously, such implementation can fail in low-memory situations.
Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.
I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such a thing as trimming an existing allocation block. Instead a smaller block is allocated first, then the data is copied from the old one, and then it's freed.
For instance this may be the case with bucket-heap strategy (used by some popular heaps, such as tcmalloc).
A bit late, but there is at least one popular implementation in which realloc() with a smaller size can fail: TCMalloc. (At least as far as I understand the code.)
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink enough (to less than 50% of the allocated memory; otherwise the request is ignored), TCMalloc will allocate the new memory first (and possibly fail), then copy the data and release the old memory.
I do not copy the source code, because I am not sure if the copyrights (of TCMalloc and Stack Overflow) will allow that, but here is a link to the source (revision as at May 17, 2019).
realloc will normally not fail when shrinking existing memory, so it will not return NULL. It can return NULL only if it fails during expansion.
However, shrinking can fail on some architectures where realloc is implemented differently, for example by allocating a smaller block separately and freeing the old one to avoid fragmentation. In that case a shrinking realloc can return NULL, but such implementations are very rare.
Still, it is better to be on the safe side and keep the NULL check after shrinking the memory as well.
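Keeping the NULL check without losing the original block usually means assigning the result to a temporary first. A minimal sketch, with resize_ints as a hypothetical helper name:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Resizes *arr to hold new_count ints.
 * Returns 0 on success; on failure returns -1 and leaves *arr untouched,
 * so the old block is neither leaked nor lost.
 */
static int resize_ints(int **arr, size_t new_count)
{
    assert(arr != NULL && new_count != 0);
    if (new_count > (size_t)-1 / sizeof(int))
        return -1;                      /* byte count would overflow */
    int *tmp = realloc(*arr, new_count * sizeof(int));
    if (tmp == NULL)
        return -1;                      /* old block is still valid */
    *arr = tmp;
    return 0;
}
```

Writing array = realloc(array, ...) directly, as in the question, would overwrite the only pointer to the old block if realloc() returned NULL.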

How much memory calloc and malloc can allocate?

How much memory calloc and malloc can allocate?
Both malloc and calloc allocate memory dynamically.
Example
void *malloc (size_in_bytes);
And calloc can allocate memory depending on the number of blocks
Example
void *calloc (number_of_blocks, size_of_each_block_in_bytes);
In principle you can request as many bytes as the type size_t can represent: just under 4 GiB in a 32-bit application and just under 16 EiB (exbibytes) in a 64-bit one.
All in all, you can allocate up to all of the machine's memory.
Aside from being limited by the amount of RAM in the PC, it is system dependent, but on Windows, it's _HEAP_MAXREQ according to the MSDN article on malloc. Note though that malloc and calloc are not guaranteed to allocate anything. It all depends on how much memory is available on the executing PC.
malloc sets errno to ENOMEM if a memory allocation fails or if the amount of memory requested exceeds _HEAP_MAXREQ.
_HEAP_MAXREQ is defined as follows in malloc.h (at least in the Visual Studio 2010 includes).
#ifdef _WIN64
#define _HEAP_MAXREQ 0xFFFFFFFFFFFFFFE0
#else
#define _HEAP_MAXREQ 0xFFFFFFE0
#endif
You shouldn't really worry about this though. When using malloc you should decide how much memory you really need, and call it with that as the request. If the system cannot provide it, malloc will return NULL. After you make your call to malloc you should always check to see that it's not NULL. Here is the C example from the MSDN article on proper usage. Also note that once you've finished with the memory, you need to call free.
#include <stdlib.h> // For _MAX_PATH definition
#include <stdio.h>
#include <malloc.h>
int main( void )
{
char *string;
// Allocate space for a path name
string = malloc( _MAX_PATH );
// In a C++ file, explicitly cast malloc's return. For example,
// string = (char *)malloc( _MAX_PATH );
if( string == NULL )
printf( "Insufficient memory available\n" );
else
{
printf( "Memory space allocated for path name\n" );
free( string );
printf( "Memory freed\n" );
}
}
As far as the language definition is concerned, the only upper limit per call is what size_t will support (e.g., if the max size_t value is 2^32 - 1, then that's the largest number of bytes malloc can allocate for a single block).
Whether you have the resources available for such a call to succeed depends on the implementation and the underlying system.
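Because size_t is the only limit at the language level, a count times element-size product can silently wrap around before it reaches malloc(). The following sketch guards against that; checked_array_alloc is a hypothetical helper name:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocates count * size bytes.
 * Returns NULL if the product would exceed SIZE_MAX or if malloc() fails.
 */
static void *checked_array_alloc(size_t count, size_t size)
{
    assert(size != 0);
    if (count > SIZE_MAX / size)
        return NULL;            /* count * size would overflow size_t */
    return malloc(count * size);
}
```

calloc(count, size) takes the two factors separately, so an implementation can perform the same check internally.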
On Linux, you can often allocate much more than you have in physical or virtual memory, because the kernel may overcommit memory.
As long as you don't actually use it, you're fine; it's only when you actually touch more memory than is available that the out-of-memory killer starts running amok and shoots down your process.
This is the reason why many people don't bother checking the result of malloc() anymore: it can be non-NULL even when the returned buffer can never be backed...
1) It depends on the resource limits of the user.
2) It also depends on the availability of address space.
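On POSIX systems the per-process address-space limit can be queried with getrlimit(). A rough sketch, assuming RLIMIT_AS is supported on the target; print_address_space_limit is a hypothetical name:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/resource.h>

/* Prints the soft and hard address-space limits; returns 0 on success. */
static int print_address_space_limit(FILE *out)
{
    assert(out != NULL);
    struct rlimit rl;
    if (getrlimit(RLIMIT_AS, &rl) != 0)
        return -1;
    if (rl.rlim_cur == RLIM_INFINITY)
        fprintf(out, "soft limit: unlimited\n");
    else
        fprintf(out, "soft limit: %llu bytes\n",
                (unsigned long long)rl.rlim_cur);
    if (rl.rlim_max == RLIM_INFINITY)
        fprintf(out, "hard limit: unlimited\n");
    else
        fprintf(out, "hard limit: %llu bytes\n",
                (unsigned long long)rl.rlim_max);
    return 0;
}
```

An allocation that would push the process past the soft limit fails even when the machine has plenty of free memory.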

Any function to query the size of an allocated block?

I realize that any such function is likely to be non standard, but that's ok for my use case. Basically, I need a method (even if it's only exposed through glibc's syscall() interface) that I can pass a pointer to (a pointer that was returned by a previous call to malloc()) that returns the size of the block the pointer points at. Does such a thing exist?
So far as I know, there is no such function, at least in C90/C99. Some systems' C libraries provide functions to get the allocated size (e.g. malloc_size() on Mac OS X), but this is not portable and is probably best avoided.
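For illustration only, the non-portable calls can be hidden behind one wrapper. This sketch assumes malloc_usable_size() from <malloc.h> on Linux, malloc_size() on macOS and _msize() on Windows; usable_block_size is a hypothetical name, and the value reported may be larger than the size originally requested:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#if defined(__linux__)
#include <malloc.h>              /* malloc_usable_size() */
#define BLOCK_SIZE(p) malloc_usable_size(p)
#elif defined(__APPLE__)
#include <malloc/malloc.h>       /* malloc_size() */
#define BLOCK_SIZE(p) malloc_size(p)
#elif defined(_WIN32)
#include <malloc.h>              /* _msize() */
#define BLOCK_SIZE(p) _msize(p)
#else
#error "no known allocated-size query on this platform"
#endif

/* Returns the usable size of a block previously returned by malloc(). */
static size_t usable_block_size(void *p)
{
    assert(p != NULL);
    return BLOCK_SIZE(p);
}
```

The result is the allocator's view of the block, so it should not be treated as a record of what the caller asked for.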
There is no need to use non-standard functions, it is not hard to write your own allocator with the necessary functionality.
You have to know the size when you allocate the block, so simply retain that information. There are few situations IMO when you would not know that information, since by definition you knew it when it was allocated. However if you need such functionality, you can do that simply by wrapping malloc(), and pre-pending the size to the block.
void* smalloc( size_t size )
{
// allocate block with additional space for size
size_t* blk = malloc( size + sizeof(size_t) ) ;
// give up if the underlying allocation failed
if( blk == NULL ) return NULL ;
// set the size
*blk = size ;
// return pointer to block after size field (user block)
return blk + 1 ;
}
void sfree( const void* blk )
{
// Free from the size field address, not the user block
// (free() takes a non-const pointer, so the const is cast away here)
if( blk != NULL ) free( (void*)(((const size_t*)blk) - 1) ) ;
}
size_t ssize( const void* blk )
{
// Size is immediately before user block
return *(((const size_t*)blk) - 1) ;
}
On Jim Buck's point: on some targets, some jiggering may be needed to preserve the necessary alignment. Some targets will generate less efficient code if alignment is not optimal, others will cause an abort. So beware of this solution. Personally I am wary of the need for this solution!
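One way to do that jiggering is to pad the size field out to the strictest fundamental alignment, so the user block stays as aligned as malloc() made the header. A sketch assuming C11 (max_align_t, _Alignof); BlockHeader, amalloc, asize and afree are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment. */
typedef union BlockHeader {
    size_t size;
    max_align_t pad;
} BlockHeader;

/* Allocates size bytes preceded by an aligned header; NULL on failure. */
static void *amalloc(size_t size)
{
    if (size > (size_t)-1 - sizeof(BlockHeader))
        return NULL;                 /* header + size would overflow */
    BlockHeader *h = malloc(sizeof(BlockHeader) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                    /* user block starts after header */
}

/* Returns the size that was requested for a block from amalloc(). */
static size_t asize(const void *blk)
{
    assert(blk != NULL);
    return ((const BlockHeader *)blk - 1)->size;
}

/* Frees a block from amalloc(); a NULL pointer is ignored. */
static void afree(void *blk)
{
    if (blk != NULL)
        free((BlockHeader *)blk - 1);
}
```

The cost is a larger header (often 16 bytes instead of 8), paid on every allocation.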
One solution perhaps would be to use a data structure such as a hash table with the malloc address as the key and the size as the content, and, using the same wrapper technique, store the size separately from the block. This comes at the expense of performance, additional storage, and perhaps some finite limit on the number of blocks that can be managed.
However you do it, the fundamental point remains valid - wrap the basic service to provide what you need.
_msize on Windows platforms.
