Any function to query the size of an allocated block?

I realize that any such function is likely to be non-standard, but that's OK for my use case. Basically, I need a function (even if it's only exposed through glibc's syscall() interface) that takes a pointer previously returned by malloc() and returns the size of the block it points to. Does such a thing exist?

As far as I know, there is no such function in standard C (C90/C99). Some systems' C libraries provide a function to get the allocated size (e.g. malloc_size() on Mac OS X), but it is not portable and is probably best avoided.

There is no need to use non-standard functions; it is not hard to write your own allocator wrapper with the necessary functionality.
You have to know the size when you allocate the block, so simply retain that information. IMO there are few situations in which you would not know that information, since by definition you knew it when the block was allocated. However, if you need such functionality, you can get it simply by wrapping malloc() and prepending the size to the block:
#include <stdlib.h>
void* smalloc( size_t size )
{
// allocate block with additional space for size
size_t* blk = malloc( size + sizeof(size_t) ) ;
if ( blk == NULL ) return NULL ;
// set the size
*blk = size ;
// return pointer to block after size field (user block)
return blk + 1 ;
}
void sfree( void* blk )
{
// Free from the size field address, not the user block; ignore NULL like free()
if ( blk != NULL ) free( ((size_t*)blk) - 1 ) ;
}
size_t ssize( const void* blk )
{
// Size is immediately before user block
return *(((const size_t*)blk) - 1) ;
}
On Jim Buck's point: on some targets, some jiggering may be needed to preserve the necessary alignment, because a header of sizeof(size_t) bytes can leave the user block less strictly aligned than malloc()'s own result. Some targets will generate less efficient code if alignment is not optimal; others will abort. So beware of this solution. Personally, I am wary of the need for this solution!
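One way to do that jiggering is sketched below. It assumes a C11 compiler: the header is a union with max_align_t, so its size is a multiple of the strictest fundamental alignment, and the user block keeps malloc()'s alignment. The names Header, amalloc, afree and asize are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment (C11 max_align_t),
   so the user block that follows it is aligned as well as malloc's result. */
typedef union {
    size_t size;
    max_align_t align;   /* forces sizeof(Header) to a multiple of that alignment */
} Header;

void *amalloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(Header))
        return NULL;                 /* size + header would overflow */
    Header *h = malloc(sizeof(Header) + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                    /* first byte after the header */
}

void afree(void *blk)
{
    if (blk != NULL)
        free((Header *)blk - 1);     /* free from the header, not the user block */
}

size_t asize(const void *blk)
{
    return ((const Header *)blk - 1)->size;
}
```

The price is a larger header on every block than the bare size_t used above.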
Another option would be to use a data structure such as a hash table, with the malloc address as the key and the size as the value, and to use the same wrapper technique to store the size separately from the block. This comes at the expense of performance and additional storage, and perhaps imposes a finite limit on the number of blocks that can be managed.
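Here is a minimal sketch of that side-table idea. It assumes a fixed-capacity open-addressing hash table (linear probing, with tombstones for deleted entries) and single-threaded use. TABLE_CAP, tmalloc, tfree and tsize are hypothetical names, and the pointer hash is an arbitrary choice.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TABLE_CAP 1024           /* fixed capacity (power of two): the finite limit */

static char tombstone;           /* its address marks a deleted slot */

typedef struct {
    const void *key;             /* NULL: never used; &tombstone: deleted */
    size_t size;
} Slot;

static Slot table[TABLE_CAP];

/* Mix the pointer bits; the low bits are mostly zero because of alignment. */
static size_t slot_index(const void *p)
{
    uintptr_t x = (uintptr_t)p;
    x ^= x >> 4;
    x *= (uintptr_t)0x9E3779B1u;
    return (size_t)((x >> 16) & (TABLE_CAP - 1));
}

/* Linear probe until the key or a never-used slot is found. */
static Slot *find_slot(const void *p)
{
    size_t i = slot_index(p);
    for (size_t n = 0; n < TABLE_CAP; n++, i = (i + 1) & (TABLE_CAP - 1)) {
        if (table[i].key == NULL)
            return NULL;         /* empty slot ends the probe chain */
        if (table[i].key == p)
            return &table[i];
    }
    return NULL;
}

void *tmalloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL)
        return NULL;
    size_t i = slot_index(p);
    for (size_t n = 0; n < TABLE_CAP; n++, i = (i + 1) & (TABLE_CAP - 1)) {
        if (table[i].key == NULL || table[i].key == &tombstone) {
            table[i].key = p;    /* reuse empty or deleted slots */
            table[i].size = size;
            return p;
        }
    }
    free(p);                     /* table full: report failure like malloc */
    return NULL;
}

size_t tsize(const void *p)
{
    const Slot *s = find_slot(p);
    return s != NULL ? s->size : 0;   /* 0 for pointers the table does not know */
}

void tfree(void *p)
{
    if (p == NULL)
        return;
    Slot *s = find_slot(p);
    if (s != NULL)
        s->key = &tombstone;     /* keep later probe chains intact */
    free(p);
}
```

Unlike the prefix header, this leaves the block exactly as malloc() returned it, so alignment is not an issue. Each allocation, however, now pays for a hash lookup.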
However you do it, the fundamental point remains valid - wrap the basic service to provide what you need.

_msize on Windows platforms.
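The platform-specific functions mentioned in these answers can be hidden behind one helper, as in the sketch below. It also uses malloc_usable_size(), which the answers do not name: glibc declares it in <malloc.h>, and to my knowledge musl provides it too. All of these may report more than the size originally requested. block_size is a hypothetical name.

```c
#include <stddef.h>
#include <stdlib.h>

#if defined(_WIN32)
#include <malloc.h>              /* _msize */
#elif defined(__APPLE__)
#include <malloc/malloc.h>       /* malloc_size */
#elif defined(__linux__)
#include <malloc.h>              /* malloc_usable_size (glibc; assumed for musl) */
#endif

/* Size of a block returned by malloc, possibly larger than requested.
   Returns 0 on platforms where no query function is wired up. */
size_t block_size(void *p)
{
#if defined(_WIN32)
    return _msize(p);
#elif defined(__APPLE__)
    return malloc_size(p);
#elif defined(__linux__)
    return malloc_usable_size(p);
#else
    (void)p;
    return 0;
#endif
}
```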

Can realloc fail when shrinking buffer? [duplicate]

If I do the following:
int* array = malloc(10 * sizeof(int));
and then I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?
Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block.
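I just used the following program to get the size of the memory actually allocated by glibc. It peeks at glibc's internal chunk header, which is undefined behavior and only works because of how that implementation lays out memory:
#include <stdlib.h>
#include <stdio.h>
int main(void)
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
if (array == NULL)
continue;
size_t* a2 = (size_t*) array;
// glibc stores the chunk size just before the user pointer; the low 3 bits are flags
printf("%d -> %zu\n", n, a2[-1] & ~(size_t)7);
free(array);
}
return 0;
}
and for n <= 6 it allocates 32 bytes, and for 7-10 it is 48 (on a 64-bit build).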
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, freeing 16 bytes. Since (as just noted) it won't allocate anything smaller than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes would be freed, and something could actually be put there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function, "Returns", paragraph 4):
The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the keyword here. The standard doesn't mention any specific circumstances in which that can happen, so you can't rely on any of them, even if they seem obvious at first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you look at C++, the newer memory allocation interfaces (new/delete and allocators) don't even support such a thing; they always expect you to allocate a new block. But that's just a loose comment.
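The other answers have already nailed the question, but if you know the realloc call is a "trimming" one, you can wrap it with:
void *safe_trim(void *p, size_t n) {
// realloc(p, 0) may free p and return NULL, so never trim to zero here
if (n == 0) return p;
void *p2 = realloc(p, n);
// on failure the original (larger) block is still valid, so keep using it
return p2 ? p2 : p;
}
and the return value will always point to a block with at least n usable bytes.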
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.
The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by unconditionally malloc-ing a new memory block, copying the data, and freeing the old block. Obviously, such an implementation can fail in low-memory situations.
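Here is a sketch of that "primitive" strategy. Because it is written outside the allocator, this hypothetical naive_realloc cannot look up the old block size, so the caller passes it in; a real implementation would know it internally. Note that the shrinking path still depends on a successful malloc().

```c
#include <stdlib.h>
#include <string.h>

/* "Primitive" realloc: always allocate, copy, free. Shrinking can fail too. */
void *naive_realloc(void *old, size_t old_size, size_t new_size)
{
    void *fresh = malloc(new_size);
    if (fresh == NULL)
        return NULL;             /* fails even if new_size < old_size; old is untouched */
    if (old != NULL)
        memcpy(fresh, old, old_size < new_size ? old_size : new_size);
    free(old);
    return fresh;
}
```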
Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.
I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such thing as trimming an existing allocation block. Instead, a smaller block is allocated first, the data is copied from the old one, and then the old one is freed.
For instance, this may be the case with a bucket-heap strategy (used by some popular heaps, such as tcmalloc).
A bit late, but there is at least one popular implementation in which realloc() with a smaller size can fail: TCMalloc (at least as far as I understand the code).
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink far enough (below 50% of the allocated memory; otherwise the request is ignored), TCMalloc allocates the new memory first (which can fail), then copies the data and releases the old memory.
I do not copy the source code, because I am not sure whether the copyrights (of TCMalloc and Stack Overflow) allow that, but here is a link to the source (revision as of May 17, 2019).
Usually realloc will not fail when shrinking existing memory, so it will not return NULL; it normally returns NULL only when an expansion fails.
But shrinking can fail on some implementations where realloc works differently, for example by allocating a smaller block separately and freeing the old one to avoid fragmentation. In that case shrinking can return NULL, but such implementations are very rare.
So it's better to be on the safe side and keep NULL checks after shrinking memory too.
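Whichever answer applies to your implementation, note that the question's line "array = realloc(array, 5 * sizeof(int));" overwrites array with NULL if realloc fails, which leaks the original block. Below is a sketch of the usual temporary-pointer idiom. resize_ints is a hypothetical helper name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Resize *arr to count ints. On failure *arr is left unchanged and still
   owns the old block, so nothing leaks. Returns 0 on success, -1 on failure. */
int resize_ints(int **arr, size_t count)
{
    if (count == 0 || count > SIZE_MAX / sizeof **arr)
        return -1;               /* zero size (realloc(p, 0) varies) or overflow */
    int *tmp = realloc(*arr, count * sizeof **arr);
    if (tmp == NULL)
        return -1;               /* can happen even when shrinking */
    *arr = tmp;
    return 0;
}
```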

What alignment issues limit the use of a block of memory created by malloc?

I am writing a library for various mathematical computations in C. Several of these need some "scratch" space -- memory that is used for intermediate calculations. The space required depends on the size of the inputs, so it cannot be statically allocated. The library will typically be used to perform many iterations of the same type of calculation with the same size inputs, so I'd prefer not to malloc and free inside the library for each call; it would be much more efficient to allocate a large enough block once, re-use it for all the calculations, then free it.
My intended strategy is to request a void pointer to a single block of memory, perhaps with an accompanying allocation function. Say, something like this:
void *allocateScratch(size_t rows, size_t columns);
void doCalculation(size_t rows, size_t columns, double *data, void *scratch);
The idea is that if the user intends to do several calculations of the same size, he may use the allocate function to grab a block that is large enough, then use that same block of memory to perform the calculation for each of the inputs. The allocate function is not strictly necessary, but it simplifies the interface and makes it easier to change the storage requirements in the future, without each user of the library needing to know exactly how much space is required.
In many cases, the block of memory I need is just a large array of type double, no problems there. But in some cases I need mixed data types -- say a block of doubles AND a block of integers. My code needs to be portable and should conform to the ANSI standard. I know that it is OK to cast a void pointer to any other pointer type, but I'm concerned about alignment issues if I try to use the same block for two types.
So, specific example. Say I need a block of 3 doubles and 5 ints. Can I implement my functions like this:
void *allocateScratch(...) {
return malloc(3 * sizeof(double) + 5 * sizeof(int));
}
void doCalculation(..., void *scratch) {
double *dblArray = scratch;
int *intArray = (int *)((unsigned char*)scratch + 3 * sizeof(double)); // cast needed: unsigned char * does not convert to int * implicitly
}
Is this legal? The alignment probably works out OK in this example, but what if I switch it around and put the int block first and the double block second? That will shift the alignment of the doubles (assuming 64-bit doubles and 32-bit ints). Is there a better way to do this, or a more standard approach I should consider?
My biggest goals are as follows:
I'd like to use a single block if possible so the user doesn't have to deal with multiple blocks or a changing number of blocks required.
I'd like the block to be a valid block obtained by malloc so the user can call free when finished. This means I don't want to do something like creating a small struct that has pointers to each block and then allocating each block separately, which would require a special destroy function; I'm willing to do that if that's the "only" way.
The algorithms and memory requirements may change, so I'm trying to use the allocate function so that future versions can get different amounts of memory for potentially different types of data without breaking backward compatibility.
Maybe this issue is addressed in the C standard, but I haven't been able to find it.
The memory of a single malloc can be partitioned for use in multiple arrays as shown below.
Suppose we want arrays of types A, B, and C with NA, NB, and NC elements. We do this:
size_t Offset = 0;
ptrdiff_t OffsetA = Offset; // Put array at current offset.
Offset += NA * sizeof(A); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(B)); // Align sufficiently for type.
ptrdiff_t OffsetB = Offset; // Put array at current offset.
Offset += NB * sizeof(B); // Move offset to end of array.
Offset = RoundUp(Offset, sizeof(C)); // Align sufficiently for type.
ptrdiff_t OffsetC = Offset; // Put array at current offset.
Offset += NC * sizeof(C); // Move offset to end of array.
unsigned char *Memory = malloc(Offset); // Allocate memory.
if (Memory == NULL) { /* handle allocation failure */ }
// Set pointers for arrays (C needs explicit casts from unsigned char *).
A *pA = (A *)(Memory + OffsetA);
B *pB = (B *)(Memory + OffsetB);
C *pC = (C *)(Memory + OffsetC);
where RoundUp is:
// Return Offset rounded up to a multiple of Size.
size_t RoundUp(size_t Offset, size_t Size)
{
size_t x = Offset + Size - 1;
return x - x % Size;
}
This uses the fact, as noted by R.., that the size of a type must be a multiple of the alignment requirement for that type. In C 2011, sizeof in the RoundUp calls can be changed to _Alignof, and this may save a small amount of space when the alignment requirement of a type is less than its size.
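Applied to the question's reversed layout (5 ints first, then 3 doubles), the technique might look like the sketch below. It assumes C11 for _Alignof (sizeof(double) works on older compilers, for the reason just given). N_INTS, N_DOUBLES, round_up, alloc_scratch and carve_scratch are hypothetical names. The result is still a single malloc block that the caller can pass to free().

```c
#include <stddef.h>
#include <stdlib.h>

enum { N_INTS = 5, N_DOUBLES = 3 };   /* hypothetical fixed sizes from the question */

/* Return offset rounded up to a multiple of align. */
static size_t round_up(size_t offset, size_t align)
{
    size_t x = offset + align - 1;
    return x - x % align;
}

/* Byte offset where the double array starts: after the ints, realigned. */
static size_t doubles_offset(void)
{
    return round_up(N_INTS * sizeof(int), _Alignof(double));
}

void *alloc_scratch(void)
{
    return malloc(doubles_offset() + N_DOUBLES * sizeof(double));
}

/* Split one scratch block into the two arrays. */
void carve_scratch(void *scratch, int **ints, double **dbls)
{
    unsigned char *base = scratch;
    *ints = (int *)base;                          /* offset 0: malloc's alignment fits any type */
    *dbls = (double *)(base + doubles_offset());  /* multiple of _Alignof(double) */
}
```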
If the user is calling your library's allocation function, then they should call your library's freeing function. This is very typical (and good) interface design.
So I would say just go with the struct of pointers to different pools for your different types. That's clean, simple, and portable, and anybody who reads your code will see exactly what you are up to.
If you do not mind wasting memory and insist on a single block, you could create a union with all of your types and then allocate an array of those...
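The union approach could look like the sketch below, with ScratchCell and allocateScratchCells as hypothetical names. Every element of the array is aligned for every member, but each int slot takes as much room as the largest member: that is the wasted memory. Values are accessed through the member (cells[k].d, cells[k].i) rather than by treating the block as a plain double array.

```c
#include <stdlib.h>

/* One member per type the scratch space must hold. */
typedef union {
    double d;
    int i;
} ScratchCell;

/* Cells [0, n_dbl) hold doubles, cells [n_dbl, n_dbl + n_int) hold ints.
   The caller releases the block with plain free(). */
ScratchCell *allocateScratchCells(size_t n_dbl, size_t n_int)
{
    return malloc((n_dbl + n_int) * sizeof(ScratchCell));
}
```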
Trying to find appropriately aligned memory in a massive block is just a mess. I am not even sure you can do it portably. What's the plan? Cast pointers to intptr_t, do some rounding, then cast back to a pointer?
The C11 standard has the max_align_t type (and the _Alignas specifier, the _Alignof operator, and the <stdalign.h> header).
The GCC compiler has a __BIGGEST_ALIGNMENT__ macro (the largest alignment used for any data type on the target machine). It also provides some extensions related to alignment.
Often, using 2*sizeof(void*) as the biggest relevant alignment is quite safe in practice (at least on most of the systems I have heard about these days; but one could imagine weird processors and systems where it is not the case, perhaps some DSPs). To be sure, study the details of the ABI and calling conventions of your particular implementation, e.g. the x86-64 ABI and x86 calling conventions...
And the system malloc is guaranteed to return a pointer suitably aligned for any object type with a fundamental alignment.
On some systems, targets, and processors, a larger alignment might give a performance benefit (notably when asking the compiler to optimize). You may have to (or want to) tell the compiler about that, e.g. with GCC variable attributes...
Don't forget that according to Fulton
there is no such thing as portable software, only software that has been ported.
but intptr_t and max_align_t are here to help you...
Note that the required alignment for any type must evenly divide the size of the type; this is a consequence of the representation of array types. Thus, in the absence of C11 features to determine the required alignment for a type, you can just estimate conservatively and use the type's size. In other words, if you want to carve up part of an allocation from malloc for use storing doubles, make sure it starts at an offset that's a multiple of sizeof(double).

Full Page Malloc

I am trying to optimize the memory allocation of my program by using entire pages at a time.
I am grabbing the page size like this: sysconf(_SC_PAGESIZE); then calculating the total number of elements that will fit in a page like this: elements=pageSize/sizeof(Node);
I was thinking that when I actually malloc my memory I would use malloc(elements*sizeof(Node)); It seems like the multiplication and division by sizeof(Node) would cancel out, but with integer division, I do not believe that is the case.
Is this the best way to malloc an entire page at a time?
Thanks
The malloc function doesn't have any concept of page size. Unless you are allocating pages that are ALSO aligned to a page boundary, you will not get ANY benefit from calling malloc this way. Just malloc as many elements as you need, and stop worrying about micro-optimising something that almost certainly won't give you any benefit at all.
Yes, the Linux kernel does things like this all the time. There are two reasons for that:
You don't want to allocate blocks LARGER than a page, since that significantly increases the risk of allocation failure.
The kernel allocates memory on a per-page basis, unlike the C library, which allocates a large amount of memory in one go and then splits it into small pieces.
If you really want to allocate a page-sized amount of memory, use the result of sysconf(_SC_PAGESIZE) as your size argument. But it is almost certain that your allocation will straddle two pages.
Your computation elements=pageSize/sizeof(Node); doesn't take into account the malloc() metadata that is added to any block/chunk of memory returned by malloc(). In many cases, malloc() will return a memory block aligned on at least a min(sizeof(double), 2 * sizeof(void *)) boundary (32 bytes is becoming quite common, by the way). If malloc() gets a memory block aligned on a page, adds its chunk header (with padding), and you write a full page of data, the last bytes spill off the first page, so you end up using 2 pages.
Want a whole page, just for you, without worrying about wasting memory, and without using mmap() / VirtualAlloc() as suggested in the comments?
Here you are:
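#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
int ret;
void *ptr = NULL;
size_t page_size = (size_t) sysconf(_SC_PAGESIZE);
/* alignment and size are both one page; release it later with free(ptr) */
ret = posix_memalign(&ptr, page_size, page_size);
if (ret != 0 || ptr == NULL) {
fprintf(stderr, "posix_memalign failed: %s\n", strerror(ret));
}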
By the way, this is probably micro-optimization.
You probably still haven't checked whether your Node's size is a multiple of a cache line, or how to improve cache locality, or found a way to reduce memory fragmentation. So you're probably going about this the wrong way: make it work first, profile, optimize your algorithms, profile again, and micro-optimize only as a last resort.
The C11 standard added the aligned_alloc call, so you can do something like:
#include <stdlib.h>
#include <unistd.h>
void *alloc_page( void )
{
long page_size = sysconf( _SC_PAGESIZE ); /* arguably could be a constant, #define, etc. */
return ( page_size > 0 ? aligned_alloc( page_size, page_size ) : NULL );
}
The problem with this approach, as others pointed out, is that implementations of the standard allocation functions typically add some bookkeeping overhead stored just before the allocated memory. So this allocation will usually straddle two pages: the returned page for you to use, and the very end of another page used for the allocator's bookkeeping.
That means when you free or realloc this memory, it may need to touch two pages rather than just the one. Also, if you allocate all or most of your memory this way, then you can "waste" a lot of virtual memory as roughly half of the pages allocated to your process at the OS level will only be used a tiny bit for the allocator's bookkeeping.
How important these issues are is hard to say generally, but preferably they would be avoided somehow. Unfortunately, I haven't figured out a clean, easy, and portable way to do that yet.
==============================
Addendum: If you could dynamically figure out malloc's memory overhead and assume it is always constant, then would asking for that much less usually give us what we want?
#include <stdlib.h>
#include <unistd.h>
/* decent default guesses (e.g. - Linux x64) */
static size_t Page_Size = 4096;
static size_t Malloc_Overhead = 32;
/* call once at beginning of program (i.e. - single thread, no allocs yet) */
int alloc_page_init( void )
{
int ret = -1;
long page_size = sysconf( _SC_PAGESIZE );
char *p1 = malloc( 1 );
char *p2 = malloc( 1 );
size_t malloc_overhead;
if ( page_size <= 0 || p1 == NULL || p2 == NULL )
goto FAIL;
malloc_overhead = ( size_t ) ( p2 > p1 ? p2 - p1 : p1 - p2 ); /* undefined behavior: p1 and p2 point into different objects */
if ( malloc_overhead > 64 || malloc_overhead >= ( size_t ) page_size )
goto FAIL;
Page_Size = page_size;
Malloc_Overhead = malloc_overhead;
ret = 0;
FAIL:
if ( p1 )
free( p1 );
if ( p2 )
free( p2 );
return ret;
}
void *alloc_page( void )
{
return aligned_alloc( Page_Size - Malloc_Overhead, Page_Size - Malloc_Overhead );
}
Answer: probably not, because, for example, "As an example of the "supported by the implementation" requirement, POSIX function posix_memalign accepts any alignment that is a power of two and a multiple of sizeof(void *), and POSIX-based implementations of aligned_alloc inherit these requirements."
The above code would likely not request an alignment that is a power of 2 and will therefore likely fail on most platforms.
It seems this is an unavoidable problem with the typical implementations of standard allocation functions. So, it is probably best to just align and alloc based on the page size and likely pay the penalty of the allocator's bookkeeping residing on another page, or use an OS specific call like mmap to avoid this issue.
The standards provide no guarantee that malloc even has a concept of page size. However, it's not uncommon for malloc implementations to dole out entire pages when the allocation size requested is on the order of the page size (or larger).
There's certainly no harm in asking for an allocation that happens to be equal to the page size (or a multiple of the page size) and subdividing it yourself, though it is a little extra work. You might indeed get the behavior you desire, at least on some machines/compiler/library combinations. But you might not either. If you absolutely require page-sized allocations and/or page-aligned memory, you'll have to call an OS-specific API to get it.
If your question is about how to alloc whole memory pages: Use mmap(), not malloc().
Reason:
Typical malloc() implementations add some metadata to every allocation, so if you do malloc(4096) it will usually occupy more than a single page. mmap(), on the other hand, is the kernel's API for mapping pages into your address space, and it is one of the mechanisms malloc() uses under the hood (glibc, for example, uses it for large allocations).
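Here is a minimal sketch of the mmap() route. It assumes a POSIX system that supports MAP_ANONYMOUS (Linux, the BSDs and macOS do, although older POSIX editions did not require it). The page comes back page-aligned and zero-filled, and it must be released with munmap(), not free(). map_page and unmap_page are hypothetical names.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1        /* glibc: expose MAP_ANONYMOUS in strict ISO modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one private, anonymous page: page-aligned and zero-filled. NULL on failure. */
void *map_page(void)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return NULL;
    void *p = mmap(NULL, (size_t)ps, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give a page from map_page() back to the kernel (never pass it to free()). */
void unmap_page(void *p)
{
    long ps = sysconf(_SC_PAGESIZE);
    if (p != NULL && ps > 0)
        munmap(p, (size_t)ps);
}
```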
If your question is about correct rounding: The usual trick to round a up to a multiple of N is to say rounded = (a + N-1)/N*N;. By adding N-1 first, you ensure that the division will round up in all cases. In the case that a is already a multiple of N, the added N-1 will be without effect; in all other cases, you get one more than with rounded = a/N*N;.
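A worked instance of the trick: with a = 10 and N = 4, (10 + 3)/4*4 = 13/4*4 = 3*4 = 12. An exact multiple stays put: with a = 12, (12 + 3)/4*4 = 15/4*4 = 3*4 = 12. Written as a function (round_up is a hypothetical name; it assumes N > 0 and that a + N - 1 does not overflow):

```c
#include <stddef.h>

/* Round a up to the next multiple of n (n > 0). */
size_t round_up(size_t a, size_t n)
{
    return (a + n - 1) / n * n;  /* adding n - 1 makes the truncating division round up */
}
```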

question on free() in C language [duplicate]

Possible Duplicate:
How do free and malloc work in C?
How does free know how many bytes of memory to free when called in a program?
This is implementation specific, but when malloc is called, the size of the allocated memory is kept somewhere (usually offset from the pointer itself). When free is called, it will use that stored size.
This is exactly why you should only ever call free on a pointer that was returned by malloc (or calloc/realloc).
It's done automatically. The corresponding "malloc" has saved the size in a secret place (typically stored at a negative offset from the pointer).
This, of course, means that you can only free memory that corresponds to a block previously allocated by "malloc".
Asking how it knows "how many bytes to free" is a mistake. It's not like each byte individually has a free/not-free status bit attached to it (well, it could, but this would be an awful implementation). In many implementations the number of bytes in an allocation may be completely irrelevant; it's the data structures used to manage it that are relevant.
It's an implementation detail that can and will vary between platforms. Here's one example, though, of how it could be implemented.
Every free call must be paired with a malloc / realloc call which knows the size request. The implementation of malloc could choose to store this size at an offset of the returned memory. Say by allocating a larger buffer than requested, stuffing the size in the front and then returning an offset into the allocated memory. The free function could then simply use the offset of the provided pointer to discover the size to free.
For example
void* malloc(size_t size) {
size_t actualSize = size + sizeof(size_t);
void* buffer = _internal_allocate(actualSize);
if (buffer == NULL) return NULL;
*((size_t*)buffer) = size;
return ((size_t*)buffer) + 1;
}
void free(void* buffer) {
size_t* other = buffer;
other--;
size_t originalSize = *other;
// Rest of free
...
}
The answer is implementation-specific.
malloc might keep a dictionary mapping addresses to data records
malloc might allocate a slightly larger block than requested and store metadata before or after the block it actually returns.
In some special cases, not intended for general use, free() is completely a no-op and it doesn't actually keep track.
