How can I get information about malloc()'s behavior? - c

I'm wrapping malloc() for some reason. I would like to have some (system-specific, at-run-time) information beyond what I can get by merely calling it. For example:
What's the minimum alignment malloc() is using for allocation?
When allocating a specific stretch of memory, how much did it actually allocate (this could theoretically be more than the amount requested)?
Whether (assuming no other concurrent operations) a realloc() will succeed with the same original address or require a move.
Note: I'd like as portable an answer as possible, but a platform-specific answer is still relevant: Linux, Windows, macOS, Un*x.

glibc implements malloc_usable_size, which returns the actual allocation size available for application use. Some alternative mallocs implement it as well. Note that glibc can perform a non-moving realloc even if the requested new size is larger than the value malloc_usable_size reports, so it is not useful for predicting whether realloc will move the block.
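A minimal sketch of querying the usable size, assuming glibc (or another allocator that ships malloc_usable_size in <malloc.h>). The helper name usable_size_for_request is hypothetical and only serves the illustration.

```c
#include <malloc.h>   /* glibc-specific header declaring malloc_usable_size */
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate `requested` bytes, ask the allocator how many
   bytes the block can actually hold, then release it again.
   Returns 0 if the allocation itself failed. */
size_t usable_size_for_request(size_t requested)
{
    void *p = malloc(requested);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);  /* never less than `requested` */
    free(p);
    return usable;
}
```

The value is informational. It describes one block at one moment and says nothing about whether a later realloc can grow that block in place.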
For the other things you are asking, there are no clear answers. Theoretically, malloc should provide memory at least aligned to _Alignof (max_align_t), but many implementations do not do this for various reasons:
max_align_t comes from a compiler such as GCC and thus reflects the compiler's view of the world, and not what malloc provides (see glibc malloc is incompatible with GCC 7 for an example). The C standard assumes a uniform implementation, but in practice, the compiler, the C run-time library, and even malloc are separate components, built from different sources, and on different release cycles, so they can fall out of sync, and a compile-time constant such as _Alignof (max_align_t) will rarely accurately reflect what malloc does at run-time.
Providing the ABI-mandated 16 byte alignment on x86-64 for allocations of 8 or 4 bytes is wasteful.
A malloc implementation may have internal constraints which result in larger alignment than what is required by the architecture specification. Applications obviously cannot rely on that, but it is still observable.
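Since the alignment is observable at run time, it can be probed. The following is only a sketch: observed_malloc_alignment and PROBE_SAMPLES are hypothetical names, and the result is an observation about a handful of blocks, not a guarantee from the allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define PROBE_SAMPLES 64   /* arbitrary sample count chosen for this sketch */

/* Hypothetical helper: returns the largest power of two that divides every
   address obtained from PROBE_SAMPLES simultaneously live allocations of
   `size` bytes. Returns 0 if no allocation succeeded. */
size_t observed_malloc_alignment(size_t size)
{
    void *blocks[PROBE_SAMPLES];
    uintptr_t bits = 0;
    size_t n = 0;

    for (; n < PROBE_SAMPLES; n++) {
        blocks[n] = malloc(size);
        if (blocks[n] == NULL)
            break;
        bits |= (uintptr_t)blocks[n];   /* accumulate every set address bit */
    }
    for (size_t i = 0; i < n; i++)
        free(blocks[i]);

    if (n == 0 || bits == 0)
        return 0;
    return (size_t)(bits & (~bits + 1));  /* isolate the lowest set bit */
}
```

Comparing the result with _Alignof(max_align_t) for small and large sizes shows whether the compiler's constant and the allocator's behaviour agree on a given system.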
Your question about non-moving realloc does not even have a proper answer: For a multi-threaded program, another thread might place an allocation which blocks the enlargement of the current allocation between the determination of the resize limit and the actual realloc call. A non-moving version of realloc could be a useful addition, but the interface would be quite different (probably something along the lines of please resize to this maximum value if possible without moving the block, otherwise return the largest possible size or something like that).

If you want a portable answer to these questions, your best bet is to implement your own allocation scheme. It would be safer to not use the names malloc(), calloc(), realloc(), free(), strdup(), etc. because you might run into problems with dynamic name resolution, even if your reimplementation of the standard functions is conformant.
Any source code you control could be made to call your allocator by defining a set of macros at the head of every module (via a common header file).
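As a rough sketch of the macro approach, assuming a wrapper pair named my_malloc and my_free (both hypothetical, as is the counter). The wrappers are defined before the macros so that they still reach the real library functions. Note that redefining standard names as macros is formally reserved by the C standard, even though it works with common compilers.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical bookkeeping kept by the wrapper. */
static size_t my_live_allocations = 0;

/* Hypothetical wrapper: forwards to the real malloc and counts live blocks. */
static void *my_malloc(size_t n)
{
    void *p = malloc(n);
    if (p != NULL)
        my_live_allocations++;
    return p;
}

/* Hypothetical wrapper: forwards to the real free. */
static void my_free(void *p)
{
    if (p != NULL)
        my_live_allocations--;
    free(p);
}

/* These two lines would live in the common header, after the standard
   headers, so that every later call in the module is redirected. */
#define malloc(n) my_malloc(n)
#define free(p)   my_free(p)
```

Code compiled after these macros calls the wrappers without any source change. Code in libraries you do not compile is unaffected.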
Using system specific tricks to retrieve information from the allocator's metadata is risky because your program will bind to the C library dynamically, so it is possible that such structures change from one system to another, even for the same operating system.
Re-implementing malloc() in terms of lower-level system calls such as mmap() or other system-specific facilities seems deceptively simple, but it is a lot of work to make it correct, stable, efficient and reliable. You should look at available proven alternative implementations of malloc and try to tailor them to your needs.


Memory allocation that resizes a buffer ONLY if it can grow in place?

After reading the man-page for realloc(), I came to the realization that it works a little differently than I thought it did. I originally thought that realloc() would attempt to resize a buffer, previously allocated with one of the malloc-family functions, and if it could NOT extend the buffer in place, then it would fail. However, the man-page states:
The realloc() function returns a pointer to the newly allocated memory, which is suitably aligned for any built-in type and may be different from ptr, or NULL if the request fails.
The "may be different from ptr" part is what I'm talking about.
Basically, what I want is a function, similar to realloc(), but which fails if it cannot extend the buffer in place. It seems that there is no function in the standard C library that does this; however, I'm assuming there may be some OS-specific functions that accomplish the same thing.
Could someone tell me what functions are out there that do what I described above, and which OS's they are specific to? Preferably, I'd like to know at least the functions specific to Linux and Windows (and Mac OS would be a nice bonus too :) ).
This may be a duplicate of this post, but I don't think it is for the following reasons:
The question in the post I linked to simply asks, is there a function that extends a buffer in place, whereas, I'm asking, which functions extend a buffer in place.
The accepted answer for that post does not contain the information I need.
Some people were wondering what is the use case I need this for, so I'll explain, below:
I'm writing a C preprocessor (yes, I know... don't reinvent the wheel... well, I'm doing it anyways, so there). And one component of the C preprocessor is a cache for storing pp-tokens which come from various source files, where each source file's set of pp-tokens may be fragmented within the cache. The cache itself, is a linked-list of large chunks of memory. Ideally, I'd like to keep this linked-list short, hence why I'd like to first try resizing the buffer (in place); however, if resizing in place is not possible, then I want to just add another node (i.e. chunk of memory) to the linked list.
Within each cache buffer, there are additional linked-list nodes, which provide a means for iterating through all the pp-tokens of each individual source file, which may be fragmented across the various cache buffers that make up the cache.
The reasons I need the kind of memory reallocation I discussed earlier are the following:
If resizing a cache buffer could not be done in place, and a new buffer had to be allocated and the old memory contents copied, then I'd have a lot of dangling pointers. Jonathan Leffler suggested that I instead store offsets within the buffer, rather than pointers, which I had not even thought about, and is a great idea! However, reason #2...
I want the implementation of the cache to be as fast as possible, and, please correct me if I'm wrong, but it seems to me that (for my use case) it would be faster on average to just add a new cache buffer to the linked list if a given cache buffer could not be resized in place, rather than allocating a new buffer and copying all previous contents and freeing the old buffer. As a sidenote, I am planning on doubling the size of the allocated cache buffer each time cache resizing is needed.
Memory management (in the form of malloc and friends) is generally implemented as a library; it is not part of the Operating System. (An implementation of the library will probably need to use some OS facilities to acquire raw memory -- although that's not a given -- but there is no need to involve the OS for allocating and freeing individual allocations.) So you're not going to find an "OS-specific" solution.
There are a number of different memory allocation libraries available. If you decide to use an alternative to the one preinstalled with your particular distribution, you will probably want to arrange for it to be used by the standard library as well. Details for how to do that vary.
Most allocation libraries do include some additional interfaces, but I don't know of any library which offers the function you're looking for. More common is an API for finding out how much memory is actually in an allocation (which is often more than the amount requested by the malloc). For many libraries, realloc will only expand the allocation in place if it was already big enough, but there may be libraries which are willing to merge a following free block in order to make non-copying realloc possible.
There's a list of some commonly-used libraries in the Wikipedia page on dynamic memory allocation, which also has a good overview of implementation techniques.
And, of course, you could always write your own memory manager (or modify an open source library) to implement that feature. However, while that would be an interesting and satisfying project, I'd strongly suggest you think about (and research) the reasons why this seemingly simple idea has not been implemented in common memory management libraries. There are good reasons.
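Since no common library offers a grow-in-place-or-fail call, the fallback described in the question (add another chunk when the current one is full) can be sketched with plain malloc. Cache, Chunk, cache_alloc, cache_free and the 4096-byte starting size are all hypothetical choices for this illustration. The returned memory is only byte-aligned, which is enough for token text.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct Chunk {
    struct Chunk *next;
    size_t cap;             /* bytes available in data[] */
    size_t used;            /* bytes already handed out */
    unsigned char data[];   /* flexible array member holding the payload */
} Chunk;

typedef struct {
    Chunk *head;            /* most recently added chunk */
    size_t next_cap;        /* capacity for the next chunk (doubles each time) */
} Cache;

/* Hand out n bytes. Existing chunks never move, so earlier pointers stay valid. */
void *cache_alloc(Cache *c, size_t n)
{
    Chunk *h = c->head;
    if (h == NULL || n > h->cap - h->used) {
        size_t cap = c->next_cap ? c->next_cap : 4096;
        while (cap < n) {                       /* make room for large requests */
            if (cap > SIZE_MAX / 2)
                return NULL;
            cap *= 2;
        }
        if (cap > SIZE_MAX - sizeof(Chunk))
            return NULL;
        Chunk *fresh = malloc(sizeof *fresh + cap);
        if (fresh == NULL)
            return NULL;
        fresh->next = h;
        fresh->cap = cap;
        fresh->used = 0;
        c->head = fresh;
        c->next_cap = (cap <= SIZE_MAX / 2) ? cap * 2 : cap;
        h = fresh;
    }
    void *p = h->data + h->used;
    h->used += n;
    return p;
}

/* Release every chunk in the list. */
void cache_free(Cache *c)
{
    while (c->head != NULL) {
        Chunk *next = c->head->next;
        free(c->head);
        c->head = next;
    }
}
```

Because each chunk doubles in size, the list stays short even for large inputs, and nothing is ever copied.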

When is it more appropriate to use valloc() as opposed to malloc()?

C (and C++) include a family of dynamic memory allocation functions, most of which are intuitively named and easy to explain to a programmer with a basic understanding of memory. malloc() simply allocates memory, while calloc() allocates some memory and clears it eagerly. There are also realloc() and free(), which are pretty self-explanatory.
The manpage for malloc() also mentions valloc(), which allocates (size) bytes aligned to the page border.
Unfortunately, my background isn't thorough enough in low-level intricacies; what are the implications of allocating and using page border-aligned memory, and when is this appropriate as opposed to regular malloc() or calloc()?
The manpage for valloc contains an important note:
The function valloc() appeared in 3.0BSD. It is documented as being obsolete in 4.3BSD, and as legacy in SUSv2. It does not appear in POSIX.1-2001.
valloc is obsolete and nonstandard - to answer your question, it would never be appropriate to use in new code.
While there are some reasons to want to allocate aligned memory - this question lists a few good ones - it is usually better to let the memory allocator figure out which bit of memory to give you. If you are certain that you need your freshly-allocated memory aligned to something, use aligned_alloc (C11) or posix_memalign (POSIX) instead.
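A small sketch of the C11 route, assuming an implementation that provides aligned_alloc. The helper names alloc_aligned and is_aligned are hypothetical. The size is rounded up because C11 requires it to be a multiple of the alignment.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: true if p is a multiple of `alignment`. */
int is_aligned(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Hypothetical helper: allocate at least `size` bytes aligned to `alignment`,
   which must be a power of two. Release the result with free(). */
void *alloc_aligned(size_t alignment, size_t size)
{
    if (alignment == 0 || (alignment & (alignment - 1)) != 0)
        return NULL;                       /* not a power of two */
    if (size == 0)
        size = 1;                          /* avoid a zero-sized request */
    size_t rounded = (size + alignment - 1) & ~(alignment - 1);
    if (rounded < size)
        return NULL;                       /* rounding overflowed */
    return aligned_alloc(alignment, rounded);
}
```

For example, alloc_aligned(64, 100) requests 128 bytes on a 64-byte boundary, a typical cache-line alignment.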
Allocations with page alignment usually are not done for speed - they're because you want to take advantage of some feature of your processor's MMU, which typically works with page granularity.
One example is if you want to use mprotect(2) to change the access rights on that memory. Suppose, for instance, that you want to store some data in a chunk of memory, and then make it read only, so that any buggy part of your program that tries to write there will trigger a segfault. Since mprotect(2) can only change permissions page by page (since this is what the underlying CPU hardware can enforce), the block where you store your data had better be page aligned, and its size had better be a multiple of the page size. Otherwise the area you set read-only might include other, unrelated data that still needs to be written.
Or, perhaps you are going to generate some executable code in memory and then want to execute it later. Memory you allocate by default probably isn't set to allow code execution, so you'll have to use mprotect to give it execute permission. Again, this has to be done with page granularity.
Another example is if you want to allocate memory now, but might want to mmap something on top of it later.
So in general, a need for page-aligned memory would relate to some fairly low-level application, often involving something system-specific. If you needed it, you'd know. (And as mentioned, you should allocate it not with valloc, but using posix_memalign, or perhaps an anonymous mmap.)
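The read-only example above might look roughly like this, assuming a POSIX system with anonymous mappings. make_readonly_copy is a hypothetical helper name. Anonymous mmap is used so that the memory is page aligned and page sized by construction.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* glibc: expose MAP_ANONYMOUS in strict -std= modes */
#endif
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: copy `len` bytes into freshly mapped pages and then
   drop write permission. On success, returns the mapping and stores its
   length in *mapped_len (needed later for munmap). Returns NULL on failure. */
const void *make_readonly_copy(const void *src, size_t len, size_t *mapped_len)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0 || len == 0)
        return NULL;

    size_t pg = (size_t)page;
    size_t total = (len + pg - 1) / pg * pg;   /* round up to whole pages */
    if (total < len)
        return NULL;                           /* rounding overflowed */

    void *mem = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED)
        return NULL;

    memcpy(mem, src, len);
    if (mprotect(mem, total, PROT_READ) != 0) {  /* writes now fault */
        munmap(mem, total);
        return NULL;
    }
    *mapped_len = total;
    return mem;
}
```

Any later write through the returned pointer triggers a segfault, and because the mapping covers whole pages, no unrelated data shares the protected region.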
First of all, valloc is obsolete, and memalign should be used instead.
Secondly, it's not part of the C (or C++) standard at all.
It's a special allocation which is aligned to _SC_PAGESIZE boundary.
When is it useful to use it? I guess never, unless you have some specific low-level requirement. If you needed it, you would know, since it's rarely useful (maybe just when trying some micro-optimizations or creating shared memory between processes).
The self-evident answer is that it is appropriate to use valloc when malloc is unsuitable (less efficient) for the application (virtual) memory usage pattern and valloc is better suited (more efficient). This will depend on the OS and libraries and architecture and application...
malloc traditionally allocated real memory from freed memory if available and by increasing the brk point if not, in which case it is cleared by the OS for security reasons.
calloc in a dumb implementation does a malloc and then (re)clears the memory, while a smart implementation would avoid reclearing newly allocated memory that is automatically cleared by the operating system.
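The "dumb" variant can be written in a few lines. This is only an illustration of the idea, not how any particular library implements calloc. The name naive_calloc is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical naive calloc: always clears the block, even if the operating
   system already handed over zeroed pages. */
void *naive_calloc(size_t count, size_t size)
{
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;                 /* count * size would overflow */
    size_t total = count * size;
    void *p = malloc(total);
    if (p != NULL)
        memset(p, 0, total);         /* the redundant work a smart calloc skips */
    return p;
}
```

A smarter implementation knows which blocks came straight from the operating system and skips the memset for those.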
valloc relates to virtual memory. In a virtual memory system using the file system, you can allocate a large amount of memory or filespace/swapspace, even more than physical memory, and it will be swapped in by pages, so alignment is a factor. In Unix, creating a file of a specified size and adding/deleting pages is done using inodes to define the file, but actual disk blocks are not touched until needed, at which point they are created cleared. So I would expect a valloc system to increase the size of the data segment swap without actually allocating physical or swap pages, or running a for loop to clear it all, as the file and paging system does that as needed. Thus valloc should be a heck of a lot faster than malloc. But as with calloc, how particular idiosyncratic *x/C flavours do it is up to them, and the valloc man page is totally unhelpful about these expectations.
Traditionally this was implemented with brk/sbrk. Of course in a virtual memory system, whether a paged or a segmented system, there is no real need for any of this brk/sbrk stuff and it is enough to simply write the last location in a file or address space to extend up to that point.
Re the allocation to page boundaries, that is not usually something the user wants or needs, but rather is usually something the system wants or needs.
A (probably more expensive) way to simulate valloc is to determine the page boundary and then call aligned_alloc or posix_memalign with this alignment spec.
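A sketch of that simulation, assuming a POSIX system. page_aligned_alloc is a hypothetical name, and the page size is queried at run time rather than hard-coded.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* expose posix_memalign and sysconf */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical valloc stand-in: returns `size` bytes starting on a page
   boundary, or NULL on failure. Release the block with free(). */
void *page_aligned_alloc(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        return NULL;

    void *p = NULL;
    if (posix_memalign(&p, (size_t)page, size) != 0)
        return NULL;
    return p;
}
```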
The fact that valloc is deprecated or has been removed or is not required in some OS' doesn't mean that it isn't still useful and required for best efficiency in others. If it has been deprecated or removed, one would hope that there are replacements that are as efficient (but I wouldn't bet on it, and might, indeed have, written my own malloc replacement).
Over the last 40 years the tradeoffs of real and (once invented) virtual memory have changed periodically, and mainstream OS has tended to go for frills rather than efficiency, with programmers who don't have (time or space) efficiency as a major imperative. In the embedded systems, efficiency is more critical, but even there efficiency is often not well supported by the standard OS and/or tools. But when in doubt, you can roll your own malloc replacement for your application that does what you need, rather than depend on what someone else woke up and decided to do/implement, or to undo/deprecate.
So the real answer is you don't necessarily want to use valloc or malloc or calloc or any of the replacements your current subversion of an OS provides.

What happens when different object files use different malloc implementations

I have a couple of questions.
Suppose a program is compiled using 2 object files. Each uses malloc and free in most of their functions. But these object files were generated at different times and happen to be using different malloc implementations. Let's say the implementations share variable names and function names. Will the program work fine or not? Why?
If a program has object file 1 and 2, code from object file 1 call malloc and allocates some memory then frees it. Now code from object file 2 calls malloc. Can it use the memory that was freed? How does it work underneath?
Trying to provide a useful answer, even though it's far from complete.
Part 1.
First, it's hard enough to link the program with two implementations of malloc sharing function names: duplicate definitions usually cause linker errors. I can see how we manage to do it using GNU binutils, and there probably are some equivalent tricks for other toolchains. For the rest of the answer, let's assume we managed to link two implementations successfully. (It's usually a good thing that you get linker errors instead of mixing two implementations, possibly even introducing malloc/free asymmetry which has almost no chance to work).
Let's also assume that memory allocated with one particular implementation is always freed using free from the same implementation. Otherwise, it's virtually guaranteed to fail.
Two implementations may work together, or they may interfere, depending on how they request more memory from the OS when their local heaps run out of space. MS Windows has a system interface for managing heaps, and two different mallocs are likely to be built on top of it; then nothing prevents them from working together. Implementations requesting memory with an sbrk-like call will work together if both are prepared for someone else to request an sbrk increase independently of malloc. I'd expect that malloc from glibc won't fail here, but I'm not really sure.
Part 2.
If the implementation used by object 1 is able to return memory to OS, memory can be reused by the implementation called by object 2. That is, memory reuse may happen but it's less likely than when a single implementation is used.
The possibility of returning memory to OS depends on malloc/free implementation, and may also depend on allocated chunk size and various system settings. For example, glibc uses anonymous mmap for large chunks of memory, and these chunks are unmapped when freed.

Is realloc() safe in embedded system?

While developing a piece of software for an embedded system I used the realloc() function many times. Now I've been told that I "should not use realloc() in embedded" without any explanation.
Is realloc() dangerous for embedded systems, and why?
Yes, all dynamic memory allocation is regarded as dangerous, and it is banned from most "high integrity" embedded systems, such as industrial/automotive/aerospace/med-tech etc etc. The answer to your question depends on what sort of embedded system you are doing.
The reason it's banned from high integrity embedded systems is not only the potential memory leaks, but also a lot of dangerous undefined/unspecified/impl.defined behavior associated with those functions.
EDIT: I also forgot to mention heap fragmentation, which is another danger. In addition, MISRA-C also mentions "data inconsistency, memory exhaustion, non-deterministic behaviour" as reasons why it shouldn't be used. The former two seem rather subjective, but non-deterministic behaviour is definitely something that isn't allowed in these kinds of systems.
MISRA-C:2004 Rule 20.4 "Dynamic heap memory allocation shall not be used."
IEC 61508 Functional safety, 61508-3 Annex B (normative) Table B1, >SIL1: "No dynamic objects", "No dynamic variables".
It depends on the particular embedded system. Dynamic memory management on a small embedded system is tricky to begin with, but realloc is no more complicated than a free and malloc (of course, that's not what it does). On some embedded systems you'd never dream of calling malloc in the first place. On other embedded systems, you almost pretend it's a desktop.
If your embedded system has a poor allocator or not much RAM, then realloc might cause fragmentation problems. That is why you avoid malloc too, because it causes the same problems.
The other reason is that some embedded systems must be high reliability, and malloc / realloc can return NULL. In these situations, all memory is allocated statically.
In many embedded systems, a custom memory manager can provide better semantics than are available with malloc/realloc/free. Some applications, for example, can get by with a simple mark-and-release allocator. Keep a pointer to the start of not-yet-allocated memory, allocate things by moving the pointer upward, and jettison them by moving the pointer below them. That won't work if it's necessary to jettison some things while keeping other things that were allocated after them, but in situations where that isn't necessary the mark-and-release allocator is cheaper than any other allocation method. In some cases where the mark-and-release allocator isn't quite good enough, it may be helpful to allocate some things from the start of the heap and other things from the end of the heap; one may free up the things allocated from one end without affecting those allocated from the other.
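A mark-and-release allocator of the kind described above fits in a few lines. This sketch assumes a C11 compiler and a single fixed arena. Arena, ARENA_SIZE and the function names are hypothetical.

```c
#include <stddef.h>

#define ARENA_SIZE 1024   /* arbitrary size for the sketch; a multiple of the alignment */

typedef struct {
    _Alignas(max_align_t) unsigned char storage[ARENA_SIZE];
    size_t top;           /* offset of the first not-yet-allocated byte */
} Arena;

/* Allocate by moving the pointer upward. Returns NULL when the arena is full. */
void *arena_alloc(Arena *a, size_t size)
{
    const size_t align = _Alignof(max_align_t);
    if (size > ARENA_SIZE)
        return NULL;                              /* also keeps the rounding from overflowing */
    size_t rounded = (size + align - 1) / align * align;
    if (rounded > ARENA_SIZE - a->top)
        return NULL;
    void *p = a->storage + a->top;
    a->top += rounded;
    return p;
}

/* Remember the current position. */
size_t arena_mark(const Arena *a)
{
    return a->top;
}

/* Jettison everything allocated after `mark` by moving the pointer back. */
void arena_release(Arena *a, size_t mark)
{
    if (mark <= a->top)
        a->top = mark;
}
```

Both allocation and release are constant time and cannot fragment the arena, which is what makes the scheme attractive where determinism matters.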
Another approach that can sometimes be useful in non-multitasking or cooperative-multitasking systems is to use memory handles rather than direct pointers. In a typical handle-based system, there's a table of all allocated objects, built at the top of memory working downward, and objects themselves are allocated from the bottom up. Each allocated object in memory holds either a reference to the table slot that references it (if live) or else an indication of its size (if dead). The table entry for each object will hold the object's size as well as a pointer to the object in memory. Objects may be allocated by simply finding a free table slot (easy, since table slots are all fixed size), storing the address of the object's table slot at the start of free memory, storing the object itself just beyond that, and updating the start of free memory to point just past the object. Objects may be freed by replacing the back-reference with a length indication, and freeing the object in the table. If an allocation would fail, relocate all live objects starting at the top of memory, overwriting any dead objects, and updating the object table to point to their new addresses.
The performance of this approach is non-deterministic, but fragmentation is not a problem. Further, it may be possible in some cooperative multitasking systems to perform garbage collection "in the background"; provided that the garbage collector can complete a pass in the time it takes to chug through the slack space, long waits can be avoided. Further, some fairly simple "generational" logic may be used to improve average-case performance at the expense of worst-case performance.
realloc can fail, just like malloc can. This is one reason why you probably should not use either in an embedded system.
realloc is worse than malloc in that you will need to have the old and new pointers valid during the realloc. In other words, you will need 2X the memory space of the original malloc, plus any additional amount (assuming realloc is increasing the buffer size).
Using realloc is going to be very dangerous, because it may return a new pointer to your memory location. This means:
All references to the old pointer must be corrected after realloc.
For a multi-threaded system, the realloc must be atomic. If you are disabling interrupts to achieve this, the realloc time might be long enough to cause a hardware reset by the watchdog.
Update: I just wanted to make it clear. I'm not saying that realloc is worse than implementing realloc using a malloc/free. That would be just as bad. If you can do a single malloc and free, without resizing, it's slightly better, yet still dangerous.
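Where realloc is used anyway, the usual defensive pattern is to keep the old pointer until the call has succeeded. A minimal sketch, with resize_buffer as a hypothetical helper name:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: resize *buf to new_size bytes.
   Returns 1 on success, with *buf updated (the block may have moved).
   Returns 0 on failure, with *buf untouched and still valid. */
int resize_buffer(void **buf, size_t new_size)
{
    if (new_size == 0)
        return 0;                 /* realloc(p, 0) is implementation-defined; refuse it */
    void *tmp = realloc(*buf, new_size);
    if (tmp == NULL)
        return 0;                 /* old block is still allocated */
    *buf = tmp;
    return 1;
}
```

This only protects the one pointer passed in. Any other copies of the old address held elsewhere in the program still have to be corrected by the caller.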
The issues with realloc() in embedded systems are no different than in any other system, but the consequences may be more severe in systems where memory is more constrained, and the consequences of failure less acceptable.
One problem not mentioned so far is that realloc() (and any other dynamic memory operation for that matter) is non-deterministic; that is, its execution time is variable and unpredictable. Many embedded systems are also real-time systems, and in such systems, non-deterministic behaviour is unacceptable.
Another issue is that of thread-safety. Check your library's documentation to see if your library is thread-safe for dynamic memory allocation. Generally if it is, you will need to implement mutex stubs to integrate it with your particular thread library or RTOS.
Not all embedded systems are alike; if your embedded system is not real-time (or the process/task/thread in question is not real-time, and is independent of the real-time elements), and you have large amounts of memory unused, or virtual memory capabilities, then the use of realloc() may be acceptable, if perhaps ill-advised in most cases.
Rather than accept "conventional wisdom" and bar dynamic memory regardless, you should understand your system requirements, and the behaviour of dynamic memory functions and make an appropriate decision. That said, if you are building code for reusability and portability to as wide a range of platforms and applications as possible, then reallocation is probably a really bad idea. Don't hide it in a library for example.
Note too that the same problem exists with C++ STL container classes that dynamically reallocate and copy data when the container capacity is increased.
Well, it's better to avoid realloc where possible, since the operation is costly, especially inside a loop. For example, if an allocated block needs to be extended and there is no gap between the current block and the next allocated block, the operation is almost equivalent to malloc + memcpy + free.
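When a buffer has to grow inside a loop, one common way to limit the cost is to grow the capacity geometrically so that realloc runs only occasionally. A sketch, with IntVec and intvec_push as hypothetical names and an arbitrary initial capacity of 8:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int *items;
    size_t len;             /* elements in use */
    size_t cap;             /* elements allocated */
    size_t realloc_calls;   /* bookkeeping for the illustration only */
} IntVec;

/* Append one value, doubling the capacity when full.
   Returns 1 on success, 0 if memory could not be obtained. */
int intvec_push(IntVec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        if (new_cap < v->cap || new_cap > SIZE_MAX / sizeof *v->items)
            return 0;                               /* size would overflow */
        int *tmp = realloc(v->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return 0;                               /* old array still valid */
        v->items = tmp;
        v->cap = new_cap;
        v->realloc_calls++;
    }
    v->items[v->len++] = value;
    return 1;
}
```

Pushing 1000 values this way calls realloc 8 times instead of 1000, at the price of some unused capacity.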

Why isn't there a "memsize" in C which returns the size of a memory block allocated in the heap using malloc?

ok. It can be called anything else as in _msize in Visual Studio.
But why is there no standard function that returns the size of a memory block allocated using malloc? Since we cannot tell how much memory the pointer returned by malloc refers to, we could use this "memsize" call to return that information should we need it. "memsize" would be implementation-specific, as are malloc/free.
Just asking as I had to write a wrapper sometime back to store some additional bytes for the size.
Because the C library, including malloc, was designed for minimum overhead. A function like the one you want would require the implementation to record the exact size of the allocation, while implementations may now choose to "round" the size up as they please, to prevent actually reallocating in realloc.
Storing the size requires an extra size_t per allocation, which may be heavy for embedded systems. (And for the PDP-11s and 286s that were still abundant when C89 was written.)
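The wrapper mentioned in the question, and the per-allocation cost described here, can be sketched as follows. SizeHeader, sized_malloc, sized_size and sized_free are hypothetical names. The union pads the header so the payload keeps the alignment malloc provides.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header stored immediately before each payload. The max_align_t member
   pads the header to a multiple of the strictest fundamental alignment. */
typedef union {
    size_t size;
    max_align_t align;
} SizeHeader;

/* Allocate `size` bytes and remember the requested size. */
void *sized_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(SizeHeader))
        return NULL;                        /* header + payload would overflow */
    SizeHeader *h = malloc(sizeof *h + size);
    if (h == NULL)
        return NULL;
    h->size = size;
    return h + 1;                           /* payload starts after the header */
}

/* Return the size originally requested for a sized_malloc block. */
size_t sized_size(const void *p)
{
    return ((const SizeHeader *)p - 1)->size;
}

/* Release a block obtained from sized_malloc. */
void sized_free(void *p)
{
    if (p != NULL)
        free((SizeHeader *)p - 1);
}
```

On a typical 64-bit system the header costs 16 bytes per allocation, which is the kind of overhead the answer above refers to.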
To turn this around, why should there be? There's plenty of stuff in the Standards already, particularly the C++ standard. What are your use cases?
You ask for an adequately-sized chunk of memory, and you get it (or a null pointer or exception). There may or may not be additional bytes allocated, and some of these may be reserved. This is conceptually simple: you ask for what you want, and you get something you can use.
Why complicate it?
I don't think there is any definite answer. The developers of the standard probably considered it, and weighed the pros and cons. Anything that goes into a standard must be implemented by every implementation, so adding things to it places a significant burden on developers. I guess they just didn't find that feature useful enough to warrant this.
In C++, the wrapper that you talk about is provided by the standard. If you allocate a block of memory with std::vector, you can use the member function vector::size() to determine the size of the array and use vector::capacity() to determine the size of the allocation (which might be different).
C, on the other hand, is a low-level language which leaves such concerns to be managed by the developer, since tracking it dynamically (as you suggest) is not strictly necessary and would be redundant in many cases.
