Is there a performance cost to using large mmap calls that go beyond expected memory usage?

Is there a performance cost to using large mmap calls that go beyond expected memory usage? - c

Edit: On systems that use on-demand paging
For initializing data structures that are both persistent for the duration of the program and require a dynamic amount of memory is there any reason not to mmap an upper bound from the start?
An example is an array that will persistent for the entire program's life but whose final size is unknown. The approach I am most familiar with is something along the lines of:
type * array = malloc(size);
and when the array has reached capacity doubling it with:
array = realloc(array, 2 * size);
size *= 2;
I understand this is probably the best way to do this if the array might freed mid execution so that its VM can be reused, but if it is persistent is there any reason not to just initialize the array as follows:
array = mmap(0,
huge_size,
PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE,
-1, 0)
so that the elements never needs to be copied.
Edit: Specifically for an OS that uses on-demand paging.

Don't try to be smarter than the standard library, unless you 100% know what you are doing.
malloc() already does this for you. If you request a large amount of memory, malloc() will mmap() you a dedicated memory area. If what you are concerned about is the performance hit coming from doing size *= 2; realloc(old, size), then just malloc(huge_size) at the beginning, and then keep track of the actual used size in your program. There really is no point in doing an mmap() unless you explicitly need it for some specific reason: it isn't faster nor better in any particular way, and if malloc() thinks it's needed, it will do it for you.

It's fine to allocate upper bounds as long as:
You're building a 64bit program: 32bit ones have restricted virtual space, even on 64bit CPUs
Your upper bounds don't approach 2^47, as a mathematically derived one might
You're fine with crashing as your out-of-memory failure mode
You'll only run on systems where overcommit is enabled
As a side note, an end user application doing this may want to borrow a page from GHC's book and allocate 1TB up front even if 10GB would do. This unrealistically large amount will ensure that users don't confuse virtual memory usage with physical memory usage.

If you know for a fact that wasting a chunk of memory (most likely an entire page which is likely 4096 bytes) will not cause your program or the other programs running on your system to run out of memory, AND you know for a fact that your program will only ever be compiled and run on UNIX machines, then this approach is not incorrect, but it is not good programming practice for the following reasons:
The <stdlib.h> file you #include to use malloc() and free() in your C programs is specified by the C standard, but it is specifically implemented for your architecture by the writers of the operating system. This means that your specific system was kept in-mind when these functions were written, so finding a sneaky way to improve efficiency for memory allocation is unlikely unless you know the inner workings of memory management in your OS better than those who wrote it.
Furthermore, the <sys/mman.h> file you include to mmap() stuff is not part of the C standard, and will only compile on UNIX machines, which reduces the portability of your code.
There's also a really good chance (assuming a UNIX environment) that malloc() and realloc() already use mmap() behind-the-scenes to allocate memory for your process anyway, so it's almost certainly better to just use them. (read that as "realloc doesn't necessarily actively allocate more space for me, because there's a good chance there's already a chunk of memory that my process has control of that can satisfy my new memory request without calling mmap() again")
Hope that helps!

Related

When is it more appropriate to use valloc() as opposed to malloc()?

C (and C++) include a family of dynamic memory allocation functions, most of which are intuitively named and easy to explain to a programmer with a basic understanding of memory. malloc() simply allocates memory, while calloc() allocates some memory and clears it eagerly. There are also realloc() and free(), which are pretty self-explanatory.
The manpage for malloc() also mentions valloc(), which allocates (size) bytes aligned to the page border.
Unfortunately, my background isn't thorough enough in low-level intricacies; what are the implications of allocating and using page border-aligned memory, and when is this appropriate as opposed to regular malloc() or calloc()?

The manpage for valloc contains an important note:
The function valloc() appeared in 3.0BSD. It is documented as being obsolete in 4.3BSD, and as legacy in SUSv2. It does not appear in POSIX.1-2001.
valloc is obsolete and nonstandard - to answer your question, it would never be appropriate to use in new code.
While there are some reasons to want to allocate aligned memory - this question lists a few good ones - it is usually better to let the memory allocator figure out which bit of memory to give you. If you are certain that you need your freshly-allocated memory aligned to something, use aligned_alloc (C11) or posix_memalign (POSIX) instead.

Allocations with page alignment usually are not done for speed - they're because you want to take advantage of some feature of your processor's MMU, which typically works with page granularity.
One example is if you want to use mprotect(2) to change the access rights on that memory. Suppose, for instance, that you want to store some data in a chunk of memory, and then make it read only, so that any buggy part of your program that tries to write there will trigger a segfault. Since mprotect(2) can only change permissions page by page (since this is what the underlying CPU hardware can enforce), the block where you store your data had better be page aligned, and its size had better be a multiple of the page size. Otherwise the area you set read-only might include other, unrelated data that still needs to be written.
Or, perhaps you are going to generate some executable code in memory and then want to execute it later. Memory you allocate by default probably isn't set to allow code execution, so you'll have to use mprotect to give it execute permission. Again, this has to be done with page granularity.
Another example is if you want to allocate memory now, but might want to mmap something on top of it later.
So in general, a need for page-aligned memory would relate to some fairly low-level application, often involving something system-specific. If you needed it, you'd know. (And as mentioned, you should allocate it not with valloc, but using posix_memalign, or perhaps an anonymous mmap.)

First of all valloc is obsolete, and memalignshould be used instead.
Second thing it's not part of the C (C++) standard at all.
It's a special allocation which is aligned to _SC_PAGESIZE boundary.
When is it useful to use it? I guess never, unless you have some specific low level requirement. If you would need it, you would know to need it, since it's rarely useful (maybe just when trying some micro-optimizations or creating shared memory between processes).

The self-evident answer is that it is appropriate to use valloc when malloc is unsuitable (less efficient) for the application (virtual) memory usage pattern and valloc is better suited (more efficient). This will depend on the OS and libraries and architecture and application...
malloc traditionally allocated real memory from freed memory if available and by increasing the brk point if not, in which case it is cleared by the OS for security reasons.
calloc in a dumb implementation does a malloc and then (re)clears the memory, while a smart implementation would avoid reclearing newly allocated memory that is automatically cleared by the operating system.
valloc relates to virtual memory. In a virtual memory system using the file system, you can allocate a large amount of memory or filespace/swapspace, even more than physical memory, and it will be swapped in by pages so alignment is a factor. In Unix creation of file of a specified file and adding/deleting pages is done using inodes to define the file but doesn't deal with actual disk blocks till needed, in which case it creates them cleared. So I would expect a valloc system to increase the size of the data segment swap without actually allocating physical or swap pages, or running a for loop to clear it all - as the file and paging system does that as needed. Thus valloc should be a heck of a lot faster than malloc. But as with calloc, how particular idiotsyncratic *x/C flavours do it is up to them, and the valloc man page is totally unhelpful about these expectations.
Traditionally this was implemented with brk/sbrk. Of course in a virtual memory system, whether a paged or a segmented system, there is no real need for any of this brk/sbrk stuff and it is enough to simply write the last location in a file or address space to extend up to that point.
Re the allocation to page boundaries, that is not usually something the user wants or needs, but rather is usually something the system wants or needs.
A (probably more expensive) way to simulate valloc is to determine the page boundary and then call aligned_alloc or posix_memalign with this alignment spec.
The fact that valloc is deprecated or has been removed or is not required in some OS' doesn't mean that it isn't still useful and required for best efficiency in others. If it has been deprecated or removed, one would hope that there are replacements that are as efficient (but I wouldn't bet on it, and might, indeed have, written my own malloc replacement).
Over the last 40 years the tradeoffs of real and (once invented) virtual memory have changed periodically, and mainstream OS has tended to go for frills rather than efficiency, with programmers who don't have (time or space) efficiency as a major imperative. In the embedded systems, efficiency is more critical, but even there efficiency is often not well supported by the standard OS and/or tools. But when in doubt, you can roll your own malloc replacement for your application that does what you need, rather than depend on what someone else woke up and decided to do/implement, or to undo/deprecate.
So the real answer is you don't necessarily want to use valloc or malloc or calloc or any of the replacements your current subversion of an OS provides.

Under what circumstances can malloc return NULL?

It has never happened to me, and I've programming for years now.
Can someone give me an example of a non-trivial program in which malloc will actually not work?
I'm not talking about memory exhaustion: I'm looking for the simple case when you are allocating just one memory block in a bound size given by the user, lets say an integer, causes malloc to fail.

You need to do some work in embedded systems, you'll frequently get NULL returned there :-)
It's much harder to run out of memory in modern massive-address-space-and-backing-store systems but still quite possible in applcations where you process large amounts of data, such as GIS or in-memory databases, or in places where your buggy code results in a memory leak.
But it really doesn't matter whether you've never experienced it before - the standard says it can happen so you should cater for it. I haven't been hit by a car in the last few decades either but that doesn't mean I wander across roads without looking first.
And re your edit:
I'm not talking about memory exhaustion, ...
the very definition of memory exhaustion is malloc not giving you the desired space. It's irrelevant whether that's caused by allocating all available memory, or heap fragmentation meaning you cannot get a contiguous block even though the aggregate of all free blocks in the memory arena is higher, or artificially limiting your address space usage such using the standards-compliant function:
void *malloc (size_t sz) { return NULL; }
The C standard doesn't distinguish between modes of failure, only that it succeeds or fails.

Yes.
Just try to malloc more memory than your system can provide (either by exhausting your address space, or virtual memory - whichever is smaller).
malloc(SIZE_MAX)
will probably do it. If not, repeat a few times until you run out.

Any program at all written in c that needs to dynamically allocate more memory than the OS currently allows.
For fun, if you are using ubuntu type in
ulimit -v 5000
Any program you run will most likely crash (due to a malloc failure) as you've limited the amount of available memory to any one process to a pithy amount.

Unless your memory is already completely reserved (or heavily fragmented), the only way to have malloc() return a NULL-pointer is to request space of size zero:
char *foo = malloc(0);
Citing from the C99 standard, §7.20.3, subsection 1:
If the size of the space requested is zero, the behavior is implementationdeﬁned: either a null pointer is returned, or the behavior is as if the size were some
nonzero value, except that the returned pointer shall not be used to access an object.
In other words, malloc(0) may return a NULL-pointer or a valid pointer to zero allocated bytes.

Pick any platform, though embedded is probably easier. malloc (or new) a ton of RAM (or leak RAM over time or even fragment it by using naive algorithms). Boom. malloc does return NULL for me on occasion when "bad" things are happening.
In response to your edit. Yes again. Memory fragmentation over time can make it so that even a single allocation of an int can fail. Also keep in mind that malloc doesn't just allocate 4 bytes for an int, but can grab as much space as it wants. It has its own book-keeping stuff and quite often will grab 32-64 bytes minimum.

On a more-or-less standard system, using a standard one-parameter malloc, there are three possible failure modes (that I can think of):
The size of allocation requested is not allowed. Eg, some systems may not allow an allocation > 16M, even if more storage is available.
A contiguous free area of the size requested, with default boundary, cannot be located in the heap. There may still be plenty of heap, but just not enough in one piece.
The total allocated heap has exceeded some "artificial" limit. Eg, the user may be prohibited from allocation more than 100M, even if there's 200M free and available to the "system" in a single combined heap.
(Of course, you can get combinations of 2 and 3, since some systems allocate non-contiguous blocks of address space to the heap as it grows, placing the "heap size limit" on the total of the blocks.)
Note that some environments support additional malloc parameters such as alignment and pool ID which can add their own twists.

Just check the manual page of malloc.
On success, a pointer to the memory block allocated by the function.
The type of this pointer is always void*, which can be cast to the desired type of data pointer in order to be dereferenceable.
If the function failed to allocate the requested block of memory, a null pointer is returned.

Yes. Malloc will return NULL when the kernel/system lib are certain that no memory can be allocated.
The reason you typically don't see this on modern machines is that Malloc doesn't really allocate memory, but rather it requests some “virtual address space” be reserved for your program so you might write in it. Kernels such as modern Linux actually over commit, that is they let you allocate more memory than your system can actually provide (swap + RAM) as long as it all fits in the address space of the system (typically 48bits on 64bit platforms, IIRC). Thus on these systems you will probably trigger an OOM killer before you will trigger a return of a NULL pointer. A good example is a 512MB RAM in a 32bit machine: it's trivial to write a C program that will be eaten by the OOM killer because of it trying to malloc all available RAM + swap.
(Overcomitting can be disabled at compile time on Linux, so it depends on the build options whether or not a given Linux kernel will overcommit. However, stock desktop distro kernels do it.)

Since you asked for an example, here's a program that will (eventually) see malloc return NULL:
perror();void*malloc();main(){for(;;)if(!malloc(999)){perror(0);return 0;}}
What? You don't like deliberately obfuscated code? ;) (If it runs for a few minutes and doesn't crash on your machine, kill it, change 999 to a bigger number and try again.)
EDIT: If it doesn't work no matter how big the number is, then what's happening is that your system is saying "Here's some memory!" but so long as you don't try to use it, it doesn't get allocated. In which case:
perror();char*p;void*malloc();main(){for(;;){p=malloc(999);if(p)*p=0;else{perror(0);return 0;}}
Should do the trick. If we can use GCC extentions, I think we can get it even smaller by changing char*p;void*malloc(); to void*p,*malloc(); but if you really wanted to golf you'd be on the Code Golf SE.

when the malloc param is negative or 0 or you have no memory left on heap.
I had to correct somebody's code which looked like this.
const int8_t bufferSize = 128;
void *buffer = malloc(bufferSize);
Here buffer is NULL because bufferSize is actually -128

How to find how much memory is actually used up by a malloc call?

If I call:
char *myChar = (char *)malloc(sizeof(char));
I am likely to be using more than 1 byte of memory, because malloc is likely to be using some memory on its own to keep track of free blocks in the heap, and it may effectively cost me some memory by always aligning allocations along certain boundaries.
My question is: Is there a way to find out how much memory is really used up by a particular malloc call, including the effective cost of alignment, and the overhead used by malloc/free?
Just to be clear, I am not asking to find out how much memory a pointer points to after a call to malloc. Rather, I am debugging a program that uses a great deal of memory, and I want to be aware of which parts of the code are allocating how much memory. I'd like to be able to have internal memory accounting that very closely matches the numbers reported by top. Ideally, I'd like to be able to do this programmatically on a per-malloc-call basis, as opposed to getting a summary at a checkpoint.

There isn't a portable solution to this, however there may be operating-system specific solutions for the environments you're interested in.
For example, with glibc on Linux, you can use the mallinfo() function from <malloc.h> which returns a struct mallinfo. The uordblks and hblkhd members of this structure contains the dynamically allocated address space used by the program including book-keeping overhead - if you take the difference of this before and after each malloc() call, you will know the amount of space used by that call. (The overhead is not necessarily constant for every call to malloc()).
Using your example:
char *myChar;
size_t s = sizeof(char);
struct mallinfo before, after;
int mused;
before = mallinfo();
myChar = malloc(s);
after = mallinfo();
mused = (after.uordblks - before.uordblks) + (after.hblkhd - before.hblkhd);
printf("Requested size %zu, used space %d, overhead %zu\n", s, mused, mused - s);
Really though, the overhead is likely to be pretty minor unless you are making a very very high number of very small allocations, which is a bad idea anyway.

It really depends on the implementation. You should really use some memory debugger. On Linux Valgrind's Massif tool can be useful. There are memory debugging libraries like dmalloc, ...
That said, typical overhead:
1 int for storing size + flags of this block.
possibly 1 int for storing size of previous/next block, to assist in coallescing blocks.
2 pointers, but these may only be used in free()'d blocks, being reused for application storage in allocated blocks.
Alignment to an approppriate type, e.g: double.
-1 int (yes, that's a minus) of the next/previous chunk's field containing our size if we are an allocated block, since we cannot be coallesced until we're freed.
So, a minimum size can be 16 to 24 bytes. and minimum overhead can be 4 bytes.
But you could also satisfy every allocation via mapping memory pages (typically 4Kb), which would mean overhead for smaller allocations would be huge. I think OpenBSD does this.

There is nothing defined in the C library to query the total amount of physical memory used by a malloc() call. The amount of memory allocated is controlled by whatever memory manager is hooked up behind the scenes that malloc() calls into. That memory manager can allocate as much extra memory as it deemes necessary for its internal tracking purposes, on top of whatever extra memory the OS itself requires. When you call free(), it accesses the memory manager, which knows how to access that extra memory so it all gets released properly, but there is no way for you to know how much memory that involves. If you need that much fine detail, then you need to write your own memory manager.

If you do use valgrind/Massif, there's an option to show either the malloc value or the top value, which differ a LOT in my experience. Here's an excerpt from the Valgrind manual http://valgrind.org/docs/manual/ms-manual.html :
...However, if you wish to measure all the memory used by your program,
you can use the --pages-as-heap=yes. When this option is enabled,
Massif's normal heap block profiling is replaced by lower-level page
profiling. Every page allocated via mmap and similar system calls is
treated as a distinct block. This means that code, data and BSS
segments are all measured, as they are just memory pages. Even the
stack is measured...

C - Design your own free( ) function

Today, I appeared for an interview and the interviewer asked me this,
Tell me the steps how will you design your own free( ) function for
deallocate the allocated memory.
How can it be more efficient than C's default free() function ? What can you conclude ?
I was confused, couldn't think of the way to design.
What do you think guys ?
EDIT : Since we need to know about how malloc() works, can you tell me the steps to write our own malloc() function

That's actually a pretty vague question, and that's probably why you got confused. Does he mean, given an existing malloc implementation, how would you go about trying to develop a more efficient way to free the underlying memory? Or was he expecting you to start discussing different kinds of malloc implementations and their benefits and problems? Did he expect you to know how virtual memory functions on the x86 architecture?
Also, by more efficient, does he mean more space efficient or more time efficient? Does free() have to be deterministic? Does it have to return as much memory to the OS as possible because it's in a low-memory, multi-tasking environment? What's our criteria here?
It's hard to say where to start with a vague question like that, other than to start asking your own questions to get clarification. After all, in order to design your own free function, you first have to know how malloc is implemented. So chances are, the question was really about whether or not you knew anything about how malloc can be implemented.
If you're not familiar with the internals of memory management, the easiest way to get started with understanding how malloc is implemented is to first write your own.
Check out this IBM DeveloperWorks article called "Inside Memory Management" for starters.
But before you can write your own malloc/free, you first need memory to allocate/free. Unfortunately, in a protected mode OS, you can't directly address the memory on the machine. So how do you get it?
You ask the OS for it. With the virtual memory features of the x86, any piece of RAM or swap memory can be mapped to a memory address by the OS. What your program sees as memory could be physically fragmented throughout the entire system, but thanks to the kernel's virtual memory manager, it all looks the same.
The kernel usually provides system calls that allow you to map in additional memory for your process. On older UNIX OS's this was usually brk/sbrk to grow heap memory onto the edge of your process or shrink it off, but a lot of systems also provide mmap/munmap to simply map a large block of heap memory in. It's only once you have access to a large, contiguous looking block of memory that you need malloc/free to manage it.
Once your process has some heap memory available to it, it's all about splitting it into chunks, with each chunk containing its own meta information about its size and position and whether or not it's allocated, and then managing those chunks. A simple list of structs, each containing some fields for meta information and a large array of bytes, could work, in which case malloc has to run through the list until if finds a large enough unallocated chunk (or chunks it can combine), and then map in more memory if it can't find a big enough chunk. Once you find a chunk, you just return a pointer to the data. free() can then use that pointer to reverse back a few bytes to the member fields that exist in the structure, which it can then modify (i.e. marking chunk.allocated = false;). If there's enough unallocated chunks at the end of your list, you can even remove them from the list and unmap or shrink that memory off your process's heap.
That's a real simple method of implementing malloc though. As you can imagine, there's a lot of possible ways of splitting your memory into chunks and then managing those chunks. There's as many ways as there are data structures and algorithms. They're all designed for different purposes too, like limiting fragmentation due to small, allocated chunks mixed with small, unallocated chunks, or ensuring that malloc and free run fast (or sometimes even more slowly, but predictably slowly). There's dlmalloc, ptmalloc, jemalloc, Hoard's malloc, and many more out there, and many of them are quite small and succinct, so don't be afraid to read them. If I remember correctly, "The C Programming Language" by Kernighan and Ritchie even uses a simple malloc implementation as one of their examples.

You can't blindly design free() without knowing how malloc() works under the hood because your implementation of free() would need to know how to manipulate the bookkeeping data and that's impossible without knowing how malloc() is implemented.
So an unswerable question could be how you would design malloc() and free() instead which is not a trivial question but you could answer it partially for example by proposing some very simple implementation of a memory pool that would not be equivalent to malloc() of course but would indicate your presence of knowledge.

One common approach when you only have access to user space (generally known as memory pool) is to get a large chunk of memory from the OS on application start-up. Your malloc needs to check which areas of the right size of that pool are still free (through some data structure) and hand out pointers to that memory. Your free needs to mark the memory as free again in the data structure and possibly needs to check for fragmentation of the pool.
The benefits are that you can do allocation in nearly constant time, the drawback is that your application consumes more memory than actually is needed.

Tell me the steps how will you design your own free( ) function for deallocate the allocated memory.
#include <stdlib.h>
#undef free
#define free(X) my_free(X)
inline void my_free(void *ptr) { }
How can it be more efficient than C's default free() function ?
It is extremely fast, requiring zero machine cycles. It also makes use-after-free bugs go away. It's a very useful free function for use in programs which are instantiated as short-lived batch processes; it can usefully be deployed in some production situations.
What can you conclude ?
I really want this job, but in another company.

Memory usage patterns could be a factor. A default implementation of free can't assume anything about how often you allocate/deallocate and what sizes you allocate when you do.
For example, if you frequently allocate and deallocate objects that are of similar size, you could gain speed, memory efficiency, and reduced fragmentation by using a memory pool.
EDIT: as sharptooth noted, only makes sense to design free and malloc together. So the first thing would be to figure out how malloc is implemented.

malloc and free only have a meaning if your app is to work on top of an OS. If you would like to write your own memory management functions you would have to know how to request the memory from that specific OS or you could reserve the heap memory right away using existing malloc and then use your own functions to distribute/redistribute the allocated memory through out your app

There is an architecture that malloc and free are supposed to adhere to -- essentially a class architecture permitting different strategies to coexist. Then the version of free that is executed corresponds to the version of malloc used.
However, I'm not sure how often this architecture is observed.

The knowledge of working of malloc() is necessary to implement free(). You can find a implementation of malloc() and free() using the sbrk() system call in K&R The C Programming Language Chapter 8, Section 8.7 "Example--A Storage Allocator" pp.185-189.

does free always (portably) frees & reserve memory for the process or returns to the OS

I have read that free() "generally" does not return memory to the OS. Can we portably make use of this feature of free(). For example,is this portable?
/* Assume I know i would need memory equivalent to 10000 integers at max
during the lifetime of the process */
unsigned int* p = malloc(sizeof(unsigned int) * 10000);
if ( p == NULL)
return 1;
free(p);
/* Different points in the program */
unsigned int* q = malloc(sizeof(unsigned int) * 5);
/* No need to check for the return value of malloc */
I am writing a demo where I would know in advance how many call contexts to support.
Is it feasible to allocate "n" number of "call contexts" structures in advance and then free them immediately. Would that guarantee that my future malloc calls would not fail?
Does this give me any advantage with regards to efficiency? I am thinking one less "if" check plus would memory management work better if a large chunk was initially acquired and is now available with free. Would this cause less fragmentation?

You would be better off keeping the initial block of memory allocated then using a pool to make it available to clients in your application. It isn't good form to rely on esoteric behaviors to maintain the stability of your code. If anything changes, you could be dealing with bad pointers and having program crashes.

You are asking for a portable and low level way to control what happens on the OS side of the memory interface.
On any OS (because c is one of the most widely ported languages out there).
Think about that and keep in mind that OSs differ in their construction and goals and have widely varying sets of needs and properties.
There is a reason the usual c APIs only define how things should look and behave from the c side of the interface, and not how things should be on the OS side.

No, you can't reliably do such a thing. It is portable only in the sense that it's valid C and will probably compile anywhere you try it, but it is still incorrect to rely on this supposed (and undocumented) behaviour.
You will also not get any appreciably better performance. A simple check for NULL returns from malloc() is not what's making your program slow. If you think all your calls to malloc() and free() are slowing your program down, write your own allocation system with the behaviour you want.

You cannot portably rely on any such behavior of malloc(). You can only rely on malloc() giving you a pointer to a block memory of the given size which you can use until you call free().

Hmm, no. What malloc(3) does internally is call the brk(2) to extend the data segment if it's too small for given allocation request. It does touch some of the pages for its internal bookkeeping, but generally not all allocated pages might be backed by physical memory at that point.
This is because many OS-es do memory overcommit - promising the application whatever it requested regardless of available physical memory in hopes that not all memory will be used by the app, that other apps release memory/terminate, and falling back on swapping as last resort. Say on Linux malloc(3) pretty much never fails.
When memory actually gets referenced the kernel will have to find available physical pages to back it up, create/update page tables and TLBs, etc. - normal handling for a page fault. So again, no, you will not get any speedup later unless you go and touch every page in that allocated chunk.
Disclamer: the above is might not be accurate for Windows (so, no again - nothing close to portable.)

No, there's no guarantee free() will not release the memory back, and there's no guarantee your second malloc will succeed.
Even platforms who "generally" don't return memory to the OS, does so at times if it can. You could end up with your first malloc succeeding, and your next malloc not succeding since in the mean time some other part of the system used up your memory.

Not at all. It's not portable at all. In addition, there's no guarantee that another process won't have used the memory in question- C runs on many devices like embedded where virtual memory addressing doesn't exist. Nor will it reduce fragmentation- you'd have to know the page size of the machine and allocate an exact number of pages- and then, when you freed them, they'd be just unfragmented again.
If you really want to guarantee malloced memory, malloc a large block and manually place objects into it. That's the only way. As for efficiency, you must be running on an incredibly starved device in order to save a few ifs.

malloc(3) also does mmap(2), consequently, free(3) does munmap(2), consequently, second malloc() can in theory fail.

The C standard can be read as requiring this, but from a practical viewpoint there are implementations known that don't follow that, so whether it's required or not you can't really depend on it.
So, if you want this to be guaranteed, you'll pretty much need to write your own allocator. A typical strategy is to allocate a big block from malloc (or whatever) and sub-allocate pieces for the rest of the program to use out of that large block (or perhaps a number of large blocks).

For better control you should create your own memory allocator. An example of memory allocator is this one. This is the only way you will have predictable results. Everything else that relies on obscure/undocumented features and works can be attributed to luck.

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight