How to implement deterministic malloc

How to implement deterministic malloc - c

Say I have two instances of an application, with the same inputs and same execution sequence. Therefore, one instance is a redundant one and is used for comparing data in memory with the other instance, as a kind of error detection mechanism.
Now, I want all memory allocations and deallocations to happen in exactly the same manner in the two processes. What is the easiest way to achieve that? Write my own malloc and free? And what about memories allocated with other functions such as mmap?

I'm wondering what you are trying to achieve. If your process is deterministic, then the pattern of allocation / deallocation should be the same.
The only possible difference could be the address returned by malloc. But you should probably not depend on them (the easiest way being not using pointers as key map or other data structure). And even then, there should only be difference if the allocation is not done through sbrk (the glibc use anonymous mmap for large allocations), or if you are using mmap (as by default the address is selected by the kernel).
If you really want to have exactly the same address, one option is to have a large static buffer and to write a custom allocator that does use memory from this buffer. This has the disadvantage of forcing you to know beforehand the maximum amount of memory you'll ever need. In a non-PIE executable (gcc -fno-pie -no-pie), a static buffer will have the same address every time. For a PIE executable you can disable the kernel's address space layout randomization for loading programs. In a shared library, disabling ASLR and running the same program twice should lead to the same choices by the dynamic linker for where to map libraries.
If you don't know before hand the maximum size of the memory you want to use, or if you don't want to recompile each time this size increase, you can also use mmap to map a large anonymous buffer at a fixed address. Simply pass the size of the buffer and the address to use as parameter to your process and use the returned memory to implement your own malloc on top of it.
static void* malloc_buffer = NULL;
static size_t malloc_buffer_len = 0;
void* malloc(size_t size) {
// Use malloc_buffer & malloc_buffer_len to implement your
// own allocator. If you don't read uninitialized memory,
// it can be deterministic.
return memory;
}
int main(int argc, char** argv) {
size_t buf_size = 0;
uintptr_t buf_addr = 0;
for (int i = 0; i < argv; ++i) {
if (strcmp(argv[i], "--malloc-size") == 0) {
buf_size = atoi(argv[++i]);
}
if (strcmp(argv[i], "--malloc-addr") == 0) {
buf_addr = atoi(argv[++i]);
}
}
malloc_buffer = mmap((void*)buf_addr, buf_size, PROT_WRITE|PROT_READ,
MAP_FIXED|MAP_PRIVATE, 0, 0);
// editor's note: omit MAP_FIXED since you're checking the result anyway
if (malloc_buffer == MAP_FAILED || malloc_buffer != (void*)but_addr) {
// Could not get requested memory block, fail.
exit(1);
}
malloc_size = buf_size;
}
By using MAP_FIXED, we are telling the kernel to replace any existing mappings that overlap with this new one at buf_addr.
(Editor's note: MAP_FIXED is probably not what you want. Specifying buf_addr as a hint instead of NULL already requests that address if possible. With MAP_FIXED, mmap will either return an error or the address you gave it. The malloc_buffer != (void*)but_addr check makes sense for the non-FIXED case, which won't replace an existing mapping of your code or a shared library or anything else. Linux 4.17 introduced MAP_FIXED_NOREPLACE which you can use to make mmap return an error instead of memory at the wrong address you don't want to use. But still leave the check in so your code works on older kernels.)
If you use this block to implement your own malloc and don't use other non-deterministic operation in your code, you can have complete control of the pointer values.
This suppose that your pattern usage of malloc / free is deterministic. And that you don't use libraries that are non-deterministic.
However, I think a simpler solution is to keep your algorithms deterministic and not to depend on addresses to be. This is possible. I've worked on a large scale project were multiple computer had to update state deterministically (so that each program had the same state, while only transmitting inputs). If you don't use pointer for other things than referencing objects (most important things is to never use pointer value for anything, not as a hash, not as a key in a map, ...), then your state will stay deterministic.
Unless what you want to do is to be able to snapshot the whole process memory and do a binary diff to spot divergence. I think it's a bad idea, because how will you know that both of them have reached the same point in their computation? It is much more easier to compare the output, or to have the process be able to compute a hash of the state and use that to check that they are in sync because you can control when this is done (and thus it become deterministic too, otherwise your measurement is non-deterministic).

What is not deterministic is not only malloc but mmap (the basic syscall to get more memory space; it is not a function, it is a system call so is elementary or atomic from the application's point of view; so you cannot rewrite it within the application) because of address space layout randomization on Linux.
You could disable it with
echo 0 > /proc/sys/kernel/randomize_va_space
as root, or thru sysctl.
If you don't disable address space layout randomization you are stuck.
And you did ask a similar question previously, where I explained that your malloc-s won't always be deterministic.
I still think that for some practical applications, malloc cannot be deterministic. Imagine for instance a program having an hash-table keyed by the pid-s of the child processes it is launching. Collision in that table won't be the same in all your processes, etc.
So I believe you won't succeed in making malloc deterministic in your sense, whatever you'll try (unless you restrict yourself to a very narrow class of applications to checkpoint, so narrow that your software won't be very useful).

Simply put, as others have stated: if the execution of your program's instructions is deterministic, then memory returned by malloc() will be deterministic. That assumes your system's implementation doesn't have some call to random() or something to that effect. If you are unsure, read the code or documentation for your system's malloc.
This is with the possible exception of ASLR, as others have also stated. If you don't have root privileges, you can disable it per-process via the personality(2) syscall and the ADDR_NO_RANDOMIZE parameter. See here for more information on the personalities.
Edit: I should also say, if you are unaware: what you're doing is called bisimulation and is a well-studied technique. If you didn't know the terminology, it might help to have that keyword for searching.

When writing high-reliability code, the usual practise is to avoid malloc and other dynamic memory allocation. A compromise sometimes used is to do all such allocation only during system initialisation.

You can used shared memory to store your data. It will accessible from both processes and you can fill it in a deterministic way.

Related

Is there a performance cost to using large mmap calls that go beyond expected memory usage?

Edit: On systems that use on-demand paging
For initializing data structures that are both persistent for the duration of the program and require a dynamic amount of memory is there any reason not to mmap an upper bound from the start?
An example is an array that will persistent for the entire program's life but whose final size is unknown. The approach I am most familiar with is something along the lines of:
type * array = malloc(size);
and when the array has reached capacity doubling it with:
array = realloc(array, 2 * size);
size *= 2;
I understand this is probably the best way to do this if the array might freed mid execution so that its VM can be reused, but if it is persistent is there any reason not to just initialize the array as follows:
array = mmap(0,
huge_size,
PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE|MAP_NORESERVE,
-1, 0)
so that the elements never needs to be copied.
Edit: Specifically for an OS that uses on-demand paging.

Don't try to be smarter than the standard library, unless you 100% know what you are doing.
malloc() already does this for you. If you request a large amount of memory, malloc() will mmap() you a dedicated memory area. If what you are concerned about is the performance hit coming from doing size *= 2; realloc(old, size), then just malloc(huge_size) at the beginning, and then keep track of the actual used size in your program. There really is no point in doing an mmap() unless you explicitly need it for some specific reason: it isn't faster nor better in any particular way, and if malloc() thinks it's needed, it will do it for you.

It's fine to allocate upper bounds as long as:
You're building a 64bit program: 32bit ones have restricted virtual space, even on 64bit CPUs
Your upper bounds don't approach 2^47, as a mathematically derived one might
You're fine with crashing as your out-of-memory failure mode
You'll only run on systems where overcommit is enabled
As a side note, an end user application doing this may want to borrow a page from GHC's book and allocate 1TB up front even if 10GB would do. This unrealistically large amount will ensure that users don't confuse virtual memory usage with physical memory usage.

If you know for a fact that wasting a chunk of memory (most likely an entire page which is likely 4096 bytes) will not cause your program or the other programs running on your system to run out of memory, AND you know for a fact that your program will only ever be compiled and run on UNIX machines, then this approach is not incorrect, but it is not good programming practice for the following reasons:
The <stdlib.h> file you #include to use malloc() and free() in your C programs is specified by the C standard, but it is specifically implemented for your architecture by the writers of the operating system. This means that your specific system was kept in-mind when these functions were written, so finding a sneaky way to improve efficiency for memory allocation is unlikely unless you know the inner workings of memory management in your OS better than those who wrote it.
Furthermore, the <sys/mman.h> file you include to mmap() stuff is not part of the C standard, and will only compile on UNIX machines, which reduces the portability of your code.
There's also a really good chance (assuming a UNIX environment) that malloc() and realloc() already use mmap() behind-the-scenes to allocate memory for your process anyway, so it's almost certainly better to just use them. (read that as "realloc doesn't necessarily actively allocate more space for me, because there's a good chance there's already a chunk of memory that my process has control of that can satisfy my new memory request without calling mmap() again")
Hope that helps!

c malloc functionality for custom memory region

Is there any malloc/realloc/free like implementation where i can specify a memory region where to manage the memory allocation?
I mean regular malloc (etc.) functions manages only the heap memory region.
What if I need to allocate some space in a shared memory segment or in a memory mapped file?

Not 100 %, As per your question you want to maintain your own memory region. so you need to go for your own my_malloc, my_realloc and my_free
Implementing your own my_malloc may help you
void* my_malloc(int size)
{
char* ptr = malloc(size+sizeof(int));
memcpy(ptr, &size, sizeof(int));
return ptr+sizeof(int);
}
This is just a small idea, full implementation will take you to the
answer.
Refer this question
use the same method to achieve my_realloc and my_free

I asked myself this question recently too, because I wanted a malloc implementation for my security programs which could safely wipe out a static memory region just before exit (which contains sensitive data like encryption keys, passwords and other such data).
First, I found this. I thought it could be very good for my purpose, but I really could not understand it's code completely. The license status was also unclear, as it is very important for one of my projects too.
I ended up writing my own.
My own implementation supports multiple heaps at same time, operating over them with pool descriptor structure, automatic memory zeroing of freed blocks, undefined behavior and OOM handlers, getting exact usable size of allocated objects and testing that pointer is still allocated, which is very sufficient for me. It's not very fast and it is more educational grade rather than professional one, but I wanted one in a hurry.
Note that it does not (yet) knows about alignment requirements, but at least it returns an address suitable for storing an 32 bit integer.

Iam using Tasking and I can store data in a specific space of memory. For example I can use:
testVar _at(0x200000);
I'm not sure if this is what you are looking for, but for example I'am using it to store data to external RAM. But as far as I know, it's only workin for global variables.

It is not very hard to implement your own my_alloc and my_free and use preferred memory range. It is simple chain of: block size, flag free/in use, and block data plus final-block marker (e.g. block size = 0). In the beginning you have one large free block and know its address. Note that my_alloc returns the address of block data and block size/flag are few bytes before.

Why does malloc initialize the values to 0 in gcc?

Maybe it is different from platform to platform, but
when I compile using gcc and run the code below, I get 0 every time in my ubuntu 11.10.
#include <stdio.h>
#include <stdlib.h>
int main()
{
double *a = malloc(sizeof(double)*100)
printf("%f", *a);
}
Why do malloc behave like this even though there is calloc?
Doesn't it mean that there is an unwanted performance overhead just to initialize the values to 0 even if you don't want it to be sometimes?
EDIT: Oh, my previous example was not initiazling, but happened to use "fresh" block.
What I precisely was looking for was why it initializes it when it allocates a large block:
int main()
{
int *a = malloc(sizeof(int)*200000);
a[10] = 3;
printf("%d", *(a+10));
free(a);
a = malloc(sizeof(double)*200000);
printf("%d", *(a+10));
}
OUTPUT: 3
0 (initialized)
But thanks for pointing out that there is a SECURITY reason when mallocing! (Never thought about it). Sure it has to initialize to zero when allocating fresh block, or the large block.

Short Answer:
It doesn't, it just happens to be zero in your case.(Also your test case doesn't show that the data is zero. It only shows if one element is zero.)
Long Answer:
When you call malloc(), one of two things will happen:
It recycles memory that was previous allocated and freed from the same process.
It requests new page(s) from the operating system.
In the first case, the memory will contain data leftover from previous allocations. So it won't be zero. This is the usual case when performing small allocations.
In the second case, the memory will be from the OS. This happens when the program runs out of memory - or when you are requesting a very large allocation. (as is the case in your example)
Here's the catch: Memory coming from the OS will be zeroed for security reasons.*
When the OS gives you memory, it could have been freed from a different process. So that memory could contain sensitive information such as a password. So to prevent you reading such data, the OS will zero it before it gives it to you.
*I note that the C standard says nothing about this. This is strictly an OS behavior. So this zeroing may or may not be present on systems where security is not a concern.
To give more of a performance background to this:
As #R. mentions in the comments, this zeroing is why you should always use calloc() instead of malloc() + memset(). calloc() can take advantage of this fact to avoid a separate memset().
On the other hand, this zeroing is sometimes a performance bottleneck. In some numerical applications (such as the out-of-place FFT), you need to allocate a huge chunk of scratch memory. Use it to perform whatever algorithm, then free it.
In these cases, the zeroing is unnecessary and amounts to pure overhead.
The most extreme example I've seen is a 20-second zeroing overhead for a 70-second operation with a 48 GB scratch buffer. (Roughly 30% overhead.)
(Granted: the machine did have a lack of memory bandwidth.)
The obvious solution is to simply reuse the memory manually. But that often requires breaking through established interfaces. (especially if it's part of a library routine)

The OS will usually clear fresh memory pages it sends to your process so it can't look at an older process' data. This means that the first time you initialize a variable (or malloc something) it will often be zero but if you ever reuse that memory (by freeing it and malloc-ing again, for instance) then all bets are off.
This inconsistence is precisely why uninitialized variables are such a hard to find bug.
As for the unwanted performance overheads, avoiding unspecified behaviour is probably more important. Whatever small performance boost you could gain in this case won't compensate the hard to find bugs you will have to deal with if someone slightly modifies the codes (breaking previous assumptions) or ports it to another system (where the assumptions might have been invalid in the first place).

Why do you assume that malloc() initializes to zero? It just so happens to be that the first call to malloc() results in a call to sbrk or mmap system calls, which allocate a page of memory from the OS. The OS is obliged to provide zero-initialized memory for security reasons (otherwise, data from other processes gets visible!). So you might think there - the OS wastes time zeroing the page. But no! In Linux, there is a special system-wide singleton page called the 'zero page' and that page will get mapped as Copy-On-Write, which means that only when you actually write on that page, the OS will allocate another page and initialize it. So I hope this answers your question regarding performance. The memory paging model allows usage of memory to be sort-of lazy by supporting the capability of multiple mapping of the same page plus the ability to handle the case when the first write occurs.
If you call free(), the glibc allocator will return the region to its free lists, and when malloc() is called again, you might get that same region, but dirty with the previous data. Eventually, free() might return the memory to the OS by calling system calls again.
Notice that the glibc man page on malloc() strictly says that the memory is not cleared, so by the "contract" on the API, you cannot assume that it does get cleared. Here's the original excerpt:
malloc() allocates size bytes and returns a pointer to the allocated memory.
The memory is not cleared. If size is 0, then malloc() returns either NULL,
or a unique pointer value that can later be successfully passed to free().
If you would like, you can read more about of that documentation if you are worried about performance or other side-effects.

I modified your example to contain 2 identical allocations. Now it is easy to see malloc doesn't zero initialize memory.
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
{
double *a = malloc(sizeof(double)*100);
*a = 100;
printf("%f\n", *a);
free(a);
}
{
double *a = malloc(sizeof(double)*100);
printf("%f\n", *a);
free(a);
}
return 0;
}
Output with gcc 4.3.4
100.000000
100.000000

From gnu.org:
Very large blocks (much larger than a page) are allocated with mmap (anonymous or via /dev/zero) by this implementation.

The standard does not dictate that malloc() should initialize the values to zero. It just happens at your platform that it might be set to zero, or it might have been zero at the specific moment you read that value.

Your code doesn't demonstrate that malloc initialises its memory to 0. That could be done by the operating system, before the program starts. To see shich is the case, write a different value to the memory, free it, and call malloc again. You will probably get the same address, but you will have to check this. If so, you can look to see what it contains. Let us know!

malloc doesn't initialize memory to zero. It returns it to you as it is without touching the memory or changing its value.
So, why do we get those zeros?
Before answering this question we should understand how malloc works:
When you call malloc it checks whether the glibc allocator has a memory of the requested size or not.
If it does, it will return this memory to you. This memory usually comes due to a previous free operation so it has garbage value(maybe zero or not) in most cases.
On the other hand, if it can't find memory, it will ask the OS to allocate memory for it, by calling sbrk or mmap system calls.
The OS returns a zero-initialized page for security reasons as this memory may have been used by another process and carries valuable information such as passwords or personal data.
You can read about it yourself from this Link:
Neighboring chunks can be coalesced on a free no matter what their
size is. This makes the implementation suitable for all kinds of
allocation patterns without generally incurring high memory waste
through fragmentation.
Very large blocks (much larger than a page) are allocated with mmap
(anonymous or via /dev/zero) by this implementation
In some implementations calloc uses this property of the OS and asks the OS to allocate pages for it to make sure the memory is always zero-initialized without initializing it itself.

Do you know that it is definitely being initialised? Is it possible that the area returned by malloc() just frequently has 0 at the beginning?

Never ever count on any compiler to generate code that will initialize memory to anything. malloc simply returns a pointer to n bytes of memory someplace hell it might even be in swap.
If the contents of the memory is critical initialize it yourself.

does free always (portably) frees & reserve memory for the process or returns to the OS

I have read that free() "generally" does not return memory to the OS. Can we portably make use of this feature of free(). For example,is this portable?
/* Assume I know i would need memory equivalent to 10000 integers at max
during the lifetime of the process */
unsigned int* p = malloc(sizeof(unsigned int) * 10000);
if ( p == NULL)
return 1;
free(p);
/* Different points in the program */
unsigned int* q = malloc(sizeof(unsigned int) * 5);
/* No need to check for the return value of malloc */
I am writing a demo where I would know in advance how many call contexts to support.
Is it feasible to allocate "n" number of "call contexts" structures in advance and then free them immediately. Would that guarantee that my future malloc calls would not fail?
Does this give me any advantage with regards to efficiency? I am thinking one less "if" check plus would memory management work better if a large chunk was initially acquired and is now available with free. Would this cause less fragmentation?

You would be better off keeping the initial block of memory allocated then using a pool to make it available to clients in your application. It isn't good form to rely on esoteric behaviors to maintain the stability of your code. If anything changes, you could be dealing with bad pointers and having program crashes.

You are asking for a portable and low level way to control what happens on the OS side of the memory interface.
On any OS (because c is one of the most widely ported languages out there).
Think about that and keep in mind that OSs differ in their construction and goals and have widely varying sets of needs and properties.
There is a reason the usual c APIs only define how things should look and behave from the c side of the interface, and not how things should be on the OS side.

No, you can't reliably do such a thing. It is portable only in the sense that it's valid C and will probably compile anywhere you try it, but it is still incorrect to rely on this supposed (and undocumented) behaviour.
You will also not get any appreciably better performance. A simple check for NULL returns from malloc() is not what's making your program slow. If you think all your calls to malloc() and free() are slowing your program down, write your own allocation system with the behaviour you want.

You cannot portably rely on any such behavior of malloc(). You can only rely on malloc() giving you a pointer to a block memory of the given size which you can use until you call free().

Hmm, no. What malloc(3) does internally is call the brk(2) to extend the data segment if it's too small for given allocation request. It does touch some of the pages for its internal bookkeeping, but generally not all allocated pages might be backed by physical memory at that point.
This is because many OS-es do memory overcommit - promising the application whatever it requested regardless of available physical memory in hopes that not all memory will be used by the app, that other apps release memory/terminate, and falling back on swapping as last resort. Say on Linux malloc(3) pretty much never fails.
When memory actually gets referenced the kernel will have to find available physical pages to back it up, create/update page tables and TLBs, etc. - normal handling for a page fault. So again, no, you will not get any speedup later unless you go and touch every page in that allocated chunk.
Disclamer: the above is might not be accurate for Windows (so, no again - nothing close to portable.)

No, there's no guarantee free() will not release the memory back, and there's no guarantee your second malloc will succeed.
Even platforms who "generally" don't return memory to the OS, does so at times if it can. You could end up with your first malloc succeeding, and your next malloc not succeding since in the mean time some other part of the system used up your memory.

Not at all. It's not portable at all. In addition, there's no guarantee that another process won't have used the memory in question- C runs on many devices like embedded where virtual memory addressing doesn't exist. Nor will it reduce fragmentation- you'd have to know the page size of the machine and allocate an exact number of pages- and then, when you freed them, they'd be just unfragmented again.
If you really want to guarantee malloced memory, malloc a large block and manually place objects into it. That's the only way. As for efficiency, you must be running on an incredibly starved device in order to save a few ifs.

malloc(3) also does mmap(2), consequently, free(3) does munmap(2), consequently, second malloc() can in theory fail.

The C standard can be read as requiring this, but from a practical viewpoint there are implementations known that don't follow that, so whether it's required or not you can't really depend on it.
So, if you want this to be guaranteed, you'll pretty much need to write your own allocator. A typical strategy is to allocate a big block from malloc (or whatever) and sub-allocate pieces for the rest of the program to use out of that large block (or perhaps a number of large blocks).

For better control you should create your own memory allocator. An example of memory allocator is this one. This is the only way you will have predictable results. Everything else that relies on obscure/undocumented features and works can be attributed to luck.

Why can I write and read memory when I haven't allocated space?

I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...
I'm declaring the Hash Table structure as pointer so I can initialize it with the size I want and increase it's size whenever the load factor is high.
The problem is that I'm creating a table with only 2 elements (it's just for testing purposes), I'm allocating memory for just those 2 elements but I'm still able to write to memory locations that I shouldn't. And I also can read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>
#define HASHSIZE 2
typedef char *HashKey;
typedef int HashValue;
typedef struct sHashTable {
HashKey key;
HashValue value;
} HashEntry;
typedef HashEntry *HashTable;
void hashInsert(HashTable table, HashKey key, HashValue value) {
}
void hashInitialize(HashTable *table, int tabSize) {
*table = malloc(sizeof(HashEntry) * tabSize);
if(!*table) {
perror("malloc");
exit(1);
}
(*table)[0].key = "ABC";
(*table)[0].value = 45;
(*table)[1].key = "XYZ";
(*table)[1].value = 82;
(*table)[2].key = "JKL";
(*table)[2].value = 13;
}
int main(void) {
HashTable t1 = NULL;
hashInitialize(&t1, HASHSIZE);
printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
printf("PAIR(%d): %s, %d\n", 3, t1[2].key, t1[2].value);
printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor (*table)[2].value = 13;. I also shouldn't be able read the memory locations in the last 2 printfs in main().
Can someone please explain this to me and if I can/should do anything about it?
EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question because I don't want my code like I posted above. I want to do things slightly different which makes this question a bit irrelevant. So, I'm just going to assume this was question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...

Just don't do it, it's undefined behavior.
It might accidentially work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata used by the heap manager for its purposes. Or you can overwrite some other unrelated variable and then have hard times debugging the program that goes nuts because of that. Or anything else harmful - either obvious or subtle yet severe - can happen.
Just don't do it - only read/write memory you legally allocated.

Generally speaking (different implementation for different platforms) when a malloc or similar heap based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in sets of regions - which would be equal or larger than the amount the program requested.
Such an arrangement is done so as to prevent frequent system calls to kernel for allocation, and satisfying program requests for Heap faster (This is certainly not the only reason!! - other reasons may exist as well).
Fall through of such an arrangement leads to the problem that you are observing. Once again, its not always necessary that your program would be able to write to a non-allocated zone without crashing/seg-faulting everytime - that depends on particular binary's memory arrangement. Try writing to even higher array offset - your program would eventually fault.
As for what you should/should-not do - people who have responded above have summarized fairly well. I have no better answer except that such issues should be prevented and that can only be done by being careful while allocating memory.
One way of understanding is through this crude example: When you request 1 byte in userspace, the kernel has to allocate a whole page atleast (which would be 4Kb on some Linux systems, for example - the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling Library - which the library can allocate as when more requests come in. Thus, writing or reading requests to such a region may not necessarily generate a fault. It would just mean garbage.

In C, you can read to any address that is mapped, you can also write to any address that is mapped to a page with read-write areas.
In practice, the OS gives a process memory in chunks (pages) of normally 8K (but this is OS-dependant). The C library then manages these pages and maintains lists of what is free and what is allocated, giving the user addresses of these blocks when asked to with malloc.
So when you get a pointer back from malloc(), you are pointing to an area within an 8k page that is read-writable. This area may contain garbage, or it contain other malloc'd memory, it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:
Corruption of other malloc'ed data
Corruption of stack variables, or the call stack itself, causing crashes when a function return's
Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called
All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.
Only when you read or write from/to the address which does not correspond to a mapped page will you get a crash... eg reading from address 0x0 (NULL)
Malloc, Free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally
There are many 3rd party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied..

Think of memory as being a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. An piece of code Foo could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:
Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).
Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.

Usually malloc will allocate more memory than you require to for alignment purpose. Also because the process really have read/write access to the heap memory region. So reading a few bytes outside of the allocated region seldom trigger any errors.
But still you should not do it. Since the memory you're writing to can be regarded as unoccupied or is in fact occupied by others, anything can happen e.g. the 2nd and 3rd key/value pair will become garbage later or an irrelevant vital function will crash due to some invalid data you've stomped onto its malloc-ed memory.
(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)

Develop Reference

c reactjs sql-server angularjs arrays wpf database batch-file google-app-engine silverlight