I've been trying to learn the basics of a heap overflow attack. I'm mostly interested in using a corruption or modification of the chunk metadata as the basis of the attack, but I'm also open to other suggestions. I know that the goal of the exploit should be to overwrite the printf() function pointer with the address of the challenge() function, but I can't seem to figure out how to achieve that write.
I have the following piece of code which I want to exploit, using malloc from glibc 2.11.2:
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

void challenge()
{
    puts("you win\n");
}

int main(int argc, char **argv)
{
    char *inputA, *inputB, *inputC;
    inputA = malloc(32);
    inputB = malloc(32);
    inputC = malloc(32);
    strcpy(inputA, argv[1]);
    strcpy(inputB, argv[2]);
    strcpy(inputC, argv[3]);
    free(inputC);
    free(inputB);
    free(inputA);
    printf("execute challenge to win\n");
}
Obviously, achieving an actual overwrite of an allocated chunk's metadata is trivial. However, I have not been able to find a way to exploit this code using any of the standard techniques.
I have read and attempted to implement the techniques from:
The paper: w00w00 on Heap Overflows
Although the paper is very clear, the unlink technique has been obsolete for some time.
Malloc Maleficarum.txt
This paper expands upon the exploit techniques from the w00w00 days and accounts for the newer versions of glibc. However, of the 5 techniques detailed in the paper, I have not found one whose prerequisites the code above matches.
Understanding the Heap By Breaking it(pdf)
The pdf gives a pretty good review of how the heap works, but focuses on double free techniques.
I originally tried to exploit this code by manipulating the size value of the chunk for inputC, so that it pointed back to the head of the inputC chunk. When that didn't work, I tried pointing further back to the chunk of inputB. That is when I realized that the new glibc performs a sanity check on the size value.
How can a user craft an exploit to take advantage of a free, assuming he has the ability to set the allocated chunk's metadata to arbitrary values, and use it to overwrite a value in the GOT or write to any other arbitrary address?
Note: When I write 'arbitrary address' I understand that memory pages may be read only or protected, I mean an address that I can assume I can write to.
Note: I will say before I answer that this is purely an academic answer, not intended to be used for malicious purposes. I am aware of the exercises OP is doing and they are open source and not intended to encourage any users to use these techniques in unapproved circumstances.
I will detail the technique below but for your reference I would take a look at the Vudo malloc tricks (It's referenced in one of your links above) because my overview is going to be a short one: http://www.phrack.com/issues.html?issue=57&id=8
It details how malloc handles creating blocks of memory, pulling memory from lists and other things. In particular the unlink attack is of interest for this attack (note: you're correct that glibc now performs a sanity check on sizes for this particular reason, but you should be on an older libc for this exercise... legacy bro).
From the paper, an allocated block and a free block use the same data structure, but the data is handled differently. See here:
chunk -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| prev_size: size of the previous chunk, in bytes (used |
| by dlmalloc only if this previous chunk is free) |
+---------------------------------------------------------+
| size: size of the chunk (the number of bytes between |
| "chunk" and "nextchunk") and 2 bits status information |
mem -> +---------------------------------------------------------+
| fd: not used by dlmalloc because "chunk" is allocated |
| (user data therefore starts here) |
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| bk: not used by dlmalloc because "chunk" is allocated |
| (there may be user data here) |
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
| |
| |
| user data (may be 0 bytes long) |
| |
| |
next -> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| prev_size: not used by dlmalloc because "chunk" is |
| allocated (may hold user data, to decrease wastage) |
+---------------------------------------------------------+
Allocated blocks don't use the fd or bk pointers, but free ones do. This is going to be important later. You should know enough programming to understand that "blocks" in Doug Lea's malloc are organized into a doubly-linked list; there's one list for free blocks and another for allocated ones (technically there are several lists for free blocks depending on sizes, but it's irrelevant here since the code allocates blocks of the same size). So when you're freeing a particular block, you have to fix the pointers to keep the list intact.
e.g. say you're freeing block y from the list below:
x <-> y <-> z
Notice that in the diagram above the spots for bk and fd contain the necessary pointers to iterate along the list. When malloc wants to take a block p off of the list it calls, among other things, a macro to fix the list:
#define unlink( P, BK, FD ) { \
    BK = P->bk;               \
    FD = P->fd;               \
    FD->bk = BK;              \
    BK->fd = FD;              \
}
The macro itself isn't hard to understand, but the important thing to note in older versions of libc is that it doesn't perform sanity checks on the size or the pointers being written to. What it means in your case is that without any sort of address randomization you can predictably and reliably determine the status of the heap and redirect an arbitrary pointer to an address of your choosing by overflowing the heap (via the strcpy here) in a specific way.
There's a few things required to get the attack to work:
the fd pointer for your block is pointing to the address you want to overwrite minus 12 bytes. The offset exists because the write lands at fd plus the offset of the bk field within the chunk header (12 bytes on 32-bit)
The bk pointer of your block is pointing to your shellcode
The size needs to be -4. This accomplishes a few things, namely it sets the status bits in the block
So you'll have to play around with the offsets in your specific example, but the general malicious format that you're trying to pass with the strcpy here is of the format:
| junk to fill up the legitimate buffer | -4 | -4 | addr you want to overwrite -12 (0x0C) | addr you want to call instead
Note the negative number sets the prev_size field to -4, which makes the free routine believe that the prev_size chunk actually starts in the current chunk that you control/are corrupting.
And yes, a proper explanation wouldn't be complete without mentioning that this attack doesn't work on current versions of glibc; the size has a sanity check done and the unlink method just won't work. That in combination with mitigations like address randomization make this attack not viable on anything but legacy systems. But the method described here is how I did that challenge ;)
Note that most of the techniques explained in the Malloc Maleficarum are now protected against. glibc has improved a lot in all those double free scenarios.
If you want to improve your knowledge of the Malloc Maleficarum techniques, read the Malloc Des-Maleficarum and House of Lore: Reloaded, written by blackngel. You can find both texts in Phrack.
Malloc Des-Malleficarum
I'm also working through these, and I can tell you that, for example, the House of Mind is no longer exploitable, at least as explained in the texts. Although it might be possible to bypass the new restrictions added to the code.
I'd add that the easiest way to execute your code is to overwrite the .dtors address, so your code will always be executed once the program finishes.
If you download the glibc code and study the critical zones of malloc, etc., you will find code checks that are not documented in the papers mentioned previously. These checks were included to stop the double free party.
On the other hand, the presentation by Justin N. Ferguson (Understanding the Heap by Breaking It) that you can find on YouTube (BlackHat 2007) is perfect for understanding all the heap mechanics, but I must admit the techniques shown are far from reliable. At least he opens a new field for heap exploitation.
Understanding the heap by breaking it
Anyway, I'm also working on this, so if you want to contact me, we can share our progress. You can reach me at the overflowedminds.net domain as newlog (build the mail address yourself ^^).
Heap overflows are tricky to pull off, and are very heavily heap-layout dependent. It also looks like you're going after the Windows CRT heap, which has lots of mitigations in place specifically to stop this type of attack.
If you really do want to do this kind of thing, you need to be happy jumping into WinDbg and stepping into functions like free to see exactly what is going on inside, and hence what kind of control you might be able to achieve via the heap overflow of the previous value.
I won't give you any more specific help than that for the simple reason that demonstrating a heap overflow is usually enough for defensive security - defensive security experts can report a heap overflow without needing to actually fully exploit it. The only people who do need to fully exploit a heap-overflow all the way to remote code execution are people exploiting bugs offensively, and if you want to do that, you're on your own.
Related
Why did C never implement "stack extension" to allow (dynamically-sized) stack variables of a callee function to be referenced from the caller?
This could work by extending the caller's stack frame to include the "dynamically-returned" variables from the callee's stack frame. (You could, but shouldn't, implement this with alloca from the caller - it may not survive optimisation.)
e.g. If I wanted to return the dynamically-sized string "e", the implementation could be:
--+---+-----+
| a | b |
--+---+-----+
callee(d);
--+---+-----+---------+---+
| a | b | junk | d |
--+---+-----+---------+---+
char e[calculated_size];
--+---+-----+---------+---+---------+
| a | b | junk | d | e |
--+---+-----+---------+---+---------+
dynamic_return e;
--+---+-----+-------------+---------+
| a | b | waste | e |
--+---+-----+-------------+---------+
("Junk" contains the return address and other system-specific metadata which is invisible to the program.)
This would waste a little stack space, when used.
The up-side is a simplification of string processing, and of any other functions which currently have to malloc memory, return pointers, and hope that the caller remembers to free at the right time.
Obviously, there is no point in adding such a feature to C at this stage of its life; I'm just interested in why this wasn't a good idea.
A new object may be returned through many layers of software. So the wasted space may be that from dozens or even hundreds of function calls.
Consider also a routine that performs some iterative task. In each iteration, it gets some newly allocated object from a subroutine, which it inserts into a linked list or other data structure. Such iterative tasks may repeat for hundreds, thousands, or millions of iterations. The stack will overflow with wasted space.
Some objections to your idea. Some have been mentioned already in comments. Some come from the top of my head.
C doesn't have stacks or stack frames. C simply defines scopes and their life times and it is left to implementations as to how to implement the standard. Stacks and stack frames are really just the most popular way to implement some C semantics.
C doesn't have strings. C doesn't really have arrays as such. Well, it does have arrays, but as soon as you mention an array in an expression (e.g. a return expression), the array decays to a pointer to its first element. Returning a "string" or an array on the stack would involve significant impact on well established areas of the language.
C does have structs. However, you can already return a struct. I can't tell you how it's done, because it is an implementation detail.
A problem with your specific implementation is that the caller has to know how big the "waste" is. Don't forget that the waste will include the stack frame of the callee but also the waste from any functions the callee calls either directly or indirectly. The returning convention will have to include information on the size of the waste and a pointer to the return value.
Stacks, as a rule, are quite limited compared to heap memory, particularly in applications that use threading. At some point the caller will need to move the returned array down into its own stack frame. If the array was merely a pointer to storage in the heap, this would be much more efficient, but then you've got the existing model.
You have to realize that the implementation of the stack is strongly dictated by the CPU and the OS kernel. The language does not have much say in this. Limitations are, for instance:
The ret instruction of the X86 architecture expects the return address at the memory location stored in the stack pointer. Thus, there cannot be anything else on top (semantical top - usually this is the lowest address, as stacks tend to grow down). You could work around this, of course, but that would likely incur additional overheads which C programmers are not going to be willing to pay.
The stack pointer defines what part of the allocated stack memory is actually used. When control flow is changed asynchronously (hardware interrupt), the current CPU's registers are generally immediately stored to memory addresses below the stack pointer by the interrupt handler. This can happen at any time, even throughout most of the kernel code. Any data stored below the place where the stack pointer point to would be clobbered by this. (Well, technically, that's not fully correct, there is generally a "red zone" below the stack pointer to which the interrupt handlers may not write any data. But here we are getting very firmly into architectural design peculiarities.)
Destroying a stack frame is generally a single addition of a constant to the stack pointer. This is the fastest kind of instruction you can get, it will generally not require a single cycle to execute (it will execute in parallel to some memory access). If the stack frame has a dynamic size, the stack frame must be destroyed by loading the stack pointer from memory, and for that a base pointer must have been retained. That's a memory access with a significant latency, and another register that must be saved to be used. Again, this is overhead that's generally unnecessary.
Your proposal would definitely be implementable, but it would require some workarounds. And these workarounds would generally cost performance. Small bits of performance, but definitely measurable amounts. That's not what compiler/kernel developers want, and for good reason.
How does malloc() store metadata?
void* p;
void* q;
p = malloc(sizeof(char));
q = malloc(sizeof(int));
I know that the return value p points to the start of the allocated block of memory.
If I iterate and print
p[-1], p[-2], ... q[-1], q[-2], ...
or p[1], p[2], p[3], p[4], ...
or q[1], q[2], q[3], q[4], ...
I find some values that help malloc() store its data. However, I can't understand precisely what that metadata means. I only know that some of it holds the block size and the address of the next free block, but I can't find anything more detailed on the web.
Can you please give me a detailed explanation of those values?
How this metadata works and is used depends entirely on the memory management in your libc. Here are some useful writeups to get you started:
Malloc - Typically, classic malloc.
DLMalloc - Doug Lee's Malloc.
GC Malloc - Malloc with garbage collection.
TC Malloc - Thread caching malloc.
Each of these has different aims, benefits and possible deficiencies. For example, perhaps you are concerned about possible heap overflow issues and protections. This may lead to one choice. Perhaps you are looking for better fragment management. This might lead to the selection of Doug Lee's malloc. You really need to specify which library you are using or research them all to understand how the metadata is used to maintain bins, coalesce adjustment free regions, etc.
David Hoelzer has an excellent answer, but I wanted to follow it up a touch more:
"Meta Data" for malloc is 100% implementation driven. Here is a quick overview of some implementations I have written or used:
The meta data stores "next" block pointers.
It stores canary values to make sure you haven't written passed the end of a block
There is no meta data at all because all the blocks are the same size and they are placed in a stack to allow O(1) access - A Unit Allocator. In this case, you'd hit memory from an adjacent block.
The data stores next block ptr, prev block ptr, alignment id, block size, and an identifier that tells a memory debugger exactly where the allocation occured. Very useful for leak tracking.
It hits the previous block's info because the data is stored at the tail of the block instead of the head...
Or mix and match all the above. In any case, reading mem[-1] is a bad idea, and in some cases (thinking embedded systems), could indeed cause a segment fault on only a read if said read happen to address beyond the current memory page and into a forbidden area of memory.
Update as per OP
The 4th scheme I described is one that has quite a bit of information per block - 16-bytes of information not being uncommon as that size won't throw off common alignment schemes. It would contain a 4-byte pointer to the next allocated block, a 4-byte pointer to the previous, an identifier for alignment - a single byte can handle this directly for common sizes of 0-256 byte alignement, but that byte could also represent 256 possible alignment enum values instead, 3 bytes of pad or canary values, and a 4-byte unique identifier to identify where in the code the call was made, though it could be made smaller. This can be a lookup value to a debug table that contains __file__, __line__, and whatever other info you wish to save with the alloction.
This would be one of the heavier varieties as it has a lot of information that needs to be updated when values are allocated and freed.
It is illegal to access p[-1], etc in your example. The results will be undefined and maybe cause memory corruption, or segmentation fault. In general you don't get any control over malloc() at all or information about what it is doing.
That said, some malloc() "replacement" libraries will give you finer grained control or information - you link these into your binary "on top" of the system malloc().
The C standard does not specify how malloc stores its metadata (or even if it will have accessible metadata); therefore, the metadata format and location are implementation-dependent. Portable code must therefore never attempt to access or parse the metadata.
Some malloc implementations provide, as an extension, functions like malloc_usable_size or malloc_size which can tell you the size of allocated blocks. While the presence of these functions is also implementation dependent, they are at least reliable and correct ways to get the information you need if they are present.
How does the OS know how much memory to free when we call free(pointer)?
I mean, we are not providing any size, only a pointer, to the free statement.
How is the size handled internally?
The OS won't have a clue, as free is not a system call. However, your C library's memory allocation system will have recorded the size in some way when the memory was originally allocated by malloc(), so it knows how much to free.
The size is stored internally in the allocator, and the pointer you pass to free is used to reach that data. A very basic approach is to store the size 4 bytes before the pointer, so subtracting 4 from the pointer gives you a pointer to its size.
Notice that the OS doesn't handle this directly, it's implemented by your C/C++ runtime allocator.
When you call malloc, the C library will automatically carve a space out for you on the heap. Because things created on the heap are created dynamically, what is on the heap at any given point in time is not known as it is for the stack. So the library will keep track of all the memory that you have allocated on the heap.
At some point your heap might look like this:
p---+
V
---------------------------------------
... | used (4) | used (10) | used (8) | ...
---------------------------------------
The library will keep track of how much memory is allocated for each block. In this case, the pointer p points to the start of the middle block.
If we do the following call:
free(p);
then the library will free this space for you on the heap, like so...
p---+
V
----------------------------------------
... | used (4) | unused(10) | used (8) | ...
----------------------------------------
Now, the next time that you are looking for some space, say with a call like:
void* ptr = malloc(10);
The newly unused space may be allocated to your program again, which will allow us to reduce the overall amount of memory our program uses.
ptr---+
V
----------------------------------------
... | used (4) | used(10) | used (8) | ...
----------------------------------------
How your library internally manages the sizes varies. A simple way to implement this would be to add an additional amount of bytes (we'll say 1 for the example) at the beginning of each allocated block, holding the size of that block. So our previous block of heap memory would look like this:
bytes: 1 4 1 10 1 8
--------------------------------
... |4| used |10| used |8| used | ...
--------------------------------
^
+---ptr
Now, if we say that block sizes will be rounded up to be divisible by 2, then we have an extra bit at the end of the size (because we can always assume it to be 0), which we can conveniently use to record whether the corresponding block is used or unused.
When we pass a pointer in free:
free(ptr);
The library would move the given pointer back one byte and change the used/unused bit to unused. In this specific case, we don't even have to know the size of the block in order to free it. It only becomes an issue when we try to reallocate the same amount of data. Then the malloc call would go down the line, checking whether the next block is free. If it is free and the right size, that block is returned to the user; otherwise a new block is cut at the end of the heap, and more space is allocated from the OS if necessary.
I'm trying to build my own Hash Table in C from scratch as an exercise and I'm doing one little step at a time. But I'm having a little issue...
I'm declaring the Hash Table structure as a pointer so I can initialize it with the size I want and increase its size whenever the load factor is high.
The problem is that I'm creating a table with only 2 elements (it's just for testing purposes), I'm allocating memory for just those 2 elements but I'm still able to write to memory locations that I shouldn't. And I also can read memory locations that I haven't written to.
Here's my current code:
#include <stdio.h>
#include <stdlib.h>

#define HASHSIZE 2

typedef char *HashKey;
typedef int HashValue;

typedef struct sHashTable {
    HashKey key;
    HashValue value;
} HashEntry;

typedef HashEntry *HashTable;

void hashInsert(HashTable table, HashKey key, HashValue value) {
}

void hashInitialize(HashTable *table, int tabSize) {
    *table = malloc(sizeof(HashEntry) * tabSize);
    if(!*table) {
        perror("malloc");
        exit(1);
    }
    (*table)[0].key = "ABC";
    (*table)[0].value = 45;
    (*table)[1].key = "XYZ";
    (*table)[1].value = 82;
    (*table)[2].key = "JKL";
    (*table)[2].value = 13;
}

int main(void) {
    HashTable t1 = NULL;
    hashInitialize(&t1, HASHSIZE);
    printf("PAIR(%d): %s, %d\n", 0, t1[0].key, t1[0].value);
    printf("PAIR(%d): %s, %d\n", 1, t1[1].key, t1[1].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[2].key, t1[2].value);
    printf("PAIR(%d): %s, %d\n", 3, t1[3].key, t1[3].value);
    return 0;
}
You can easily see that I haven't allocated space for (*table)[2].key = "JKL"; nor (*table)[2].value = 13;. I also shouldn't be able to read the memory locations in the last 2 printfs in main().
Can someone please explain this to me and if I can/should do anything about it?
EDIT:
Ok, I've realized a few things about my code above, which is a mess... But I have a class right now and can't update my question. I'll update this when I have the time. Sorry about that.
EDIT 2:
I'm sorry, but I shouldn't have posted this question because I don't want my code to look like what I posted above. I want to do things slightly differently, which makes this question a bit irrelevant. So I'm just going to treat this as a question that I needed an answer for and accept one of the correct answers below. I'll then post my proper questions...
Just don't do it, it's undefined behavior.
It might accidentally work because you write/read some memory the program doesn't actually use. Or it can lead to heap corruption because you overwrite metadata used by the heap manager for its own purposes. Or you can overwrite some other unrelated variable and then have a hard time debugging the program that goes nuts because of that. Or anything else harmful - either obvious or subtle yet severe - can happen.
Just don't do it - only read/write memory you legally allocated.
Generally speaking (implementations differ across platforms), when a malloc or similar heap-based allocation call is made, the underlying library translates it into a system call. When the library does that, it generally allocates space in sets of regions, which would be equal to or larger than the amount the program requested.
Such an arrangement prevents frequent system calls to the kernel for allocation, and satisfies the program's requests for heap faster (this is certainly not the only reason; others exist as well).
The fall-through of such an arrangement leads to the problem you are observing. Once again, it's not always the case that your program can write to a non-allocated zone without crashing/seg-faulting every time - that depends on the particular binary's memory arrangement. Try writing to an even higher array offset - your program will eventually fault.
As for what you should or should not do, the people who responded above have summarized it fairly well. I have no better answer except that such issues should be prevented, and that can only be done by being careful while allocating memory.
One way of understanding this is through a crude example: when you request 1 byte in userspace, the kernel has to allocate at least a whole page (which would be 4KB on some Linux systems, for example - the most granular allocation at kernel level). To improve efficiency by reducing frequent calls, the kernel assigns this whole page to the calling library, which the library can parcel out as more requests come in. Thus, read or write requests to such a region may not necessarily generate a fault. They would just produce garbage.
In C, you can read from any address that is mapped, and you can write to any address that is mapped to a page with read-write permissions.
In practice, the OS gives a process memory in chunks (pages) of normally 8K (but this is OS-dependent). The C library then manages these pages, maintains lists of what is free and what is allocated, and gives the user addresses within these blocks when asked via malloc.
So when you get a pointer back from malloc(), you are pointing to an area within an 8K page that is read-writable. This area may contain garbage, or it may contain other malloc'd memory; it may contain the memory used for stack variables, or it may even contain the memory used by the C library to manage the lists of free/allocated memory!
So you can imagine that writing to addresses beyond the range you have malloc'ed can really cause problems:
Corruption of other malloc'ed data
Corruption of stack variables, or the call stack itself, causing crashes when a function returns
Corruption of the C-library's malloc/free management memory, causing crashes when malloc() or free() are called
All of which are a real pain to debug, because the crash usually occurs much later than when the corruption occurred.
Only when you read or write from/to the address which does not correspond to a mapped page will you get a crash... eg reading from address 0x0 (NULL)
Malloc, Free and pointers are very fragile in C (and to a slightly lesser degree in C++), and it is very easy to shoot yourself in the foot accidentally
There are many 3rd party tools for memory checking which wrap each memory allocation/free/access with checking code. They do tend to slow your program down, depending on how much checking is applied.
Think of memory as being a great big blackboard divided into little squares. Writing to a memory location is equivalent to erasing a square and writing a new value there. The purpose of malloc generally isn't to bring memory (blackboard squares) into existence; rather, it's to identify an area of memory (group of squares) that's not being used for anything else, and take some action to ensure that it won't be used for anything else until further notice. Historically, it was pretty common for microprocessors to expose all of the system's memory to an application. A piece of code Foo could in theory pick an arbitrary address and store its data there, but with a couple of major caveats:
Some other code `Bar` might have previously stored something there with the expectation that it would remain. If `Bar` reads that location expecting to get back what it wrote, it will erroneously interpret the value written by `Foo` as its own. For example, if `Bar` had stored the number of widgets that were received (23), and `Foo` stored the value 57, the earlier code would then believe it had received 57 widgets.
If `Foo` expects the data it writes to remain for any significant length of time, its data might get overwritten by some other code (basically the flip-side of the above).
Newer systems include more monitoring to keep track of what processes own what areas of memory, and kill off processes that access memory that they don't own. In many such systems, each process will often start with a small blackboard and, if attempts are made to malloc more squares than are available, processes can be given new chunks of blackboard area as needed. Nonetheless, there will often be some blackboard area available to each process which hasn't yet been reserved for any particular purposes. Code could in theory use such areas to store information without bothering to allocate it first, and such code would work if nothing happened to use the memory for any other purpose, but there would be no guarantee that such memory areas wouldn't be used for some other purpose at some unexpected time.
Usually malloc will allocate more memory than you require, for alignment purposes. Also, the process really does have read/write access to the heap memory region, so reading a few bytes outside the allocated region seldom triggers any errors.
But you still should not do it. Since the memory you're writing to may be regarded as unoccupied, or may in fact be occupied by something else, anything can happen: e.g. the 2nd and 3rd key/value pairs may become garbage later, or an unrelated but vital function may crash due to invalid data you've stomped onto its malloc'ed memory.
(Also, either use char[≥4] as the type of key or malloc the key, because if the key is unfortunately stored on the stack it will become invalid later.)
In C programming, you can pass any kind of pointer you like as an argument to free, how does it know the size of the allocated memory to free? Whenever I pass a pointer to some function, I have to also pass the size (ie an array of 10 elements needs to receive 10 as a parameter to know the size of the array), but I do not have to pass the size to the free function. Why not, and can I use this same technique in my own functions to save me from needing to cart around the extra variable of the array's length?
When you call malloc(), you specify the amount of memory to allocate. The amount of memory actually used is slightly more than this, and includes extra information that records (at least) how big the block is. You can't (reliably) access that other information - and nor should you :-).
When you call free(), it simply looks at the extra information to find out how big the block is.
Most implementations of C memory allocation functions will store accounting information for each block, either in-line or separately.
One typical way (in-line) is to actually allocate both a header and the memory you asked for, padded out to some minimum size. So for example, if you asked for 20 bytes, the system may allocate a 48-byte block:
16-byte header containing size, special marker, checksum, pointers to next/previous block and so on.
32 bytes data area (your 20 bytes padded out to a multiple of 16).
The address then given to you is the address of the data area. Then, when you free the block, free will simply take the address you give it and, assuming you haven't stuffed up that address or the memory around it, check the accounting information immediately before it. Graphically, that would be along the lines of:
____ The allocated block ____
/ \
+--------+--------------------+
| Header | Your data area ... |
+--------+--------------------+
^
|
+-- The address you are given
Keep in mind the size of the header and the padding are totally implementation defined (actually, the entire thing is implementation-defined (a) but the in-line accounting option is a common one).
The checksums and special markers that exist in the accounting information are often the cause of errors like "Memory arena corrupted" or "Double free" if you overwrite them or free them twice.
The padding (to make allocation more efficient) is why you can sometimes write a little bit beyond the end of your requested space without causing problems (still, don't do that, it's undefined behaviour and, just because it works sometimes, doesn't mean it's okay to do it).
(a) I've written implementations of malloc in embedded systems where you got 128 bytes no matter what you asked for (that was the size of the largest structure in the system), assuming you asked for 128 bytes or less (requests for more would be met with a NULL return value). A very simple bit-mask (i.e., not in-line) was used to decide whether a 128-byte chunk was allocated or not.
Others I've developed had different pools for 16-byte chunks, 64-bytes chunks, 256-byte chunks and 1K chunks, again using a bit-mask to decide what blocks were used or available.
Both these options managed to reduce the overhead of the accounting information and to increase the speed of malloc and free (no need to coalesce adjacent blocks when freeing), particularly important in the environment we were working in.
From the comp.lang.c FAQ list: How does free know how many bytes to free?
The malloc/free implementation remembers the size of each block as it is allocated, so it is not necessary to remind it of the size when freeing. (Typically, the size is stored adjacent to the allocated block, which is why things usually break badly if the bounds of the allocated block are even slightly overstepped)
This answer is relocated from How does free() know how much memory to deallocate?, where I was abruptly prevented from answering by an apparent duplicate question. It should therefore be relevant to this duplicate:
For the case of malloc, the heap allocator stores a mapping of the original returned pointer, to relevant details needed for freeing the memory later. This typically involves storing the size of the memory region in whatever form relevant to the allocator in use, for example raw size, or a node in a binary tree used to track allocations, or a count of memory "units" in use.
free will not fail if you "rename" the pointer, or duplicate it in any way. It is not however reference counted, and only the first free will be correct. Additional frees are "double free" errors.
Attempting to free any pointer with a value different to those returned by previous mallocs, and as yet unfreed is an error. It is not possible to partially free memory regions returned from malloc.
On a related note GLib library has memory allocation functions which do not save implicit size - and then you just pass the size parameter to free. This can eliminate part of the overhead.
The heap manager stores the amount of memory belonging to the allocated block somewhere when you call malloc.
I never implemented one myself, but I guess the memory right in front of the allocated block might contain the meta information.
The original technique was to allocate a slightly larger block and store the size at the beginning, then give the application the rest of the block. The extra space holds a size and possibly links to thread the free blocks together for reuse.
There are certain issues with those tricks, however, such as poor cache and memory management behavior. Using memory right in the block tends to page things in unnecessarily and it also creates dirty pages which complicate sharing and copy-on-write.
So a more advanced technique is to keep a separate directory. Exotic approaches have also been developed where whole areas of memory hold blocks of the same power-of-two size, so the size can be inferred from the address alone.
In general, the answer is: a separate data structure is allocated to keep state.
malloc() and free() are system/compiler dependent, so it's hard to give a specific answer; there is more information on this in other questions on the site.
To answer the second half of your question: yes, you can, and a fairly common pattern in C is the following:
typedef struct {
    size_t numElements;
    int elements[1]; /* but enough space malloced for numElements at runtime */
} IntArray_t;

#define SIZE 10
IntArray_t *myArray = malloc(sizeof(IntArray_t) + SIZE * sizeof(int));
myArray->numElements = SIZE;
To answer the second question: yes, you could (kind of) use the same technique as malloc(), by simply storing the size of the array in its first cell. That lets you pass the array around without an additional size argument.
When we call malloc, it simply consumes a few more bytes than we asked for. These extra bytes contain information such as a checksum, the size, and other bookkeeping data.
When we call free, it goes directly to that additional information, where it finds the address and how large the block to be freed is.