Surprisingly, both programs gave the same difference between the two pointers even though the data types were different.
How exactly malloc stores its metadata is what I was trying to find out with this little experiment.
Program 1 :
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    char *i, *j;
    i = (char *)malloc(sizeof(char));
    j = (char *)malloc(sizeof(char));
    printf("%p\n", (void *)i);
    printf("%p\n", (void *)j);
    return 0;
}
Output :
0x710010
0x710030
Program 2 :
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    int *i, *j;
    i = (int *)malloc(sizeof(int));
    j = (int *)malloc(sizeof(int));
    printf("%p\n", (void *)i);
    printf("%p\n", (void *)j);
    return 0;
}
Output :
0x16b8010
0x16b8030
What I had in mind before this program:
| meta data of i | memory space of i | meta data of j | memory space of j |
but the results don't support that theory.
malloc "rounds up" allocations to a convenient size set at compile time for the library. This causes subsequent allocations and deallocations to fragment memory less than if allocations were created to exactly match requests.
Where malloc stores its metadata is not actually why the values for both are 0x20 "apart". But you can read up on one method of implementing malloc (and friends) here; see especially slides 16 and 28.
Imagine the case of a string manipulation program, where lots of different-sized allocations occur in "random" order. Tiny "left over" chunks would quickly develop, leaving totally useless bytes of memory spread out between the used chunks. malloc prevents this by satisfying all memory requests in multiples of some minimum size (apparently 0x20 in this case). (OK, technically if you request 0x1E bytes there will be 2 bytes of "wasted" space left over and unused, since malloc allocates 0x20 bytes instead of 0x1E. But there will never be a free-standing 2-byte fragment, which is really good: the metadata for malloc is definitely bigger than 2 bytes, so there would be no way even to keep track of those bytes.)
Rather than allocating from a compiled-in fixed-size array, malloc will request space from the operating system as needed. Since other activities in the program may also request space without calling this allocator, the space that malloc manages may not be contiguous. Thus its free storage is kept as a list of free blocks. Each block contains a size, a pointer to the next block, and the space itself. The blocks are kept in order of increasing storage address, and the last block (highest address) points to the first.
When a request is made, the free list is scanned until a big-enough block is found. This algorithm is called first fit, by contrast with best fit, which looks for the smallest block that will satisfy the request. If the block is exactly the size requested it is unlinked from the list and returned to the user. If the block is too big, it is split, and the proper amount is returned to the user while the residue remains on the free list. If no big-enough block is found, another large chunk is obtained by the operating system and linked into the free list.
malloc normally uses a pool of memory and "meta data" is held in the pool not "in between" the chunks of memory allocated.
Related
I have a pointer to a buffer that was initialised with calloc:
notEncryptBuf = (unsigned char*) calloc(1024, notEncryptBuf_len);
Later I moved the pointer to another position:
notEncryptBuf += 20;
And finally I free the buffer:
free(notEncryptBuf);
Will it free the whole allocated size? How does C know the size of the memory it needs to free?
The behavior of free is specified only if it is passed an address that was previously returned by malloc or a related routine or is passed a null pointer (in which case it does nothing). If you pass an address modified from an original allocation, as by notEncryptBuf += 20;, the behavior of free is not specified.
C implementations commonly know how much space is in an allocation because they store it in some bytes immediately preceding the allocation. For example, if you ask for 1,024 bytes, it may allocate 1,040, record information about the allocation in the first 16 bytes, and return to you the address 16 bytes after the start of the whole allocation. Then, when you pass that address to free, it looks in the 16 bytes before that address to see the amount of space.
Other implementations are theoretically possible. For example, a memory manager could designate one zone of memory for common fixed-size allocations, such as 32 bytes, and then use a bitmap to indicate whether each 32-byte block in that zone is free or allocated. Or it could keep a database of allocations, using a hash table or trees or other data structures. When free is called, it would look up the address in the database.
How does C know the size of the memory it needs to free?
"C" does not know about memory allocation or freeing. It relies on the underlying memory manager to keep track of the allocated memories and free them up.
That said, if you pass a pointer to free() which was not returned by any allocator function, it invokes undefined behaviour. So, you cannot pass the pointer which you have shifted to free(). You need to pass the pointer which was returned by calloc().
A good way to answer these kinds of questions is to ask yourself, “how would I myself write malloc() and free()?”
Suppose you have a “memory pool” — fancy words for just an array of bytes:
unsigned char memory[10000];
Now the user wants eight bytes of that. The user calls ptr = my_malloc(8). You know full well that you can’t just give the user any random spot in your memory array — you can only give away stuff that hasn’t already been given away.
In other words, you somehow need to keep track of what pieces of memory have been given away.
Linked-lists → Variable-sized elements in an array
One way we know of to manage dynamic memory is through linked lists. A linked list is made of nodes, blocks of memory that you organize with a struct:
struct node
{
SOME_TYPE data; // the data to store
struct node * next; // a pointer to the next node
};
However, since we are the memory manager, we don’t have some magic pool to allocate our node. We have to use our own memory[] to create space for the node.
Let’s make a simple modification. Instead of a pointer, we will keep track of how big a piece is. We can do this with a structure:
struct piece_of_memory
{
    int size;
    unsigned char memory[];  // flexible array member: 'size' bytes follow
};
Memory, then, is just an array of those things, where all the sizes add up to our available memory pool size:
piece_of_memory memory[...];
So now our initial pool of memory looks like this:
int size = 9992; // 10000 minus the size field (assumed to take 8 bytes here)
unsigned char memory[9992];
Graphically, that’s something like
[----,------------------------------------------------------------]
↑ ↑
9992 memory
If I give away eight bytes, that gets reordered:
[----,---][----,--------------------------------------------------]
↑ ↑ ↑ ↑
↑ ↑ ↑ memory
↑ ↑ new size = 9992 - 8 - sizeof(int) = 9976
↑ ↑
8 returned from malloc
That is two of those structs in a row
int size = 8
unsigned char memory[8]
int size = 9976
unsigned char memory[9976]
We can verify that the pieces all use exactly 10000 bytes:
{(size) 8 + (8 bytes) 8} + {(size) 8 + (9976 bytes) 9976}
= 16 + 9984
= 10000
So when the user asks us to ptr = my_malloc(8), we find a piece of memory with at least eight available bytes, rearrange things, then return a pointer to the ‘memory’ part (not the ‘size’!).
Freeing allocated memory
Suppose our user is now finished with the eight bytes and calls my_free(ptr).
[----,---][----,--------------------------------------------------]
8 ↑ 9976
↑
free me!
We can find our struct piece_of_memory (it is sizeof(int) bytes before the address returned to us), and we can recombine the free pieces of memory into a whole free block:
[----,------------------------------------------------------------]
9992
Notice how this only works if the user gives us an address we gave it earlier, right? What would happen if I returned a wrong ptr value?
More to think about
Naturally we must also be able to keep track of which blocks are available to return and which ones are in use. This makes our struct piece_of_memory a bit more complicated. We could do something like:
struct piece_of_memory
{
int size;
bool is_used;
unsigned char memory[];
};
We also need a way for the memory manager to search through the memory blocks for a piece that is big enough for the requested size. If we want to be smart about it, we might take some time to find the smallest available block that is big enough for the requested size.
We don’t actually have to keep the (‘size’ and ‘is_used’) with the ‘memory’ pieces, either. We could split up our struct to simply have an array of (‘size’ + ‘is_used’) structures at one end of our memory[] array and all the pieces of returned memory at the other end.
Finally, we must waste a little memory when we divide it up in order to make sure that we always return a pointer that is aligned for the worst-case alignment needs our user might put it to. For example, if user wants to get dynamic memory for an array of double, we don’t want to return something that is byte-aligned.
This isn’t the only way to do it!
This is just one simple way. More advanced structures could certainly be used as well.
Conclusions
Hopefully you can answer your own questions now:
How does the memory manager know how much memory to free?
(Because it keeps track of it.)
Can I return a pointer that was not given to me by the memory manager?
(No, because it would break things.)
Obviously the memory manager can be written to prevent things from breaking if you try to free a pointer it did not give you, but the C specification does not require it to. It requires (expects) the user to not give it bad input.
I'm not sure if I'm asking a noob question here, but here I go. I also searched a lot for a similar question, but I got nothing.
So, I know how mmap and brk work and that, regardless of the length you enter, it will round it up to the nearest page boundary. I also know malloc uses brk/sbrk or mmap (At least on Linux/Unix systems) but this raises the question: does malloc also round up to the nearest page size? For me, a page size is 4096 bytes, so if I want to allocate 16 bytes with malloc, 4096 bytes is... a lot more than I asked for.
The basic job of malloc and friends is to manage the fact that the OS can generally only (efficiently) deal with large allocations (whole pages and extents of pages), while programs often need smaller chunks and finer-grained management.
So what malloc (generally) does, is that the first time it is called, it allocates a larger amount of memory from the system (via mmap or sbrk -- maybe one page or maybe many pages), and uses a small amount of that for some data structures to track the heap use (where the heap is, what parts are in use and what parts are free) and then marks the rest of that space as free. It then allocates the memory you requested from that free space and keeps the rest available for subsequent malloc calls.
So the first time you call malloc for, e.g., 16 bytes, it will use mmap or sbrk to allocate a large chunk (maybe 4K or maybe 64K or maybe 16MB or even more), initialize that as mostly free, and return you a pointer to 16 bytes somewhere. A second call to malloc for another 16 bytes will just return you another 16 bytes from that pool; there is no need to go back to the OS for more.
As your program goes ahead mallocing more memory it will just come from this pool, and free calls will return memory to the free pool. If it generally allocates more than it frees, eventually that free pool will run out, and at that point, malloc will call the system (mmap or sbrk) to get more memory to add to the free pool.
This is why if you monitor a process that is allocating and freeing memory with malloc/free with some sort of process monitor, you will generally just see the memory use go up (as the free pool runs out and more memory is requested from the system), and generally will not see it go down -- even though memory is being freed, it generally just goes back to the free pool and is not unmapped or returned to the system. There are some exceptions -- particularly if very large blocks are involved -- but generally you can't rely on any memory being returned to the system until the process exits.
#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <unistd.h>

int main(void) {
    void *a = malloc(1);
    void *b = malloc(1);
    uintptr_t ua = (uintptr_t)a;
    uintptr_t ub = (uintptr_t)b;
    size_t page_size = getpagesize();
    printf("page size: %zu\n", page_size);
    printf("difference: %zd\n", (ssize_t)(ub - ua));
    printf("offsets from start of page: %zu, %zu\n",
           (size_t)ua % page_size, (size_t)ub % page_size);
}
prints
page size: 4096
difference: 32
offsets from start of page: 672, 704
So clearly it is not rounded to page size in this case, which proves that it is not always rounded to page size.
It will hit mmap if you change allocation to some arbitrary large size. For example:
void *a = malloc(10000001);
void *b = malloc(10000003);
and I get:
page size: 4096
difference: -10002432
offsets from start of page: 16, 16
And clearly the starting address is still not page aligned; the bookkeeping must be stored below the pointer, and the pointer needs to be sufficiently aligned for the largest alignment generally needed. You can reason this out from free: if free is given just a pointer but needs to figure out the size of the allocation, where could it look? Only two choices are feasible: a separate data structure that lists all base pointers and their allocation sizes, or some offset below the pointer itself. And only one of them is sane.
I want to test how much memory the OS actually allocates when I request 24M.
for (i = 0; i < 1024*1024; i++)
ptr = (char *)malloc(24);
When I write it like this, the RES reported by the top command is 32M.
ptr = (char *)malloc(24*1024*1024);
But when I make this little change, the RES is 244. What is the difference between them? Why is the result 244?
The allocator has its own data structures about the bookkeeping that require memory as well. When you allocate in small chunks (the first case), the allocator has to keep a lot of additional data about where each chunk is allocated and how long it is. Moreover, you may get gaps of unused memory in between the chunks because malloc has a requirement to return a sufficiently aligned block, most usually on an 8-byte boundary.
In the second case, the allocator gives you just one contiguous block and does bookkeeping only for that block.
Always be careful with a large number of small allocations, as the bookkeeping memory overhead may even outweigh the amount of the data itself.
The second allocation barely touches the memory. The allocator tells you "okay, you can have it" but if you don't actually touch the memory, the OS never actually gives it to you, hoping you'll never use it. Bit like a Ponzi scheme. On the other hand, the other method writes something (a few bytes at most) to many pages, so the OS is forced to actually give you the memory.
Try this to verify, you should get about 24m usage:
memset(ptr, 1, 1024 * 1024 * 24);
In short, top doesn't tell you how much you allocated, i.e. what you asked from malloc. It tells you what the OS allocated to your process.
In addition to what has been said:
It could be that some compilers notice how you allocate multiple 24-byte blocks in a loop, assigning their addresses to the same pointer and keeping only the last block you allocated, effectively rendering every malloc before it useless. So they may optimize your whole loop into something like this:
ptr = (char *)malloc(24);
i = 1024*1024;
Surprisingly simple/stupid/basic question, but I have no idea: suppose I want to return the user of my function a C string whose length I do not know at the beginning of the function. I can place only an upper bound on the length at the outset, and, depending on processing, the size may shrink.
The question is, is there anything wrong with allocating enough heap space (the upper bound) and then terminating the string well short of that during processing? i.e. If I stick a '\0' into the middle of the allocated memory, does (a.) free() still work properly, and (b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called? Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
To give this some context, let's say I want to remove consecutive duplicates, like this:
input "Hello oOOOo !!" --> output "Helo oOo !"
... and some code below showing how I'm pre-computing the size resulting from my operation, effectively performing processing twice to get the heap size right.
char* RemoveChains(const char* str)
{
    if (str == NULL) {
        return NULL;
    }
    if (strlen(str) == 0) {
        char* outstr = (char*)malloc(1);
        *outstr = '\0';
        return outstr;
    }
    const char* original = str;  // for reuse
    char prev = *str++;          // [prev][str][str+1]...
    unsigned int outlen = 1;     // first char auto-counted
    // Determine length necessary by mimicking processing
    while (*str) {
        if (*str != prev) {      // new char encountered
            ++outlen;
            prev = *str;         // restart chain
        }
        ++str;                   // step pointer along input
    }
    // Declare new string to be perfect size
    char* outstr = (char*)malloc(outlen + 1);
    outstr[outlen] = '\0';
    outstr[0] = original[0];
    outlen = 1;
    // Construct output
    prev = *original++;
    while (*original) {
        if (*original != prev) {
            outstr[outlen++] = *original;
            prev = *original;
        }
        ++original;
    }
    return outstr;
}
If I stick a '\0' into the middle of the allocated memory, does
(a.) free() still work properly, and
Yes.
(b.) does the space after the '\0' become inconsequential? Once '\0' is added, does the memory just get returned, or is it sitting there hogging space until free() is called?
Depends. Often, when you allocate large amounts of heap space, the system first allocates virtual address space - as you write to the pages some actual physical memory is assigned to back it (and that may later get swapped out to disk when your OS has virtual memory support). Famously, this distinction between wasteful allocation of virtual address space and actual physical/swap memory allows sparse arrays to be reasonably memory efficient on such OSs.
Now, the granularity of this virtual addressing and paging is in memory page sizes - that might be 4k, 8k, 16k...? Most OSs have a function you can call to find out the page size. So, if you're doing a lot of small allocations then rounding up to page sizes is wasteful, and if you have a limited address space relative to the amount of memory you really need to use then depending on virtual addressing in the way described above won't scale (for example, 4GB RAM with 32-bit addressing). On the other hand, if you have a 64-bit process running with say 32GB of RAM, and are doing relatively few such string allocations, you have an enormous amount of virtual address space to play with and the rounding up to page size won't amount to much.
But - note the difference between writing throughout the buffer then terminating it at some earlier point (in which case the once-written-to memory will have backing memory and could end up in swap) versus having a big buffer in which you only ever write to the first bit then terminate (in which case backing memory is only allocated for the used space rounded up to page size).
It's also worth pointing out that on many operating systems heap memory may not be returned to the operating system until the process terminates: instead, the malloc/free library notifies the OS when it needs to grow the heap (e.g. using sbrk() on UNIX or VirtualAlloc() on Windows). In that sense, free()d memory is free for your process to re-use, but not free for other processes to use. Some operating systems do optimise this - for example, using a distinct and independently releasable memory region for very large allocations.
Is it generally bad programming style to leave this hanging space there, in order to save some upfront programming time computing the necessary space before calling malloc?
Again, it depends on how many such allocations you're dealing with. If there are a great many relative to your virtual address space / RAM - you want to explicitly let the memory library know not all the originally requested memory is actually needed using realloc(), or you could even use strdup() to allocate a new block more tightly based on actual needs (then free() the original) - depending on your malloc/free library implementation that might work out better or worse, but very few applications would be significantly affected by any difference.
Sometimes your code may be in a library where you can't guess how many string instances the calling application will be managing - in such cases it's better to provide slower behaviour that never gets too bad... so lean towards shrinking the memory blocks to fit the string data (a set number of additional operations, so it doesn't affect big-O efficiency) rather than having an unknown proportion of the original string buffer wasted (in a pathological case - zero or one character used after arbitrarily large allocations). As a performance optimisation you might only bother returning memory if the unused space is >= the used space - tune to taste, or make it caller-configurable.
You comment on another answer:
So it comes down to judging whether the realloc will take longer, or the preprocessing size determination?
If performance is your top priority, then yes - you'd want to profile. If you're not CPU bound, then as a general rule take the "preprocessing" hit and do a right-sized allocation - there's just less fragmentation and mess. Countering that, if you have to write a special preprocessing mode for some function - that's an extra "surface" for errors and code to maintain. (This trade-off decision is commonly needed when implementing your own asprintf() from snprintf(), but there at least you can trust snprintf() to act as documented and don't personally have to maintain it).
Once '\0' is added, does the memory just get returned, or is it
sitting there hogging space until free() is called?
There's nothing magical about \0. You have to call realloc if you want to "shrink" the allocated memory. Otherwise the memory will just sit there until you call free.
If I stick a '\0' into the middle of the allocated memory, does (a.)
free() still work properly
Whatever you do in that memory free will always work properly if you pass it the exact same pointer returned by malloc. Of course if you write outside it all bets are off.
\0 is just one more character from the perspective of malloc and free; they don't care what data you put in the memory. So free will still work whether you add \0 in the middle or don't add \0 at all. The extra space allocated will still be there; it won't be returned to the process as soon as you add \0 to the memory. I personally would prefer to allocate only the required amount of memory instead of allocating to some upper bound, as that just wastes the resource.
As soon as you get memory from the heap by calling malloc(), the memory is yours to use. Inserting \0 is like inserting any other character. This memory will remain in your possession until you free it or until the OS claims it back.
The \0 is purely a convention for interpreting character arrays as strings; it is independent of the memory management. I.e., if you want to get your money back, you should call realloc. The string does not care about memory (which is a source of many security problems).
malloc just allocates a chunk of memory. It's up to you to use it however you want and to call free from the initial pointer position. Inserting '\0' in the middle has no consequence.
To be specific, malloc doesn't know what type of memory you want (it returns only a void pointer).
Let us assume you wish to allocate 10 bytes of memory spanning 0x10 to 0x19:
char * ptr = (char *)malloc(sizeof(char) * 10);
Inserting a null at the 5th position (0x14) does not free the memory from 0x15 onwards.
However, a free from 0x10 frees the entire chunk of 10 bytes.
free() will still work with a NUL byte in memory
the space will remain wasted until free() is called, or unless you subsequently shrink the allocation
Generally, memory is memory is memory. It doesn't care what you write into it. BUT it has a pedigree, or if you prefer, a flavor (malloc, new, VirtualAlloc, HeapAlloc, etc). This means that the party that allocates a piece of memory must also provide the means to deallocate it. If your API comes in a DLL, then it should provide a free function of some sort.
This of course puts a burden on the caller right?
So why not put the WHOLE burden on the caller?
The BEST way to deal with dynamically allocated memory is to NOT allocate it yourself. Have the caller allocate it and pass it on to you. He knows what flavor he allocated, and he is responsible to free it whenever he is done using it.
How does the caller know how much to allocate?
Like many Windows APIs do, have your function return the required number of bytes when called, e.g., with a NULL pointer, then do the job when provided with a non-NULL pointer (using IsBadWritePtr, if it is suitable for your case, to double-check accessibility).
This can also be much much more efficient. Memory allocations COST a lot. Too many memory allocations cause heap fragmentation and then the allocations cost even more. That's why in kernel mode we use the so called "look-aside lists". To minimize the number of memory allocations done, we reuse the blocks we have already allocated and "freed", using services that the NT Kernel provides to driver writers.
If you pass on the responsibility for memory allocation to your caller, then he might be passing you cheap memory from the stack (_alloca), or passing you the same memory over and over again without any additional allocations. You don't care of course, but you DO allow your caller to be in charge of optimal memory handling.
To elaborate on the use of the NULL terminator in C:
You cannot allocate a "C string"; you can allocate a char array and store a string in it, but malloc and free just see it as an array of the requested length.
A C string is not a data type but a convention for using a char array where the null character '\0' is treated as the string terminator.
This is a way to pass strings around without having to pass a length value as a separate argument. Some other programming languages have explicit string types that store a length along with the character data to allow passing strings in a single parameter.
Functions that document their arguments as "C strings" are passed char arrays, but they have no way of knowing how big the array is without the null terminator, so if it is not there, things will go horribly wrong.
You will notice functions that expect char arrays that are not necessarily treated as strings will always require a buffer length parameter to be passed.
For example if you want to process char data where a zero byte is a valid value you can't use '\0' as a terminator character.
You could do what some of the MS Windows APIs do where you (the caller) pass a pointer and the size of the memory you allocated. If the size isn't enough, you're told how many bytes to allocate. If it was enough, the memory is used and the result is the number of bytes used.
Thus the decision about how to efficiently use memory is left to the caller. They can allocate a fixed 260 bytes (common when working with paths in Windows) and use the result from the function call to know whether more bytes are needed (not the case with paths, due to MAX_PATH being 260, without bypassing the Win32 API) or whether most of the bytes can be ignored...
The caller could also pass zero as the memory size and be told exactly how much needs to be allocated - not as efficient processing-wise, but could be more efficient space-wise.
You can certainly preallocate to an upperbound, and use all or something less.
Just make sure you actually use all or something less.
Making two passes is also fine.
You asked the right questions about the tradeoffs.
How do you decide?
Use two passes, initially, because:
1. You'll know you aren't wasting memory.
2. You're going to profile to find out where you need to optimize for speed anyway.
3. Upper bounds are hard to get right before you've written and tested and modified and used and updated the code in response to new requirements for a while.
4. Simplest thing that could possibly work.
You might tighten up the code a little, too. Shorter is usually better. And the more the code takes advantage of known truths, the more comfortable I am that it does what it says.
char* copyWithoutDuplicateChains(const char* str)
{
    if (str == NULL) return NULL;
    const char* s = str;
    char prev = *s;          // [prev][s+1]...
    unsigned int outlen = 1; // first character counted
    // Determine length necessary by mimicking processing
    while (*s)
    {   while (*++s == prev); // skip duplicates
        ++outlen;             // new character encountered
        prev = *s;            // restart chain
    }
    // Construct output
    char* outstr = (char*)malloc(outlen);
    char* out = outstr;       // remember the base pointer to return
    s = str;
    prev = *s;
    *out++ = *s;              // first character copied
    while (*s)
    {   while (*++s == prev); // skip duplicates
        *out++ = *s;          // copy new character (the final '\0' is copied too)
        prev = *s;            // restart chain
    }
    // done
    return outstr;            // the base, not the advanced out pointer
}
For a small to-be-embedded application, I wrote a few functions + a struct that work as a string buffer (similar to std::stringstream in C++).
While the code as such works fine, there are a few not-so-minor problems:
I have never before written functions in C that manually allocate and grow memory, so I'm afraid there are still some quirks that need to be addressed.
It seems the code allocates far more memory than it actually needs, which is VERY BAD.
Due to warnings reported by valgrind, I have switched from malloc to calloc in one place in the code, which successfully removed the warning, but I'm not entirely sure I'm actually using it correctly.
Example of what I mean that it allocates more than it really needs (using a 56k file):
==23668== HEAP SUMMARY:
==23668== in use at exit: 0 bytes in 0 blocks
==23668== total heap usage: 49,998 allocs, 49,998 frees, 1,249,875,362 bytes allocated
... It just doesn't look right ...
The code in question is here (too large to copy it in a <code> field on SO): http://codepad.org/LQzphUzd
Help is needed, and I'm grateful for any advice!
The way you are growing your buffer is rather inefficient. For each little piece of string, you realloc() memory, which can mean new memory is allocated and the contents of the "old" memory are copied over. That is slow and fragments your heap.
Better is to grow in fixed amounts or by a fixed percentage, i.e. make the new size 1.5 or 2 times the old size. That also wastes some memory, but it keeps the heap more usable and fewer copies are made.
This means you'll have to keep track of two values: capacity (number of bytes allocated) and length (actual length of the string). But that should not be too hard.
I would introduce a function "FstrBuf_Grow" which takes care of all of that. You just call it with the amount of memory you want to add, and FstrBuf_Grow will take care that the capacity matches the requirements by reallocing when necessary and at least as much as necessary.
...
void FstrBuf_Grow(FstringBuf *buf, size_t more)
{
    while (buf->length + more > buf->capacity)
        buf->capacity = 3 * buf->capacity / 2 + 1; /* +1 so a zero capacity still grows */
    /* a robust version would check realloc's return value before overwriting buf->data */
    buf->data = realloc(buf->data, buf->capacity + 1);
}
That multiplies the capacity by 1.5 until the data fits. You can choose different strategies, depending on your needs.
Move the strncat(ptr->data, str, len); before the ptr->length = ((ptr->length) + len); or, better, use strncpy(ptr->data + ptr->length, ...). Also, the ptr = NULL; in the Destroy is useless.
The code of the "library" seems to be correct, BUT be aware that you are continuously reallocating the buffer. Normally you should grow the buffer only rarely (for example, every time you need to grow it, use max(2 * the current size, 4) as the new size), because growing the buffer is O(n). The big total allocation figure is probably because you first allocate a small buffer, then realloc it into a bigger buffer, then into an even bigger one, and so the heap grows.
It looks like you're re-allocating the buffer on every append. Shouldn't you grow it only when you want to append more than it can hold?
When reallocating you want to increase the size of the buffer using a strategy that gives you the best trade off between the number of allocations and the amount of memory allocated. Just doubling the size of the buffer every time you hit the limit might not be ideal for an embedded program.
Generally for embedded applications it is much better to allocate a circular FIFO buffer 1-3 times the maximum message size.