malloc()'s "perfect efficiency" VS automatic variables - c

For a while I was in the habit of calling malloc for everything. Then it dawned on me: if the code isn't performance-critical, why not spend a couple of kilobytes more on an automatic variable and give up the exact sizing that malloc (potentially) offers? That way, with no noticeable impact, the code becomes much more readable, e.g. temporarily copying a string to manipulate it in a function that is called very rarely.
Is my logic sound?

Local variables are stored on the stack, which is limited. malloc() allocates memory from the heap, which is also limited but contains far more memory.
I generally do not use malloc() unless the amount of memory would exceed what I could safely store on the stack.
For Windows development, the stacks are normally pretty large. You could store a buffer of up to a couple of hundred bytes without too much trouble (assuming the function would never be called recursively).
But, generally, if I need more than, say, 50 bytes, I would normally use malloc().

Most implementations of malloc() do not allocate the exact amount you specify; they allocate more, usually in block-size increments. This gives a performance boost if you need to do some minor reallocation. So there was never really any "accuracy" there to begin with.

I assume that you want to replace code like this:
malloc((foo * 2 + 6) * sizeof(char))
With
char big_enough[2000];
Regarding waste - there's nothing wrong with wasting a couple of bytes now and again, but if you do it all the time it will start to add up.
But a more serious danger is that you need to be sure that it's always going to be enough. Using a constant is dangerous - it might seem like 2000 bytes ought to be enough but are you sure that it's impossible for someone to need more? Remember that this sort of code can easily create buffer overflow vulnerabilities and the work you've saved in not calculating the correct size is probably less than the amount of extra work you now need to do to check that you don't overflow the buffer each time you read or write to it.
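To make the "check that you don't overflow" cost concrete, here is a minimal sketch of a bounded copy into a fixed-size buffer. The helper name copy_bounded is made up for illustration; it relies only on the standard snprintf, which reports how many characters it would have written.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: copy src into a fixed-size buffer and report
   truncation instead of overflowing.
   Returns 0 on success, -1 if src plus its terminator does not fit. */
int copy_bounded(char *dst, size_t dst_size, const char *src)
{
    /* snprintf never writes more than dst_size bytes and returns the
       length the full output would have had. */
    int needed = snprintf(dst, dst_size, "%s", src);
    if (needed < 0 || (size_t)needed >= dst_size)
        return -1;
    return 0;
}
```

Every write into big_enough would need a check of this kind, and the caller still has to decide what to do when the data does not fit.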

Related

optimal way of using malloc and realloc for dynamic storing

I'm trying to figure out the optimal way of using malloc and realloc for receiving an unknown number of characters from the user, storing them, and printing them only at the end.
I've figured that calling realloc too many times won't be so smart.
So instead, I allocate a set amount of space each time, let's say
sizeof(char) * 100
and at end of file, I use realloc to fit the size of the whole thing precisely.
What do you think? Is this a good way to go about it?
Would you take a different path?
Please note, I have no intention of using linked lists, getchar(), or putchar().
Using malloc and realloc only is a must.
If you realloc to fit the exact amount of data needed, then you are optimizing for memory consumption. This will likely give slower code because 1) you get extra realloc calls and 2) you might not allocate amounts that fit well with CPU alignment and data cache. Possibly this also causes heap fragmentation issues because of the repeated reallocs, in which case it could actually waste memory.
It's hard to answer what's "best" generically, but the method below is fairly common, as it is a good compromise between reducing the time spent in realloc calls and lowering memory use:
You allocate a segment, then keep track of how much of this segment is user data. It is a good idea to allocate size_t mempool_size = n * _Alignof(int); bytes and it is probably also wise to use an n which is divisible by 8.
Each time you run out of free memory in this segment, you realloc to mempool_size*2 bytes. That way you keep doubling the available memory each time.
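A minimal sketch of that doubling scheme, assuming the input comes from a FILE * and is read with fread (so no getchar). The function name read_all and the 128-byte starting size are arbitrary choices for illustration, not something the question prescribes.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Read everything from `in` into a heap buffer that doubles when full.
   On success returns a NUL-terminated buffer and stores its length in
   *len_out; returns NULL on allocation or read failure.
   The caller frees the result. */
char *read_all(FILE *in, size_t *len_out)
{
    size_t cap = 128;              /* arbitrary starting capacity */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    for (;;) {
        /* Always keep one byte free for the terminator. */
        size_t got = fread(buf + len, 1, cap - len - 1, in);
        len += got;
        if (len < cap - 1)         /* short read: end of file or error */
            break;

        if (cap > SIZE_MAX / 2) {  /* doubling would overflow size_t */
            free(buf);
            return NULL;
        }
        char *tmp = realloc(buf, cap * 2);
        if (tmp == NULL) {         /* the old block is still ours to free */
            free(buf);
            return NULL;
        }
        buf = tmp;
        cap *= 2;
    }

    if (ferror(in)) {
        free(buf);
        return NULL;
    }
    buf[len] = '\0';

    /* Optional final trim to the exact size; keep the old block if it fails. */
    char *fit = realloc(buf, len + 1);
    if (fit != NULL)
        buf = fit;

    *len_out = len;
    return buf;
}
```

Typical use would be char *text = read_all(stdin, &n); followed by printing and free(text). Whether the final trim is worth it depends on how long the buffer lives afterwards.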
I've figured that calling realloc too many times won't be so smart.
How have you figured it out? Because the only way to really know is to measure the performance.
Your strategy may need to differ based on how you are reading the data from the user. If you are using getchar() you probably don't want to use realloc() to increase the buffer size by one char each time you read a character. However, a good realloc() will be much less inefficient than you think even in these circumstances. The minimum block size that glibc will actually give you in response to a malloc() is, I think, 16 bytes. So going from 0 to 16 characters and reallocing each time doesn't involve any copying. Similarly for larger reallocations, a new block might not need to be allocated, it may be possible to make the existing block bigger. Don't forget that even at its slowest, realloc() will be faster than a person can type.
Most people don't go for that strategy. What can be typed can be piped, so the argument that people don't type very fast doesn't necessarily work. Normally, you introduce the concept of capacity. You allocate a buffer with a certain capacity and when it gets full, you increase its capacity (with realloc()) by adding a new chunk of a certain size. The initial size and the reallocation size can be tuned in various ways. If you are reading user input, you might go for small values, e.g. 256 bytes; if you are reading files off disk or across the network, you might go for larger values, e.g. 4 KB or bigger.
The increment size doesn't even need to be constant, you could choose to double the size for each needed reallocation. This is the strategy used by some programming libraries. For example the Java implementation of a hash table uses it I believe and so possibly does the Cocoa implementation of an array.
It's impossible to know beforehand what the best strategy in any particular situation is. I would pick something that feels right and then, if the application has performance issues, I would do testing to tune it. Your code doesn't have to be the fastest possible, but only fast enough.
However one thing I absolutely would not do is overlay a home rolled memory algorithm over the top of the built in allocator. If you find yourself maintaining a list of blocks you are not using instead of freeing them, you are doing it wrong. This is what got OpenSSL into trouble.

Growing an array on the stack

This is my problem in essence. In the life of a function, I generate some integers, then use the array of integers in an algorithm that is also part of the same function. The array of integers will only be used within the function, so naturally it makes sense to store the array on the stack.
The problem is I don't know the size of the array until I'm finished generating all the integers.
I know how to allocate a fixed size and variable sized array on the stack. However, I do not know how to grow an array on the stack, and that seems like the best way to solve my problem. I'm fairly certain this is possible to do in assembly, you just increment stack pointer and store an int for each int generated, so the array of ints would be at the end of the stack frame. Is this possible to do in C though?
I would disagree with your assertion that "so naturally it makes sense to store the array on the stack". Stack memory is really designed for when you know the size at compile time. I would argue that dynamic memory is the way to go here
C doesn't define what the "stack" is. It only has static, automatic and dynamic allocations. Static and automatic allocations are handled by the compiler, and only dynamic allocation puts the controls in your hands. Thus, if you want to manually deallocate an object and allocate a bigger one, you must use dynamic allocation.
Don't use dynamic arrays on the stack (compare Why is the use of alloca() not considered good practice?), better allocate memory from the heap using malloc and resize it using realloc.
Never Use alloca()
IMHO this point hasn't been made well enough in the standard references.
One rule of thumb is:
If you're not prepared to statically allocate the maximum possible size as a
fixed length C array then you shouldn't do it dynamically with alloca() either.
Why? The reason you're trying to avoid malloc() is performance.
alloca() will be slower and won't work in any circumstance where static allocation would fail. It's generally less likely to succeed than malloc() too.
One thing is sure. Statically allocating the maximum will outdo both malloc() and alloca().
Static allocation is typically damn near a no-op. Most systems will advance the stack pointer for the function call anyway. There's no appreciable difference for how far.
So what you're telling me is you care about performance but want to hold back on a no-op solution? Think about why you feel like that.
The overwhelming likelihood is you're concerned about the size allocated.
But as explained it's free and it gets taken back. What's the worry?
If the worry is "I don't have a maximum or don't know if it will overflow the stack" then you shouldn't be using alloca() because you don't have a maximum and don't know if it will overflow the stack.
If you do have a maximum and know it isn't going to blow the stack then statically allocate the maximum and go home. It's a free lunch - remember?
That makes alloca() either wrong or sub-optimal.
Every time you use alloca() you're either wasting your time or coding in one of the difficult-to-test-for arbitrary scaling ceilings that sleep quietly until things really matter then f**k up someone's day.
Don't.
PS: If you need a big 'workspace' but the malloc()/free() overhead is a bottle-neck for example called repeatedly in a big loop, then consider allocating the workspace outside the loop and carrying it from iteration to iteration. You may need to reallocate the workspace if you find a 'big' case but it's often possible to divide the number of allocations by 100 or even 1000.
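The PS above can be sketched roughly like this. The struct and function names are invented for illustration; the reallocations counter is only there so the saving can be observed.

```c
#include <stdlib.h>

/* Hypothetical reusable scratch buffer that outlives a single iteration. */
struct workspace {
    void         *data;
    size_t        capacity;
    unsigned long reallocations;   /* how often we really hit the allocator */
};

/* Make sure the workspace holds at least `needed` bytes.
   Old contents are NOT preserved (it is scratch space).
   Returns the buffer, or NULL if a larger block could not be allocated;
   in that case the previous buffer is left untouched. */
void *workspace_reserve(struct workspace *ws, size_t needed)
{
    if (needed > ws->capacity) {
        void *fresh = malloc(needed);
        if (fresh == NULL)
            return NULL;
        free(ws->data);
        ws->data = fresh;
        ws->capacity = needed;
        ws->reallocations++;
    }
    return ws->data;
}

/* Release the buffer once the loop is finished. */
void workspace_release(struct workspace *ws)
{
    free(ws->data);
    ws->data = NULL;
    ws->capacity = 0;
}
```

Declare struct workspace ws = {0}; before the loop, call workspace_reserve(&ws, size) inside it, and workspace_release(&ws) afterwards. The allocator is then only touched when an iteration needs more than any earlier one did.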
Footnote:
There must be some theoretical algorithm where a() calls b() and if a() requires a massive environment b() doesn't and vice versa.
In that event there could be some kind of freaky play-off where the stack overflow is prevented by alloca(). I have never heard of or seen such an algorithm. Plausible specimens will be gratefully received!
The innards of the C compiler require stack sizes to be fixed or calculable at compile time. It's been a while since I used C (now a C++ convert) and I don't know exactly why this is. http://gribblelab.org/CBootcamp/7_Memory_Stack_vs_Heap.html provides a useful comparison of the pros and cons of the two approaches.
I appreciate your assembly code analogy but C is largely managed, if that makes any sense, by the Operating System, which imposes/provides the task, process and stack notations.
In order to address your issue dynamic memory allocation looks ideal.
int *a = malloc(sizeof(int));
and dereference it to store the value.
Each time a new integer needs to be added to the existing list of integers:
int *temp = realloc(a, sizeof(int) * (n + 1)); /* n = number of elements already stored */
if (temp != NULL)
    a = temp;
Once done using this memory free() it.
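Growing by one element per realloc works, but it is usually combined with a capacity so that most appends don't touch the allocator. A minimal sketch, assuming a made-up struct int_vec and a starting capacity of 16:

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable int array: `count` elements in use,
   room for `capacity` before the next realloc. */
struct int_vec {
    int   *items;
    size_t count;
    size_t capacity;
};

/* Append one value, doubling the capacity when the array is full.
   Returns 0 on success, -1 on overflow or allocation failure
   (the existing contents stay valid in that case). */
int int_vec_push(struct int_vec *v, int value)
{
    if (v->count == v->capacity) {
        /* Refuse to grow if the doubled byte count would overflow size_t. */
        if (v->capacity > SIZE_MAX / (2 * sizeof *v->items))
            return -1;
        size_t new_cap = v->capacity ? v->capacity * 2 : 16;
        int *tmp = realloc(v->items, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;
        v->items = tmp;
        v->capacity = new_cap;
    }
    v->items[v->count++] = value;
    return 0;
}
```

Start from struct int_vec v = {0}; (realloc on a null pointer behaves like malloc), push each generated integer, run the algorithm on v.items, then free(v.items).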
Is there an upper limit on the size? If you can impose one, so the size is at most a few tens of KiB, then yes alloca is appropriate (especially if this is a leaf function, not one calling other functions that might also allocate non-tiny arrays this way).
Or since this is C, not C++, use a variable-length array like int foo[n];.
But always sanity-check your size, otherwise it's a stack-clash vulnerability waiting to happen. (Where a huge allocation moves the stack pointer so far that it ends up in the middle of another memory region, where other things get overwritten by local variables and return addresses.) Some distros enable hardening options that make GCC generate code to touch every page in between when moving the stack pointer by more than a page.
It's usually not worth it to check the size and use alloca for small, malloc for large, since you also need another check at the end of your function to call free if the size was large. It might give a speedup, but this makes your code more complicated and more likely to get broken during maintenance if future editors don't notice that the memory is only sometimes malloced. So only consider a dual strategy if profiling shows this is actually important, and you care about performance more than simplicity / human-readability / maintainability for this particular project.
A size check for an upper limit (else log an error and exit) is more reasonable, but then you have to choose an upper limit beyond which your program will intentionally bail out, even though there's plenty of RAM you're choosing not to use. If there is a reasonable limit where you can be pretty sure something's gone wrong, like the input being intentionally malicious from an exploit, then great, if(size>limit) error(); int arr[size];.
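Spelled out, the check-then-VLA pattern might look like the sketch below. MAX_ITEMS and the function itself are invented for illustration, and the code assumes a compiler that supports variable-length arrays (they are optional since C11).

```c
#include <stddef.h>

/* Assumed upper limit; choose one that fits your stack budget.
   4096 longs is 32 KiB where long is 8 bytes. */
#define MAX_ITEMS 4096

/* Sum of squares computed through a scratch VLA.
   Returns -1 if n is zero or exceeds the limit. */
long sum_of_squares(const int *values, size_t n)
{
    /* Validate before the VLA declaration: the stack pointer only moves
       once we know the size is sane (and non-zero). */
    if (n == 0 || n > MAX_ITEMS)
        return -1;

    long scratch[n];
    for (size_t i = 0; i < n; i++)
        scratch[i] = (long)values[i] * values[i];

    long total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratch[i];
    return total;
}
```

The scratch array is artificial here; the point is only that the size is bounded before any stack space is claimed.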
If neither of those conditions can be satisfied, your use case is not appropriate for C automatic storage (stack memory) because it might need to be large. Just use dynamic allocation.
On Windows x86/x64 the default user-space stack size is 1 MiB, I think. On x86-64 Linux it's 8 MiB (ulimit -s). Thread stacks are allocated with the same size. But remember, your function will be part of a chain of function calls (so if every function used a large fraction of the total size, you'd have a problem if they called each other). And any stack memory you dirty won't get handed back to the OS even after the function returns, unlike malloc/free where a large allocation can give back the memory instead of leaving it on the free list.
Kernel thread stacks are much smaller, like 16 KiB total for x86-64 Linux, so you never want VLAs or alloca in kernel code, except maybe for a tiny max size, like up to 16 or maybe 32 bytes, not large compared to the size of a pointer that would be needed to store a kmalloc return value.

What is a good size for medium sized memory allocations?

For a serializing system, I need to allocate buffers to write data into. The size needed is not known in advance, so the basic pattern is to malloc N bytes and use realloc if more is needed. The size of N would be large enough to accommodate most objects, making reallocation rare.
This made me think that there is probably an optimal initial amount of bytes that malloc can satisfy more easily than others. I'm guessing somewhere close to pagesize, although not necessarily exactly if malloc needs some room for housekeeping.
Now, I'm sure it is a useless optimization, and if it really mattered, I could use a pool, but I'm curious; I can't be the first programmer to think give me whatever chunk of bytes is easiest to allocate as a start. Is there a way to determine this?
Any answer for this that specifically applies to modern GCC/G++ and/or linux will be accepted.
From reading this wiki page it seems that your answer would vary wildly depending on the implementation of malloc you're using and the OS. Reading the bit on OpenBSD's malloc is particularly interesting. It sounds like you want to look at mmap, too, but at a guess I'd say allocating the default pagesize (4096?) would be optimised for.
My suggestion to you would be to find an appropriate malloc/realloc/free source code such that you can implement your own "malloc_first" alongside the others in the same source module (and using the same memory structures) which simply allocates and returns the first available block greater than or equal to a passed minimum_bytes parameter. If 0 is passed you'll get the first block period.
An appropriate declaration could be
void *malloc_first (size_t minimum_bytes, size_t *actual_bytes);
How doable such an undertaking would be I don't know. I suggest you attempt it using Linux where all source codes are available.
The way it's done in similar cases is for the first malloc to allocate some significant but not too large chunk, which would suit most cases (as you described), and every subsequent realloc call to double the requested size.
So, if at first you allocate 100, next time you'll realloc 200, then 400, 800 and so on. In this way the chances of subsequent reallocation will be lower after each time you do it.
If memory serves me right, that's how std::vector behaves.
after edit
The optimal initial allocation size would be the one that covers most of your cases on one side, but isn't too wasteful on the other. If your average case is 50, but can spike to 500, you'll want to allocate 50 initially, and then double or triple (or multiply by 10) on every subsequent realloc so that you could get to 500 in 1-3 reallocs, but any further reallocs would be unlikely and infrequent. So it depends on your usage patterns, basically.

Is it better to allocate memory in the power of two?

When we use malloc() to allocate memory, should we give the size which is in power of two? Or we just give the exact size that we need?
Like
//char *ptr= malloc( 200 );
char *ptr= malloc( 256 );//instead of 200 we use 256
If it is better to give size which is in the power of two, what is the reason for that? Why is it better?
Thanks
Edit
The reason of my confusion is following quote from Joel's blog Back to Basics
Smart programmers minimize the
potential disruption of malloc by
always allocating blocks of memory
that are powers of 2 in size. You
know, 4 bytes, 8 bytes, 16 bytes,
18446744073709551616 bytes, etc. For
reasons that should be intuitive to
anyone who plays with Lego, this
minimizes the amount of weird
fragmentation that goes on in the free
chain. Although it may seem like this
wastes space, it is also easy to see
how it never wastes more than 50% of
the space. So your program uses no
more than twice as much memory as it
needs to, which is not that big a
deal.
Sorry, I should have posted the above quote earlier. My apologies!
Most replies so far say that allocating memory in powers of two is a bad idea. In which scenario, then, is it better to follow Joel's point about malloc()? Why did he say that? Is the above quoted suggestion obsolete now?
Kindly explain it.
Thanks
Just give the exact size you need. The only reason that a power-of-two size might be "better" is to allow quicker allocation and/or to avoid memory fragmentation.
However, any non-trivial malloc implementation that concerns itself with being efficient will internally round allocations up in this way if and when it is appropriate to do so. You don't need to concern yourself with "helping" malloc; malloc can do just fine on its own.
Edit:
In response to your quote of the Joel on Software article, Joel's point in that section (which is hard to correctly discern without the context that follows the paragraph that you quoted) is that if you are expecting to frequently re-allocate a buffer, it's better to do so multiplicatively, rather than additively. This is, in fact, exactly what the std::string and std::vector classes in C++ (among others) do.
The reason that this is an improvement is not because you are helping out malloc by providing convenient numbers, but because memory allocation is an expensive operation, and you are trying to minimize the number of times you do it. Joel is presenting a concrete example of the idea of a time-space tradeoff. He's arguing that, in many cases where the amount of memory needed changes dynamically, it's better to waste some space (by allocating up to twice as much as you need at each expansion) in order to save the time that would be required to repeatedly tack on exactly n bytes of memory, every time you need n more bytes.
The multiplier doesn't have to be two: you could allocate up to three times as much space as you need and end up with allocations in powers of three, or allocate up to fifty-seven times as much space as you need and end up with allocations in powers of fifty-seven. The more over-allocation you do, the less frequently you will need to re-allocate, but the more memory you will waste. Allocating in powers of two, which uses at most twice as much memory as needed, just happens to be a good starting-point tradeoff until and unless you have a better idea of exactly what your needs are.
He does mention in passing that this helps reduce "fragmentation in the free chain", but the reason for that is more because of the number and uniformity of allocations being done, rather than their exact size. For one thing, the more times you allocate and deallocate memory, the more likely you are to fragment the heap, no matter in what size you're allocating. Secondly, if you have multiple buffers that you are dynamically resizing using the same multiplicative resizing algorithm, then it's likely that if one resizes from 32 to 64, and another resizes from 16 to 32, then the second's reallocation can fit right where the first one used to be. This wouldn't be the case if one resized from 25 to 60 and the other from 16 to 26.
And again, none of what he's talking about applies if you're going to be doing the allocation step only once.
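To put rough numbers on the time-space tradeoff, the following sketch just counts how many times a buffer has to grow under each policy. The function names are made up, and nothing is actually allocated.

```c
#include <stddef.h>

/* Number of grow steps needed to reach `target` bytes when `step` bytes
   are added each time. Assumes step > 0. */
size_t grows_additive(size_t start, size_t step, size_t target)
{
    size_t cap = start, n = 0;
    while (cap < target) {
        cap += step;
        n++;
    }
    return n;
}

/* Number of grow steps needed when the capacity doubles each time.
   Assumes start > 0 and that target is small enough not to overflow. */
size_t grows_doubling(size_t start, size_t target)
{
    size_t cap = start, n = 0;
    while (cap < target) {
        cap *= 2;
        n++;
    }
    return n;
}
```

Starting from 16 bytes and growing to 1,000,000, grows_additive(16, 16, 1000000) gives 62499 steps while grows_doubling(16, 1000000) gives 16, ending at a capacity of 1,048,576, which is less than twice what was needed.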
Just to play devil's advocate, here's how Qt does it:
Let's assume that we append 15000
characters to the QString string. Then
the following 18 reallocations (out of
a possible 15000) occur when QString
runs out of space: 4, 8, 12, 16, 20,
52, 116, 244, 500, 1012, 2036, 4084,
6132, 8180, 10228, 12276, 14324,
16372. At the end, the QString has 16372 Unicode characters allocated,
15000 of which are occupied.
The values above may seem a bit
strange, but here are the guiding
principles:
QString allocates 4 characters at a
time until it reaches size 20. From 20
to 4084, it advances by doubling the
size each time. More precisely, it
advances to the next power of two,
minus 12. (Some memory allocators
perform worst when requested exact
powers of two, because they use a few
bytes per block for book-keeping.)
From 4084 on, it advances by blocks of
2048 characters (4096 bytes). This
makes sense because modern operating
systems don't copy the entire data
when reallocating a buffer; the
physical memory pages are simply
reordered, and only the data on the
first and last pages actually needs to
be copied.
I like the way they anticipate operating system features in code that is meant to perform well from smartphones to server farms. Given that they're smarter people than me, I'd assume that said feature is available in all modern OSes.
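The quoted rule is simple enough to replay. This is a reconstruction from the description above, not Qt's actual source, and the function names are invented; the doubling branch is only meant for capacities that lie on the quoted sequence.

```c
#include <stddef.h>

/* Next capacity (in characters) according to the quoted description:
   +4 up to 20, then "next power of two minus 12" up to 4084,
   then +2048 characters per step. */
size_t next_capacity(size_t cap)
{
    if (cap < 20)
        return cap + 4;
    if (cap < 4084)
        return 2 * (cap + 12) - 12;   /* 20 -> 52 -> 116 -> ... -> 4084 */
    return cap + 2048;
}

/* Count the reallocations needed to hold `needed` characters, starting
   from an empty string. Stores the final capacity in *final_cap if
   final_cap is not NULL. */
size_t count_reallocations(size_t needed, size_t *final_cap)
{
    size_t cap = 0, n = 0;
    while (cap < needed) {
        cap = next_capacity(cap);
        n++;
    }
    if (final_cap != NULL)
        *final_cap = cap;
    return n;
}
```

For 15000 characters this yields 18 reallocations and a final capacity of 16372, matching the figures in the quote.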
It might have been true once, but it's certainly not better.
Just allocate the memory you need, when you need it and free it up as soon as you've finished.
There are far too many programs that are profligate with resources - don't make yours one of them.
It's somewhat irrelevant.
Malloc actually allocates slightly more memory than you request, because it has its own headers to deal with. Therefore the optimal storage is probably something like 4k-12 bytes... but that varies depending on the implementation.
In any case, there is no reason for you to round up to more storage than you need as an optimization technique.
You may want to allocate memory in terms of the processor's word size; not any old power of 2 will do.
If the processor has a 32-bit word (4 bytes), then allocate in units of 4 bytes. Allocating in terms of 2 bytes may not be helpful since the processor prefers data to start on a 4 byte boundary.
On the other hand, this may be a micro-optimization. Most memory allocation libraries are set up to return memory that is aligned at the correct position and will leave the least amount of fragmentation. If you allocate 15 bytes, the library may pad out and allocate 16 bytes. Some memory allocators have different pools based on the allocation size.
In summary, allocate the amount of memory that you need. Let the allocation library / manager handle the actual amount for you. Put more energy into correctness and robustness than worry about these trivial issues.
When I'm allocating a buffer that may need to keep growing to accommodate as-yet-unknown-size data, I start with a power of 2 minus 1, and every time it runs out of space, I realloc with twice the previous size plus 1. This makes it so I never have to worry about integer overflows; the size can only overflow when the previous size was SIZE_MAX, at which point the allocation would already have failed, and 2*SIZE_MAX+1 == SIZE_MAX anyway.
In contrast, if I just used a power of 2 and doubled it each time, I might successfully get a 2^31 byte buffer and then reallocate to a 0 byte buffer next time I doubled the size.
As some people have commented about power-of-2-minus-12 being good for certain malloc implementations, one could equally start with a power of 2 minus 12, then double it and add 12 at each step...
On the other hand if you're just allocating small buffers that won't need to grow, request exactly the size you need. Don't try to second-guess what's good for malloc.
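A small sketch of the "power of 2 minus 1" growth step described above, relying on the fact that size_t arithmetic wraps modulo SIZE_MAX + 1. The names next_size and grow are made up for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* 2n + 1 maps 2^k - 1 to 2^(k+1) - 1, and SIZE_MAX to itself
   because unsigned arithmetic wraps around. */
size_t next_size(size_t current)
{
    return 2 * current + 1;
}

/* Grow *buf from *size bytes to 2 * *size + 1 bytes.
   Returns 0 on success, -1 if the size cannot grow any further
   or realloc fails (the old buffer stays valid either way). */
int grow(char **buf, size_t *size)
{
    size_t bigger = next_size(*size);
    if (bigger <= *size)          /* wrapped around: nothing larger exists */
        return -1;
    char *tmp = realloc(*buf, bigger);
    if (tmp == NULL)
        return -1;
    *buf = tmp;
    *size = bigger;
    return 0;
}
```

Starting at 15 the sizes run 31, 63, 127 and so on, and the request can never silently wrap to a tiny value the way plain doubling from a power of two can.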
This is totally dependent on the given libc implementation of malloc(3). It's up to that implementation to reserve heap chunks in whatever order it sees fit.
To answer the question - no, it's not "better" (here by "better" you mean ...?). If the size you ask for is too small, malloc(3) will reserve bigger chunk internally, so just stick with your exact size.
With today's amount of memory and its speed I don't think it's relevant anymore.
Furthermore, if you're gonna allocate memory frequently you better consider custom memory pooling / pre-allocation.
There is always testing...
You can try a "sample" program that allocates memory in a loop. This way you can see if your compiler magically allocates memory in powers of 2.
With that information, you can try to allocate the same amount of total memory using the 2 strategies: random sized blocks and power of 2 sized blocks.
I would only expect differences, if any, for large amounts of memory though.
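One way to run such an experiment, assuming glibc: malloc_usable_size() from <malloc.h> is a GNU extension (not standard C) that reports how many bytes are actually usable in a block. The helper names below are invented, and the numbers printed will differ between allocators and versions.

```c
#include <malloc.h>   /* malloc_usable_size: glibc extension, not standard C */
#include <stdio.h>
#include <stdlib.h>

/* Allocate `request` bytes, ask the allocator how much is usable,
   and free the block again. Returns 0 if the allocation failed. */
size_t usable_for(size_t request)
{
    void *p = malloc(request);
    if (p == NULL)
        return 0;
    size_t usable = malloc_usable_size(p);
    free(p);
    return usable;
}

/* Print requested versus usable size for a handful of requests. */
void print_usable_sizes(void)
{
    const size_t requests[] = {1, 8, 24, 25, 100, 200, 256, 1000, 4096};
    for (size_t i = 0; i < sizeof requests / sizeof requests[0]; i++)
        printf("requested %4zu -> usable %4zu\n",
               requests[i], usable_for(requests[i]));
}
```

Calling print_usable_sizes() from main shows directly whether asking for 200 or 256 makes any difference to what the allocator hands back on your system.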
If you're allocating some sort of expandable buffer where you need to pick some number for initial allocations, then yes, powers of 2 are good numbers to choose. If you need to allocate memory for struct foo, then just malloc(sizeof(struct foo)). The recommendation for power-of-2 allocations stems from the inefficiency of internal fragmentation, but modern malloc implementations intended for multiprocessor systems are starting to use CPU-local pools for allocations small enough for this to matter, which prevents the lock contention that used to result when multiple threads would attempt to malloc at the same time, and spend more time blocked due to fragmentation.
By allocating only what you need, you ensure that data structures are packed more densely in memory, which improves cache hit rate, which has a much bigger impact on performance than internal fragmentation. There exist scenarios with very old malloc implementations and very high-end multiprocessor systems where explicitly padding allocations can provide a speedup, but your resources in that case would be better spent getting a better malloc implementation up and running on that system. Pre-padding also makes your code less portable, and prevents the user or the system selecting the malloc behavior at run-time, either programmatically or with environment variables.
Premature optimization is the root of all evil.
You should use realloc() instead of malloc() when reallocating.
http://www.cplusplus.com/reference/clibrary/cstdlib/realloc/
Always use a power of two? It depends on what your program is doing. If you need to reprocess your whole data structure when it grows to a power of two, yeah it makes sense. Otherwise, just allocate what you need and don't hog memory.

Is it bad practice to declare an array mid-function

in an effort to only ask what I'm really looking for here... I'm really only concerned if it's considered bad practice or not to declare an array like below where the size could vary. If it is... I would generally malloc() instead.
void MyFunction()
{
    int size;
    //do a bunch of stuff
    size = 10; //but could have been something else
    int array[size];
    //do more stuff...
}
Generally yes, this is bad practice, although new standards allow you to use this syntax. In my opinion you must allocate (on the heap) the memory you want to use and release it once you're done with it. Since there is no portable way of checking if the stack is enough to hold that array you should use some methods that can really be checked - like malloc/calloc & free. In the embedded world stack size can be an issue.
If you are worried about fragmentation you can create your own memory allocator, but this is a totally different story.
That depends. The first clearly isn't what I'd call "proper", and the second is only under rather limited circumstances.
In the first, you shouldn't cast the return from malloc in C -- doing so can cover up the bug of accidentally omitting inclusion of the correct header (<stdlib.h>).
In the second, you're restricting the code to C99 or a gcc extension. As long as you're aware of that, and it works for your purposes, it's all right, but hardly what I'd call an ideal of portability.
As far as what you're really asking: with the minor bug mentioned above fixed, the first is portable, but may be slower than you'd like. If the second is portable enough for your purposes, it'll normally be faster.
For your question, I think each has its advantages and disadvantages.
Dynamic Allocation:
Slow, but you can detect when there is no memory to be given to your program by checking the pointer.
Stack Allocation:
Only in C99, and it is blazingly fast, but in case of a stack overflow you are out of luck.
In summary, when you need a small array, reserve it on the stack. Otherwise, use dynamic memory wisely.
The argument against VLAs runs that because of the absolute badness of overflowing the stack, by the time you've done enough thinking/checking to make them safe, you've done enough thinking/checking to use a fixed-size array:
1) In order to safely use VLAs, you must know that there is enough stack available.
2) In the vast majority of cases, the way that you know there's enough stack is that you know an upper bound on the size required, and you know (or at least are willing to guess or require) a lower bound on the stack available, and the one is smaller than the other. So just use a fixed-size array.
3) In the vast majority of the few cases that aren't that simple, you're using multiple VLAs (perhaps one in each call to a recursive function), and you know an upper bound on their total size, which is less than a lower bound on available stack. So you could use a fixed-size array and divide it into pieces as required.
4) If you ever encounter one of the remaining cases, in a situation where the performance of malloc is unacceptable, do let me know...
It may be more convenient, from the POV of the source code, to use VLAs. For instance you can use sizeof (in the defining scope) instead of maintaining the size in a variable, and that business with dividing an array into chunks might require passing an extra parameter around. So there's some small gain in convenience, sometimes.
It's also easier to miss that you're using a humongous amount of stack, yielding undefined behavior, if instead of a rather scary-looking int buf[1920*1024] or int buf[MAX_IMG_SIZE] you have an int buf[img->size]. That works fine right up to the first time you actually handle a big image. That's broadly an issue of proper testing, but if you miss some possible difficult inputs, then it won't be the first or last test suite to do so. I find that a fixed-size array reminds me either to put in fixed-size checks of the input, or to replace it with a dynamic allocation and stop worrying whether it fits on the stack or not. There is no valid option to put it on the stack and not worry whether it fits...
two points from a UNIX/C perspective -
malloc is only slow when you force it to call brk(). Meaning that for reasonably sized arrays it costs about the same as allocating stack space for a variable. By the way, method #2 (via alloca, in the libc code I have seen) also invokes brk() for huge objects. So it is a wash. Note: with #1 and #2 you still have to invoke, directly or indirectly, a memset-type call to zero the bytes in the array. This is just a side note to the real issue (IMO):
The real issue is memory leaks. alloca cleans up after itself when the function returns, so #2 is less likely to cause a problem. With malloc/calloc you have to call free() or you start a leak.

Resources