I have a question: what's the time complexity of realloc function?
For example, I have an array of integers: a[10]. Of course, the array has been dynamically allocated in this way =>
int *a = (int*)malloc(10*sizeof(int));
And then I want to resize this array to 11 in order to insert an additional value to the array a, so I do =>
a = (int*)realloc(a, 11*sizeof(int));
My question is: What's the time complexity of reallocation? Does realloc simply add an additional cell into the array and then it takes O(1) or it recopies the whole array a, add the additional 11th cell and return the new-sized array and in this case the time complexity of this action is O(n)? Which assumption is true?
First, your code (in the original form of your question) is wrong. It should at least be
a = realloc(a, 11*sizeof(int));
(otherwise you probably have undefined behavior, in the usual case when sizeof(int) is greater than one). BTW realloc should not be casted in C.
Then, your question
What's the time complexity of realloc function in C?
Has no sense, unless you speak of some particular realloc function whose implementation you do know.
Notice that realloc is allowed to fail. Here is an efficient but useless and constant-time implementation of it (see also this answer; it is a standard conforming realloc which follows the letter, but not the spirit, of the standard n1570):
void *realloc( void *ptr, size_t new_size ) {
errno = ENOMEM;
return NULL;
}
At last, in practice, you could consider malloc and realloc as somehow expensive operations (typical runtime could be several microseconds, so thousands of time slower than an addition), and in many cases you don't want to realloc on every loop. So you'll rather use some geometrical growth for realloc (like here)
Does realloc simply add an additional cell into the array and then it takes O(1) or it recopies the whole array a, add the additional 11th cell and return the new-sized array and in this case the time complexity of this action is O(n)?
Both can happen. You could imagine that malloc keeps somewhere the allocated size (rounded up by the implementation...) and in good cases realloc returns the same pointer (so O(1)) and in less happy cases it need to copy the memory zone elsewhere. Without knowing your particular implementation (of malloc, realloc, free) you cannot know what is the most common case, or their probability distribution. BTW, it seems that a realloc implementation which always copy data (or fail) is conforming to the standard (but inefficient). If you need to know more, you should benchmark.
At last, many implementations of the C standard library are free software (e.g. GNU libc, musl-libc, ...) and you might study their source code of malloc, free, realloc (above operating system primitives -usually system calls- growing the virtual address space; on Linux, something like mmap(2))
realloc is equivalent to:
void *
realloc(void *old, size_t new_size)
{
size_t old_size = internal_function_that_knows_old_size(old);
void *new = malloc(new_size);
if (new == NULL)
return NULL;
size_t sz = old_size;
if (new_size < old_size)
sz = new_size;
memcpy(new, old, sz);
free(old);
return new;
}
It is possible for realloc to have optimizations that in some situations makes it faster, but I'm pretty sure that it's impossible to make those optimizations always work, so the fallback will always be a function that does something like the above, so you should consider this your worst case.
Now, when it comes to time complexity, there is nothing in the standard that requires malloc or free have any reasonable behavior so it is possible that they will dominate the runtime of this function (or internal_function_that_knows_old_size will), but since those bits of the system are usually quite well written that is unlikely. The dominant part (at least for large n, which is where complexity is interesting) will be the memcpy.
So with some reasonable assumptions realloc has to be O(n) (with n being the old or new size of the allocation whichever is smaller).
Related
I have a variable where the size is determined in run-time.
Is it generally better to realloc it every time a new element is added like:
array_type * someArray;
int counter = 0;
//before a new element is added:
counter ++;
someArray = realloc(someArray, counter * sizeof(array_type));
Or allocate more memory than probably needed using malloc once:
array_type * someArray = malloc(ENOUG_MEMORY * sizeof(array_type));
What is best in terms of efficiency(speed), readability and memory-management? And why?
realloc could occasionally be useful in the past, with certain allocation patterns in single-threaded code. Most modern memory managers are optimized for multi-threaded programs and to minimize fragmentation, so, when growing an allocation, a realloc will almost certainly just allocate a new block, copy the existing data, and then free the old block. So there's no real advantage to trying to use realloc.
Increasing the size one element at a time can create an O(n^2) situation. In the worst case, the existing data has to be copied each time. It would be better to increase the size in chunks (which is still O(n^2) but with a smaller constant factor) or to grow the allocation geometrically (which gives an O(n) amortized cost).
Furthermore, it's difficult to use realloc correctly.
someArray = realloc(someArray, counter * sizeof(array_type));
If the realloc fails, someArray is set to NULL. If that was your only copy of the pointer to the previously allocated memory, you've just lost it.
You won't be able to access the data you had already placed, and you can't free the original allocation, so you'll have a memory leak.
What is best in terms of efficiency(speed), readability and memory-management? And why?
There is no general best. Specific best depends on your specific application and use case and environment. You can throw wars over perfect realloc ratio and decide if you need realloc at all.
Remember about rules of optimization. You do not optimize. Then, you do not optimize, without measuring first. The best in any terms can be measured for your specific setup, your specific environment, your specific application that uses specific operating system and *alloc implementation.
what is the best practice?
Allocating a constant amount of memory (if it's small enough) is static. It's just an array. Refactor you application to just:
array_type someArray[ENOUGH_MEMORY];
If you do not want to over allocate (or ENOUGH_MEMORY is big enough), then use realloc to add one element, as presented.
If you want, "optimize" by not calling realloc that often and over allocating - it seems that ratio 1.5 is the most preferred from the linked thread above. Still, it's highly application specific - I would over allocate on Linux, I would not when working on STM32 or other bare metal.
I would use realloc with caution.
Calling realloc in general leads to:
allocating completely new block
copying all data from old to new location
releasing (freeing) the initial block.
All combined could be questionable from performance perspective, depending on the app, volume of data, response requirements.
In addition, in case of realloc failure, the return value is NULL which means that allocation to new block is not straightforward (indirection is required). E.g.
int *p = malloc(100 * sizeof *p);
if (NULL == p)
{
perror("malloc() failed");
return EXIT_FAILURE;
}
do_something_with_p(p);
/* Reallocate array to a new size
* Using temporary pointer in case realloc() fails. */
{
int *temp = realloc(p, 100000 * sizeof *temp);
if (NULL == temp)
{
perror("realloc() failed");
free(p);
return EXIT_FAILURE;
}
p = temp;
}
malloc vs realloc - what is the best practice?
Helper functions
When writing robust code, I avoid using library *alloc() functions directly. Instead I form helper functions to handle various use cases and take care of edge cases, parameter validation, etc.
Within these helper functions, I use malloc(), realloc(), calloc() as building blocks, perhaps steered by implementation macros, to form good code per the use case.
This pushes the "what is best" to a narrower set of conditions where it can be better assessed - per function. In the growing by 2x case, realloc() is fine.
Example:
// Optimize for a growing allocation
// Return new pointer.
// Allocate per 2x *nmemb * size.
// Update *nmemb_new as needed.
// A return of NULL implies failure, old not deallocated.
void *my_alloc_grow(void *ptr, size_t *nmemb, size_t size) {
if (nmemb == NULL) {
return NULL;
}
size_t nmemb_old = *nmemb;
if (size == 0) { // Consider array elements of size 0 as error
return NULL;
}
if (nmemb_old > SIZE_MAX/2/size)) {
return NULL;
}
size_t nmemb_new = nmemb_old ? (nmemb_old * 2) : 1;
unsigned char *ptr_new = realloc(ptr, nmemb_new * size);
if (ptr_new == NULL) {
return NULL;
}
// Maybe zero fill new memory portion.
memset(ptr_new + nmemb_old * size, 0, (nmemb_new - nmemb_old) * size);
*nmemb = nmemb_new;
return ptr_new;
}
Other use cases.
/ General new memory
void *my_alloc(size_t *nmemb, size_t size); // General new memory
void *my_calloc(size_t *nmemb, size_t size); // General new memory with zeroing
// General reallocation, maybe more or less.
// Act like free() on nmemb_new == 0.
void *my_alloc_resize(void *ptr, size_t *nmemb, size_t nmemb_new, size_t size);
If do the next:
int* array = malloc(10 * sizeof(int));
and them I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?
Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block
I just used the following program to get size of actually allocated memory with glibc:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
size_t* a2 = (size_t*) array;
printf("%d -> %zu\n", n, a2[-1]);
}
}
and for n <= 6, it allocates 32 bytes, and for 7-10 it is 48.
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, effectively giving 16 free bytes. Since (as it just has been noted) it won't allocate anything less than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes will be freed, and something could actually be put in there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function):
Returns
4 The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the key-word here. It doesn't mention any specific circumstances when that can happen, so you can't rely on any of them, even if they sound obvious at a first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you'd take a look at C++, the newer memory allocation interfaces (new / delete and allocators) don't even support such a thing. They always expect you to allocate a new block. But that's just a loose comment.
The other answers have already nailed the question, but assuming you know the realloc call is a "trimming", you can wrap it with:
void *safe_trim(void *p, size_t n) {
void *p2 = realloc(p, n);
return p2 ? p2 : p;
}
and the return value will always point to an object of size n.
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.
The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by doing an unconditional malloc for a new memory block, copying the data and free-ing the old block. Obviously, such implementation can fail in low-memory situations.
Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.
I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such a thing as trimming an existing allocation block. Instead a smaller block is allocated first, then the data is copied from the old one, and then it's freed.
For instance this may be the case with bucket-heap strategy (used by some popular heaps, such as tcmalloc).
A bit late, but there is at least one popular implementation which realloc() with a smaler size can fail: TCMalloc. (At least as far as i understand the code)
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink enough (50% of alloced memory, otherwise it will be ignored), TCMalloc will alloc the new memory first (and possible fail) and then copy it and remove the old memory.
I do not copy the source code, because i am not sure if the copyrights (of TCMalloc and Stackoverflow) will allow that, but here is a link to the source (revision as at May 17, 2019).
realloc will not fails in shrinking the existing memory, so it will not return NULL. It can return NULL only if fails during expansion.
But shrinking can fail in some architecture, where realloc can be implemented in a different manner like allocating a smaller size memory separately and freeing the old memory to avoid fragmentation. In that case shrinking memory can return NULL. But its very rare implementation.
But its better to be in a safer side, to keep NULL checks after shrinking the memory also.
I can't determine the array size in run time, because it will have dependencies. So I need to do memory allocation for an element. Just like linked-lists I can say.
How can I do that in integer arrays ?
I think you're saying you want to 'grow' an array one element at a time.
realloc() is an option in C for an array of integers(*).
size_t n=0;//Length of array.
int* arr=NULL;
//Some code...
//Now we want to grow our array
//This is probably in a loop or called in a function from a loop (not shown).
++n;
int* temp=realloc(arr,n*sizeof(int));
if(temp==NULL){
free(arr);
return 1;//Error...
}
arr=temp;
//More code....
//Now we're done with arr.
free(arr);
In the first pass arr is NULL and realloc(NULL,..) just acts like malloc().
But in subsequent passes when arr is not NULL then realloc attempts to grow the array 'in place' (if memory is available at the end of it). If not, it attempts to allocate space elsewhere and then copies over the existing data and frees the old location.
If it can't reallocate, it does nothing and returns NULL so you are responsible for then freeing that space (free(arr)).
NB: Always allocate the return of realloc() to a temporary variable (i.e. not the first argument).
Otherwise if allocation fails you have leaked the existing memory.
realloc is a quick trick to reduce the amount of allocating and copying required to grow an array - the alternative.
If that's not effective enough the normal model is to keep track of 'capacity' and 'length' something like this.
const size_t chunk=10;
size_t cap=0;
size_t n=0;
int* arr=NULL;
++n;
if(n>cap){
cap+=chunk;
int* temp=realloc(arr,cap);
if(temp==NULL){
free(arr);
return 1;//Error
}
}
//Later...
free(arr);
That has the advantage of you being able to tune how much 'spare' overhead you have to tune how often reallocation occurs. Another strategy is cap=cap*2; though obviously space tends to grow exponentially! A book could be written on strategies for this classic. It's often possible to allocate enough for most realistic cases in one hit but still handle exceptional cases.
You also have the option allocate back down to recover free space if you know the array is now full-sized.
In case it's not obvious, if the array is moved, any pointers to it or into it will have to be redirected. That's an argument for only indexing into it with [.]. You will need to keep hold of the old location until you do the redirection. ptr=temp+(ptr-arr) makes ptr point to the new location of the element ptr pointed to.
This is standard approach found in Java ArrayList and most implementations of std::vector<>. It's an engineering compromise. The benefits of an array (O(1) random access by index) and a linked list (O(1) growth).
It's asymptotically still linear but in practical cases provides great benefit. Computer Science is concerned with asymptotic rates and engineering with actual values in finite real cases.
(*) The copy is a raw copy of memory. That's fine for int but complex structures 'struct' may not be copyable by what amounts to memmove or take further effort.
If do the next:
int* array = malloc(10 * sizeof(int));
and them I use realloc:
array = realloc(array, 5 * sizeof(int));
On the second line (and only it), can it return NULL?
Yes, it can. There are no implementation guarantees on realloc(), and it can return a different pointer even when shrinking.
For example, if a particular implementation uses different pools for different object sizes, realloc() may actually allocate a new block in the pool for smaller objects and free the block in the pool for larger objects. Thus, if the pool for smaller objects is full, it will fail and return NULL.
Or it may simply decide it's better to move the block
I just used the following program to get size of actually allocated memory with glibc:
#include <stdlib.h>
#include <stdio.h>
int main()
{
int n;
for (n = 0; n <= 10; ++n)
{
void* array = malloc(n * sizeof(int));
size_t* a2 = (size_t*) array;
printf("%d -> %zu\n", n, a2[-1]);
}
}
and for n <= 6, it allocates 32 bytes, and for 7-10 it is 48.
So, if it shrank int[10] to int[5], the allocated size would shrink from 48 to 32, effectively giving 16 free bytes. Since (as it just has been noted) it won't allocate anything less than 32 bytes, those 16 bytes are lost.
If it moved the block elsewhere, the whole 48 bytes will be freed, and something could actually be put in there. Of course, that's just a science-fiction story and not a real implementation ;).
The most relevant quote from the C99 standard (7.20.3.4 The realloc function):
Returns
4 The realloc function returns a pointer to the new object (which may have the same value as a pointer to the old object), or a null pointer if the new object could not be allocated.
'May' is the key-word here. It doesn't mention any specific circumstances when that can happen, so you can't rely on any of them, even if they sound obvious at a first glance.
By the way, I think you could consider realloc() somewhat deprecated. If you'd take a look at C++, the newer memory allocation interfaces (new / delete and allocators) don't even support such a thing. They always expect you to allocate a new block. But that's just a loose comment.
The other answers have already nailed the question, but assuming you know the realloc call is a "trimming", you can wrap it with:
void *safe_trim(void *p, size_t n) {
void *p2 = realloc(p, n);
return p2 ? p2 : p;
}
and the return value will always point to an object of size n.
In any case, since the implementation of realloc knows the size of the object and can therefore determine that it's "trimming", it would be pathologically bad from a quality-of-implementation standpoint not to perform the above logic internally. But since realloc is not required to do this, you should do it yourself, either with the above wrapper or with analogous inline logic when you call realloc.
The language (and library) specification makes no such guarantee, just like it does not guarantee that a "trimming" realloc will preserve the pointer value.
An implementation might decide to implement realloc in the most "primitive" way: by doing an unconditional malloc for a new memory block, copying the data and free-ing the old block. Obviously, such implementation can fail in low-memory situations.
Don't count on it. The standard makes no such provision; it merely states "or a null pointer if the new object could not be allocated".
You'd be hard-pressed to find such an implementation, but according to the standard it would still be compliant.
I suspect there may be a theoretical possibility for failure in the scenario you describe.
Depending on the heap implementation, there may be no such a thing as trimming an existing allocation block. Instead a smaller block is allocated first, then the data is copied from the old one, and then it's freed.
For instance this may be the case with bucket-heap strategy (used by some popular heaps, such as tcmalloc).
A bit late, but there is at least one popular implementation which realloc() with a smaler size can fail: TCMalloc. (At least as far as i understand the code)
If you read the file tcmalloc.cc, in the function do_realloc_with_callback(), you will see that if you shrink enough (50% of alloced memory, otherwise it will be ignored), TCMalloc will alloc the new memory first (and possible fail) and then copy it and remove the old memory.
I do not copy the source code, because i am not sure if the copyrights (of TCMalloc and Stackoverflow) will allow that, but here is a link to the source (revision as at May 17, 2019).
realloc will not fails in shrinking the existing memory, so it will not return NULL. It can return NULL only if fails during expansion.
But shrinking can fail in some architecture, where realloc can be implemented in a different manner like allocating a smaller size memory separately and freeing the old memory to avoid fragmentation. In that case shrinking memory can return NULL. But its very rare implementation.
But its better to be in a safer side, to keep NULL checks after shrinking the memory also.
IMO one is enough, why does calloc require to split it into two arguments?
I'd guess that this is probably history and predates the times where C had prototypes for functions. At these times without a prototype the arguments basically had to be int, the typedef size_t probably wasn't even yet invented. But then INTMAX is the largest chunk you could allocate with malloc and splitting it up in two just gives you more flexibility and allows you to allocate really large arrays. Even at that times there were methods to obtain large pages from the system that where zeroed out by default, so efficiency was not so much a problem with calloc than for malloc.
Nowadays, with size_t and the function prototype at hand, this is just a daily reminder of the rich history of C.
The parameter names document it reasonably well:
void *malloc(size_t size);
void *calloc(size_t nelem, size_t elsize);
The latter form allows for neat allocating of arrays, by providing the number of elements and element size. The same behavior can be achieved with malloc, by multiplying.
However, calloc also initializes the allocated memory to 0. malloc does no init, so the value is undefined. malloc can be faster, in theory, due to not setting all the memory; this is only likely to be noted with large amounts.
In this question, it is suggested that calloc is clear-alloc and malloc is mem-alloc.