Fastest way possible to allocate an array of strings - C

I have a function which takes an array of strings (buffer) and needs to increase its size, so I invoke realloc:
temp = (char**) realloc(buffer, newSize * sizeof(char*));
if (temp == NULL)
    return false;
else
    buffer = temp;
So far everything is fine. Now, for every new cell I must invoke a malloc with the correct size. Note that newSize is always even and that odd-indexed strings have a different length than even-indexed ones.
for (i = oldSize; i < newSize; i++){
    support = (char*) malloc(LENGTH1 * sizeof(char));
    if (support == NULL){
        marker = i;
        failedMalloc = true;
        break;
    }
    else
        buffer[i] = support;
    i++;
    support = (char*) malloc(LENGTH2 * sizeof(char));
    if (support == NULL){
        marker = i;
        failedMalloc = true;
        break;
    }
    else
        buffer[i] = support;
}
The fact is that since I work with huge data, sooner or later I'll run out of memory and the realloc or one of the mallocs will fail. The problem is that if one of the mallocs is the one that fails, I risk having to invoke millions of frees to clean up. That takes a lot of time. Is there any way to speed up this process, or better, avoid it?
if (failedMalloc){
    for (i = oldSize; i < marker; i++)
        free(buffer[i]);
    temp = (char**) realloc(buffer, oldSize * sizeof(char*));
}
PS: Yes, I know that pointer arithmetic can be faster than array indexing. I will switch to it once this problem is solved; for the moment I prefer array indexing because I find it less error prone, but the final version will use pointer arithmetic.

Instead of allocating each string individually, allocate them in blocks. You could, for example, malloc 128*(LENGTH1+LENGTH2) bytes and have room for 256 consecutive strings. Whenever your index crosses a block boundary, malloc another big block and use modulo arithmetic to get an offset into it for the start of the string.
P.S. sizeof(char) is guaranteed to be 1.
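A minimal sketch of that scheme (the lengths here are stand-ins for the real ones, and string_at is a helper name invented for illustration):

```c
#include <assert.h>
#include <stdlib.h>

#define LENGTH1 16           /* example lengths; the real values come from the app */
#define LENGTH2 24
#define PAIRS_PER_BLOCK 128  /* 256 strings per big block, as suggested above */

/* Return a pointer to the start of string i, allocating a new big block
 * lazily when i first crosses a block boundary. blocks[] holds one
 * pointer per block and must start out zeroed. */
static char *string_at(char **blocks, size_t i) {
    size_t block = i / (2 * PAIRS_PER_BLOCK);   /* which big block */
    size_t idx   = i % (2 * PAIRS_PER_BLOCK);   /* index within it */
    if (blocks[block] == NULL) {
        blocks[block] = malloc(PAIRS_PER_BLOCK * (LENGTH1 + LENGTH2));
        if (blocks[block] == NULL)
            return NULL;    /* one possible failure per 256 strings, not per string */
    }
    /* Even strings have LENGTH1, odd ones LENGTH2: a full pair spans LENGTH1+LENGTH2. */
    size_t offset = (idx / 2) * (LENGTH1 + LENGTH2) + (idx % 2) * LENGTH1;
    return blocks[block] + offset;
}
```

Cleanup then means freeing each big block, not each string, so a failure path does a few hundred frees instead of millions.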

Allocate larger blocks of memory. The fewer malloc calls, the better. The fastest approach is to precalculate the required size and allocate only once.
Also, using pointer arithmetic will not produce any visible difference here.

You could write your own allocation and deallocation routines, and use them instead of malloc/free for the strings. If your routines malloc one or more big buffers and portion out little bits of it, then you can free the whole lot in one go just by calling free on each big buffer.
The general idea works especially well in the case where all allocations are the same size, in which case it's called a "pool allocator". In this case, for each array or strings you could have one associated pool for the LENGTH1 allocations, and another for the LENGTH2.
I say, "write your own", but no doubt there are simple open-source pool allocators out there for the taking.
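For illustration, a bare-bones fixed-size pool might look like this (all names here are invented, not a specific library):

```c
#include <stdlib.h>

/* Minimal fixed-size pool: one big buffer handed out slot by slot. */
typedef struct {
    char  *buf;   /* the one big allocation */
    size_t slot;  /* size of each allocation served from the pool */
    size_t used;  /* slots handed out so far */
    size_t cap;   /* total slots */
} pool;

static int pool_init(pool *p, size_t slot_size, size_t nslots) {
    p->buf = malloc(slot_size * nslots);
    if (p->buf == NULL)
        return 0;
    p->slot = slot_size;
    p->used = 0;
    p->cap  = nslots;
    return 1;
}

/* Replaces malloc(LENGTHn): hand out the next slot, or NULL when full. */
static char *pool_get(pool *p) {
    if (p->used == p->cap)
        return NULL;
    return p->buf + p->slot * p->used++;
}

/* Frees every string served from this pool in one call. */
static void pool_destroy(pool *p) {
    free(p->buf);
    p->buf = NULL;
}
```

With one pool for the LENGTH1 strings and another for the LENGTH2 strings, the cleanup path becomes two frees instead of millions.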

One way to avoid wasting memory is to malloc a larger block each time you need to grow: allocate in fixed sizes aligned to a power of two, e.g.
size_t m_size = 1;
// when you need to allocate
while (m_size < require_size) m_size *= 2;
malloc(m_size);


malloc vs realloc - what is the best practice?

I have a variable whose size is determined at run-time.
Is it generally better to realloc it every time a new element is added, like:
array_type * someArray;
int counter = 0;
// before a new element is added:
counter++;
someArray = realloc(someArray, counter * sizeof(array_type));
Or allocate more memory than probably needed using malloc once:
array_type * someArray = malloc(ENOUGH_MEMORY * sizeof(array_type));
What is best in terms of efficiency (speed), readability and memory management? And why?
realloc could occasionally be useful in the past, with certain allocation patterns in single-threaded code. Most modern memory managers are optimized for multi-threaded programs and to minimize fragmentation, so, when growing an allocation, a realloc will almost certainly just allocate a new block, copy the existing data, and then free the old block. So there's no real advantage to trying to use realloc.
Increasing the size one element at a time can create an O(n^2) situation. In the worst case, the existing data has to be copied each time. It would be better to increase the size in chunks (which is still O(n^2) but with a smaller constant factor) or to grow the allocation geometrically (which gives an O(n) amortized cost).
Furthermore, it's difficult to use realloc correctly.
someArray = realloc(someArray, counter * sizeof(array_type));
If the realloc fails, someArray is set to NULL. If that was your only copy of the pointer to the previously allocated memory, you've just lost it.
You won't be able to access the data you had already placed, and you can't free the original allocation, so you'll have a memory leak.
What is best in terms of efficiency (speed), readability and memory management? And why?
There is no general best; the specific best depends on your application, your use case, and your environment. You can throw wars over the perfect realloc growth ratio and over whether you need realloc at all.
Remember the rules of optimization: you do not optimize; and you do not optimize without measuring first. The best in any terms can only be measured for your specific setup, your specific environment, your specific application running on a specific operating system and *alloc implementation.
what is the best practice?
Allocating a constant amount of memory (if it's small enough) is static allocation. It's just an array. Refactor your application to just:
array_type someArray[ENOUGH_MEMORY];
If you do not want to over-allocate (or ENOUGH_MEMORY would be too big), then use realloc to add one element, as presented.
If you want, "optimize" by calling realloc less often and over-allocating - a growth ratio of 1.5 seems to be the most preferred in the linked thread above. Still, it's highly application-specific: I would over-allocate on Linux, but not when working on an STM32 or other bare metal.
I would use realloc with caution.
Calling realloc in general leads to:
allocating completely new block
copying all data from old to new location
releasing (freeing) the initial block.
All combined could be questionable from performance perspective, depending on the app, volume of data, response requirements.
In addition, in case of realloc failure the return value is NULL, which means the result cannot be assigned straight back to the original pointer (a temporary is required). E.g.
int *p = malloc(100 * sizeof *p);
if (NULL == p)
{
    perror("malloc() failed");
    return EXIT_FAILURE;
}

do_something_with_p(p);

/* Reallocate array to a new size.
 * Using a temporary pointer in case realloc() fails. */
{
    int *temp = realloc(p, 100000 * sizeof *temp);
    if (NULL == temp)
    {
        perror("realloc() failed");
        free(p);
        return EXIT_FAILURE;
    }
    p = temp;
}
malloc vs realloc - what is the best practice?
Helper functions
When writing robust code, I avoid using library *alloc() functions directly. Instead I form helper functions to handle various use cases and take care of edge cases, parameter validation, etc.
Within these helper functions, I use malloc(), realloc(), calloc() as building blocks, perhaps steered by implementation macros, to form good code per the use case.
This pushes the "what is best" to a narrower set of conditions where it can be better assessed - per function. In the growing by 2x case, realloc() is fine.
Example:
// Optimize for a growing allocation
// Return new pointer.
// Allocate per 2x *nmemb * size.
// Update *nmemb_new as needed.
// A return of NULL implies failure, old not deallocated.
void *my_alloc_grow(void *ptr, size_t *nmemb, size_t size) {
    if (nmemb == NULL) {
        return NULL;
    }
    size_t nmemb_old = *nmemb;
    if (size == 0) { // Consider array elements of size 0 an error
        return NULL;
    }
    if (nmemb_old > SIZE_MAX / 2 / size) { // Doubling would overflow size_t
        return NULL;
    }
    size_t nmemb_new = nmemb_old ? (nmemb_old * 2) : 1;
    unsigned char *ptr_new = realloc(ptr, nmemb_new * size);
    if (ptr_new == NULL) {
        return NULL;
    }
    // Maybe zero-fill the new memory portion.
    memset(ptr_new + nmemb_old * size, 0, (nmemb_new - nmemb_old) * size);
    *nmemb = nmemb_new;
    return ptr_new;
}
Other use cases.
void *my_alloc(size_t *nmemb, size_t size); // General new memory
void *my_calloc(size_t *nmemb, size_t size); // General new memory with zeroing
// General reallocation, maybe more or less.
// Act like free() on nmemb_new == 0.
void *my_alloc_resize(void *ptr, size_t *nmemb, size_t nmemb_new, size_t size);

Input string without knowing the size

What's the way to store a string when I don't know its size?
I do it like this:
#include <stdio.h>
#include <stdlib.h>
#include <conio.h>

int main () {
    char * str;
    str = (char *)malloc(sizeof(char) + 1);
    str[1] = '\0';
    int i = 0;
    int c = '\0';
    do {
        c = getche();
        if(c != '\r'){
            str[i] = c;
            str[i + 1] = '\0';
            i++;
            str = (char *)realloc(str, sizeof(char) + i + 2);
        }
    } while(c != '\r');
    printf("\n%s\n", str);
    free(str);
    return 0;
}
I find this page:
Dynamically prompt for string without knowing string size
Is it correct? If it is, then:
Is there any better way?
Is there more efficient way?
Is it correct?
No
The main problem is the use of realloc. It is just wrong: when using realloc, never assign directly to the pointer that points to the already allocated memory - always use a temporary to take the return value. Like:
char * temp;
temp = realloc(str, 1 + i + 2);
if (temp == NULL)
{
    // out of memory
    // .. add error handling
}
str = temp;
The reason for this is that realloc may fail in which case it will return NULL. So if you assign directly to str and realloc fails, you have lost the pointer to the allocated memory (aka the string).
Besides that:
1) Don't cast malloc and realloc
2) sizeof(char) is always 1 - so you don't need to use it - just put 1
Is there any better way?
Is there more efficient way?
Instead of reallocating by 1 in each loop - which is pretty expensive performance-wise - it is in many cases better to (re)allocate a bigger chunk.
One strategy is to double the allocation whenever calling realloc. So if you have allocated 128 bytes, the next allocation should be 2*128 = 256. Another strategy is to grow by some fixed amount significantly bigger than 1 - for instance, by 1024 bytes each time.
I suggest using a staging buffer to avoid repeated realloc calls. Create a buffer of arbitrary size, e.g. 1024; when it fills up, realloc more space for your dynamically allocated string and memmove the buffer's contents into it.
The key to answering this question is to clarify the term "without knowing the size".
We may not know what amount of data we're going to get, but we may know what we're going to do with it.
Let us consider the following use cases:
We have restrictions on the data we need, for example: a person's name, an address, the title of a book. I guess we are good with 1k or a maximum of 16k of space.
We obtain a continuous flow of data, for example: some sensor or other equipment sends us data every second. In this case, we could process the data in chunks.
Answer:
We need to make an educated guess about the size we intend to process and allocate space accordingly.
We have to process data on the fly and we need to release the space that is no longer required.
Note:
It is important to note that we can't allocate an unlimited amount of memory. At some point we have to implement error handling and/or store the data on disk or somewhere else.
Note II:
In case a more memory-efficient solution is needed, using realloc is not recommended, as it can temporarily double the allocated space while running (if the system cannot simply extend the block in place, it first allocates a new block and copies the current contents). Instead, an application-specific memory structure would be required. But I assume that is beyond the scope of the original question.
Is it correct?
Sort of.
We don't cast the result of malloc() in C.
Is there any better way?
That's primarily opinion-based.
Is there more efficient way?
With regards to time or space?
If you are asking about space, no.
If you are asking about time, yes.
You could dynamically allocate memory for an array with a small size, that would hold the string for a while. Then, when the array would not be able to hold the string any longer, you would reallocate that memory and double its size. And so on, until the whole string is read. When you are done, you could reallocate the memory again, and shrink the size to be the exact number you need for your string.
You see, calling realloc() is costly in time, since it may have to move the whole memory block: the memory must be contiguous, and there might not be any space left to extend it in place.
Note: Of course, a fixed sized array, statically created would be better in terms of time, but worse in terms of memory. Everything is a trade off, that's where you come into play and decide what best suits your application.
How about this? (Note that asprintf is a GNU/BSD extension, not standard C.)
char *string_name;
asprintf(&string_name, "Hello World, my name is %s & I'm %d years old!", "James Bond", 27);
printf("string is %s", string_name);
free(string_name);

C - Segmentation fault when using malloc with large size parameter

I'm running into a segfault when allocating a dynamic array of large size.
As a specific example, the below code causes a segfault.
int max = 1399469912;
int *arr = (int*) malloc((max+1) * sizeof(int));
arr[0] = 1;
However, if I replace max with something smaller like 5, then I get no segfault.
Why does this happen? Or, is there another solution to achieve the same effect? I need a dynamically allocated array of significant size.
Thanks
Read the documentation of malloc (or malloc(3) from the Linux man pages).
It can fail, and then returns NULL; your code should handle that case:
int *arr = malloc((max+1) * sizeof(int));
if (!arr) { perror("malloc arr"); exit(EXIT_FAILURE); };
You could handle the failure in some other way, but errno gives the reason. In practice, recovering from a malloc or calloc failure is quite tricky. In most cases, exiting abruptly like above is the simplest thing to do. In some cases (think of a server program which should run continuously) you can do otherwise, but that is difficult.
Read also about memory overcommitment (a system-wide configurable feature that I personally dislike and disable, because it can make malloc appear to succeed when memory resources are exhausted); on Linux, read about the out-of-memory killer.
See also this (a silly implementation of malloc)
BTW, you need to be sure that (max+1) * sizeof(int) does not overflow, and you'd better define size_t max = 1399469912; (not int).
Notice that you are requesting (on systems having sizeof(int)==4 like my Linux/x86-64 desktop) more than five gigabytes. This is a significant amount of virtual address space.
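The overflow check mentioned above can be made explicit with a small wrapper (alloc_array is a name invented for illustration):

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of size elem, returning NULL instead of
 * letting the size computation wrap around. */
static void *alloc_array(size_t n, size_t elem) {
    if (elem != 0 && n > SIZE_MAX / elem)   /* n * elem would overflow size_t */
        return NULL;
    return malloc(n * elem);
}
```

With size_t arithmetic and this guard, a request that is too large fails cleanly via NULL rather than silently allocating a truncated size.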
You have to check whether malloc returns a valid pointer or fails to allocate the memory; in that case, it returns a null pointer.
int max = 1399469912;
int *arr = (int*) malloc((max+1) * sizeof(int));
if( arr == NULL )
{
    /* malloc failed, deal with it */
}
else
{
    // fine here
    arr[0] = 1;
}
Quoting the man page
If successful, calloc(), malloc(), realloc(), reallocf(), and valloc()
functions return a pointer to allocated memory. If there is an error,
they return a NULL pointer and set errno to ENOMEM.
malloc() returns NULL if it wasn't able to allocate the requested memory. You should check the value returned by malloc() against NULL.

Deleting dynamically allocated array members in C

Let's say I have an array of structs and I want to delete an entry whose struct matches some criteria.
This array is dynamically allocated with malloc, I keep the element count in a separate variable.
How do I go about deleting the entry?
I'm thinking of
for (i = pos; i < arr_len; i++) {
    arr[i] = arr[i+1];
}
arr_len--;
But this leaves the same amount of memory allocated for the array, while I actually need less, plus a (sort of) orphaned last entry.
Is issuing a realloc in such a situation an accepted practice? Would realloc do a memcpy in this case (shortening the allocated memory by one element)?
realloc is ok ... but keep reading :)
realloc will not move parts of the memory; it may move the whole block. So you need to copy the data before changing the allocated size.
To move the data, memmove (not memcpy) is a good option: it works for memory areas that belong to the same object. Be careful not to go past your array limits, though - as you do in your code:
for (i = pos; i < arr_len; i++) {
    arr[i] = arr[i+1];
}
The arr[i] = arr[i + 1]; will try to access one past the allowed size. You need
for (i = pos + 1; i < arr_len; i++) {
    arr[i - 1] = arr[i];
}
There is somewhat of an overhead when calling realloc. If your structs are not large and/or they live only for a short while, consider keeping both an element count and allocated count and only realloc to enlarge (when (element_count + 1) > (allocated_count)).
If the struct is large, also consider a different data structure (linked list perhaps).
Calling realloc to shrink the allocated memory would not necessarily be a bad idea.
However, you might have to reconsider the data structure you are using. It looks like a linked list would make it much easier to manage memory and make the delete operation much faster since it doesn't require shifting elements.
Using realloc would be appropriate here. It wouldn't do a memcpy -- that's only necessary when the realloc size is larger and there's no room to expand.

How can I free all allocated memory at once?

Here is what I am working with:
char* qdat[][NUMTBLCOLS];
char** tdat[];
char* ptr_web_data;

// Loop thru each table row of the query result set
for(row_index = 0; row_index < number_rows; row_index++)
{
    // Loop thru each column of the query result set and extract the data
    for(col_index = 0; col_index < number_cols; col_index++)
    {
        ptr_web_data = (char*) malloc((strlen(Data) + 1) * sizeof(char));
        memcpy(ptr_web_data, column_text, strlen(column_text) + 1);
        qdat[row_index][web_data_index] = ptr_web_data;
    }
}
tdat[row_index] = qdat[col_index];
After the data is used, the memory allocated is released one at a time using free().
for(row_index = 0; row_index < number_rows; row_index++)
{
    // Loop thru all columns used
    for(col_index = 0; col_index < SARWEBTBLCOLS; col_index++)
    {
        // Free memory block pointed to by results set array
        free(tdat[row_index][col_index]);
    }
}
Is there a way to release all the allocated memory at once, for this array?
Thank You.
Not with the standard malloc() allocator - you need to investigate the use of memory pools. These work by allocating a big block of memory up-front, allocating from it and freeing back to it as you request via their own allocation functions, and then freeing the whole lot with a special "deallocate all" function.
I must say I've always found these things a bit ugly - it really isn't that hard to write code that doesn't leak. The only reason I can see for using them is to mitigate heap fragmentation, if that is a real problem for you.
No, there is not. Memory which is separately allocated must be separately freed.
The only way you could free it all at once is if you allocated it all at once, as one giant block. You would then have to do a bit of pointer math to give every row the correct index into the block, but it's not terribly difficult. This approach does have a few downsides though:
Extra pointer math
Requires one giant contiguous block of memory vs. N smaller blocks of memory. Can be an issue in low memory or high fragmentation environments.
Extra work for no real stated gain.
If you want to release it all at once, you have to allocate it all at once.
A simple manual solution, if you know the total size you'll need in advance, is to allocate it all in one chunk and index into it as appropriate. If you don't know the size in advance, you can use realloc to grow the memory, so long as you only access it via offsets from the initial pointer and don't store additional pointers anywhere.
That being said, direct allocation and deallocation is a simple solution, and harder to get wrong than the alternatives. Unless the loop to deallocate is causing you real difficulties, I would stick with what you have.
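If the row and column counts and a fixed slot size per string are known up front, one such single-block layout might look like this (alloc_table is an illustrative name; it stores the row-pointer table and the string storage in one allocation):

```c
#include <stdlib.h>

/* Allocate a rows x cols table of fixed-size string slots plus a
 * row-pointer table in front, all in one block, so the whole thing
 * disappears with a single free(table). */
static char **alloc_table(size_t rows, size_t cols, size_t slot) {
    char **table = malloc(rows * sizeof *table + rows * cols * slot);
    if (table == NULL)
        return NULL;
    char *data = (char *)(table + rows);    /* string storage follows the pointers */
    for (size_t r = 0; r < rows; r++)
        table[r] = data + r * cols * slot;  /* cell (r,c) is table[r] + c*slot */
    return table;
}
```

Usage: write into table[r] + c*slot, and release everything with free(table) - no per-string frees, at the cost of one large contiguous allocation.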
