Array of strings memory reallocation in C

I have a project to do and I need to reallocate the memory for the array. I tried many times, but I have found just one version that works.
void resize(int* size, char*** arr)
{
    *size *= 2;
    *arr = realloc(*arr, sizeof(char*) * *size);
    nulltest(*arr); // This checks if the allocation was successful
}
resize(&size, &arr);
This function doubles the capacity of the array, but I think it is more complicated than it needs to be, so I would like to ask whether it can be simplified. I'm open to new solutions too.

int is not a good type for sizes. Use the correct one, size_t.
Try to avoid side effects (i.e. modifying objects passed by reference) unless there is a special need.
Your code can leak memory. If realloc fails, you lose the only reference to the previously allocated memory. The integer referenced by size is also changed unnecessarily.
I would implement it this way:
char **resize(char **arr, size_t newsize)
{
    return realloc(arr, newsize * sizeof(*arr));
}
The calling function should then check for errors.
But what is the point of defining this function? It only returns the return value of realloc.
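A minimal sketch of what the caller-side check could look like. The resize function is the one from the answer; append_string, its parameters and its 0 / -1 return convention are assumptions made for this illustration only. The point it shows is that the old pointer and the stored capacity are updated only after realloc has succeeded.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* The wrapper from the answer: no side effects, the caller checks the result. */
char **resize(char **arr, size_t newsize)
{
    return realloc(arr, newsize * sizeof(*arr));
}

/* Hypothetical caller: appends s, doubling the capacity when the array is full.
 * Returns 0 on success, -1 on failure; on failure nothing is modified,
 * so the caller can keep using (and later free) the old block. */
int append_string(char ***arr, size_t *count, size_t *cap, char *s)
{
    assert(arr != NULL && count != NULL && cap != NULL);
    if (*count == *cap) {
        size_t newcap = *cap ? *cap * 2 : 4;
        if (newcap > SIZE_MAX / sizeof(**arr))  /* byte count would overflow */
            return -1;
        char **tmp = resize(*arr, newcap);      /* keep *arr until this worked */
        if (tmp == NULL)
            return -1;
        *arr = tmp;
        *cap = newcap;
    }
    (*arr)[(*count)++] = s;
    return 0;
}
```

Starting from arr = NULL, count = 0 and cap = 0 works, because realloc(NULL, n) behaves like malloc(n).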

Related

malloc vs realloc - what is the best practice?

I have a variable where the size is determined in run-time.
Is it generally better to realloc it every time a new element is added like:
array_type * someArray;
int counter = 0;
//before a new element is added:
counter ++;
someArray = realloc(someArray, counter * sizeof(array_type));
Or allocate more memory than probably needed using malloc once:
array_type * someArray = malloc(ENOUGH_MEMORY * sizeof(array_type));
What is best in terms of efficiency(speed), readability and memory-management? And why?

realloc could occasionally be useful in the past, with certain allocation patterns in single-threaded code. Most modern memory managers are optimized for multi-threaded programs and to minimize fragmentation, so, when growing an allocation, a realloc will almost certainly just allocate a new block, copy the existing data, and then free the old block. So there's no real advantage to trying to use realloc.
Increasing the size one element at a time can create an O(n^2) situation. In the worst case, the existing data has to be copied each time. It would be better to increase the size in chunks (which is still O(n^2) but with a smaller constant factor) or to grow the allocation geometrically (which gives an O(n) amortized cost).
Furthermore, it's difficult to use realloc correctly.
someArray = realloc(someArray, counter * sizeof(array_type));
If the realloc fails, someArray is set to NULL. If that was your only copy of the pointer to the previously allocated memory, you've just lost it.
You won't be able to access the data you had already placed, and you can't free the original allocation, so you'll have a memory leak.
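As a rough sketch of the geometric growth mentioned above combined with the temporary-pointer pattern: the struct int_vec type, its field names and the reallocs counter are invented for this illustration and are not from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable int buffer, used only for this illustration. */
struct int_vec {
    int   *data;
    size_t len;       /* elements in use */
    size_t cap;       /* elements allocated */
    size_t reallocs;  /* how many times realloc succeeded */
};

/* Appends value, doubling the capacity when the buffer is full.
 * Returns 0 on success, -1 on failure (the vector stays valid). */
int int_vec_push(struct int_vec *v, int value)
{
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t newcap = v->cap ? v->cap * 2 : 1;
        if (newcap > SIZE_MAX / sizeof *v->data)
            return -1;                 /* byte count would overflow */
        int *tmp = realloc(v->data, newcap * sizeof *tmp);
        if (tmp == NULL)
            return -1;                 /* v->data is still the old block */
        v->data = tmp;
        v->cap = newcap;
        v->reallocs++;
    }
    v->data[v->len++] = value;
    return 0;
}
```

Pushing 1000 elements into a zero-initialised struct int_vec this way calls realloc 11 times (capacities 1, 2, 4, ..., 1024), where growing by one element at a time would call it 1000 times.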

What is best in terms of efficiency(speed), readability and memory-management? And why?
There is no general best. The specific best depends on your specific application, use case and environment. People wage wars over the perfect realloc growth ratio, and over whether you need realloc at all.
Remember the rules of optimization. First: do not optimize. Second: do not optimize without measuring first. What is best in any sense can only be measured for your specific setup, your specific environment, and your specific application running on a specific operating system and *alloc implementation.
what is the best practice?
If the amount of memory is constant (and small enough), there is nothing dynamic about it. It's just an array. Refactor your application to just:
array_type someArray[ENOUGH_MEMORY];
If you do not want to over-allocate (or ENOUGH_MEMORY would be too large), then use realloc to add one element at a time, as presented.
If you want, "optimize" by calling realloc less often and over-allocating. It seems that a ratio of 1.5 is the most preferred in the linked thread above. Still, it is highly application specific: I would over-allocate on Linux, but I would not when working on an STM32 or other bare-metal target.
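One way such a growth policy could be written, as a sketch only. The function name next_capacity and the choice to return 0 on overflow are assumptions for this example; the ratio is approximate because of integer division.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical policy function: next capacity for a growth ratio of about 1.5.
 * Returns 0 if the capacity cannot grow without overflowing size_t. */
size_t next_capacity(size_t cap)
{
    if (cap > SIZE_MAX - cap / 2 - 1)
        return 0;
    size_t grown = cap + cap / 2;
    size_t next = grown > cap ? grown : cap + 1;  /* so that 0 and 1 still grow */
    assert(next > cap);                           /* the capacity always increases */
    return next;
}
```

The caller would pass the result to realloc (through a temporary pointer) and store it as the new capacity only if the call succeeds.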

I would use realloc with caution.
Calling realloc in general leads to:
allocating completely new block
copying all data from old to new location
releasing (freeing) the initial block.
All of this combined can be questionable from a performance perspective, depending on the application, the volume of data, and the response-time requirements.
In addition, realloc returns NULL on failure, so its result should not be assigned straight to the original pointer; a temporary pointer is required. E.g.
int *p = malloc(100 * sizeof *p);
if (NULL == p)
{
    perror("malloc() failed");
    return EXIT_FAILURE;
}
do_something_with_p(p);
/* Reallocate array to a new size
 * Using temporary pointer in case realloc() fails. */
{
    int *temp = realloc(p, 100000 * sizeof *temp);
    if (NULL == temp)
    {
        perror("realloc() failed");
        free(p);
        return EXIT_FAILURE;
    }
    p = temp;
}

malloc vs realloc - what is the best practice?
Helper functions
When writing robust code, I avoid using library *alloc() functions directly. Instead I form helper functions to handle various use cases and take care of edge cases, parameter validation, etc.
Within these helper functions, I use malloc(), realloc(), calloc() as building blocks, perhaps steered by implementation macros, to form good code per the use case.
This pushes the "what is best" to a narrower set of conditions where it can be better assessed - per function. In the growing by 2x case, realloc() is fine.
Example:
#include <stdint.h>  // SIZE_MAX
#include <stdlib.h>  // realloc
#include <string.h>  // memset
// Optimize for a growing allocation.
// Return new pointer.
// Allocate 2x *nmemb elements of size bytes each.
// Update *nmemb as needed.
// A return of NULL implies failure; the old block is not deallocated.
void *my_alloc_grow(void *ptr, size_t *nmemb, size_t size) {
    if (nmemb == NULL) {
        return NULL;
    }
    size_t nmemb_old = *nmemb;
    if (size == 0) { // Consider array elements of size 0 as error
        return NULL;
    }
    if (nmemb_old > SIZE_MAX/2/size) { // 2 * nmemb_old * size would overflow
        return NULL;
    }
    size_t nmemb_new = nmemb_old ? (nmemb_old * 2) : 1;
    unsigned char *ptr_new = realloc(ptr, nmemb_new * size);
    if (ptr_new == NULL) {
        return NULL;
    }
    // Maybe zero fill new memory portion.
    memset(ptr_new + nmemb_old * size, 0, (nmemb_new - nmemb_old) * size);
    *nmemb = nmemb_new;
    return ptr_new;
}
Other use cases.
// General new memory
void *my_alloc(size_t *nmemb, size_t size);
// General new memory with zeroing
void *my_calloc(size_t *nmemb, size_t size);
// General reallocation, maybe more or less.
// Act like free() on nmemb_new == 0.
void *my_alloc_resize(void *ptr, size_t *nmemb, size_t nmemb_new, size_t size);
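The answer only gives prototypes for these. As an illustration, one possible body for my_alloc_resize could look like the sketch below. The failure convention is an assumption of this sketch: NULL is returned with *nmemb unchanged on failure, while the nmemb_new == 0 case also returns NULL but sets *nmemb to 0, so a caller can tell the two apart. Treating a NULL nmemb as a programming error is likewise a choice made here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Sketch of a general resize helper (grow or shrink). */
void *my_alloc_resize(void *ptr, size_t *nmemb, size_t nmemb_new, size_t size)
{
    assert(nmemb != NULL);            /* assumed precondition */
    if (size == 0)
        return NULL;                  /* elements of size 0 are an error */
    if (nmemb_new == 0) {             /* act like free() */
        free(ptr);
        *nmemb = 0;
        return NULL;
    }
    if (nmemb_new > SIZE_MAX / size)  /* nmemb_new * size would overflow */
        return NULL;
    void *ptr_new = realloc(ptr, nmemb_new * size);
    if (ptr_new == NULL)
        return NULL;                  /* old block is still allocated */
    *nmemb = nmemb_new;
    return ptr_new;
}
```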

Is creating an array with a built-in length common in C?

As an experiment, I created a function that initializes an array with a built-in length, like in Java:
int *create_arr(int len) {
    void *ptr = malloc(sizeof(int[len + 1]));
    int *arr = ptr + sizeof(int);
    arr[-1] = len;
    return arr;
}
which can later be used like this
int *arr = create_arr(12);
and allows the length to be read from arr[-1]. Is this a common practice, and is there an error in what I did?

First of all, your code has some bugs, mainly that in standard C you can't do arithmetic on void pointers (as commented by MikeCAT). Probably a more typical way to write it would be:
int *create_arr(int len) {
    int *ptr = malloc((len + 1) * sizeof(int));
    if (ptr == NULL) {
        return NULL; // allocation failed; the caller must handle it
    }
    ptr[0] = len;
    return ptr + 1;
}
This is legal but no, it's not common. It's more idiomatic to keep track of the length in a separate variable, not as part of the array itself. An exception is functions that try to reproduce the effect of malloc, where the caller will later pass back the pointer to the array but not the size.
One other issue with this approach is that it limits your array length to the maximum value of an int. On, let's say, a 64-bit system with 32-bit ints, you could conceivably want an array whose length did not fit in an int. Normally you'd use size_t for array lengths instead, but that won't work if you need to fit the length in an element of the array itself. (And of course this limitation would be much more severe if you wanted an array of short or char or bool :-) )
Note that, as Andrew Henle comments, the pointer returned by your function could be used for an array of int, but would not be safe to use for other arbitrary types as you have destroyed the alignment promised by malloc. So if you're trying to make a general wrapper or replacement for malloc, this doesn't do it.

Apart from the small mistakes that have already been pointed out in the comments, this is not common, because C programmers are used to handling arrays as an initial pointer and a size. I have mainly seen it in mixed programming environments, for example in Windows COM/DCOM, where C++ programs can exchange data with VB programs.
Your array with a built-in size is close to the WinAPI BSTR: an array of 16-bit wide chars whose length is stored in a 4-byte prefix immediately before the first character. So there is nothing really bad about it.
But in the general case, you could have an alignment problem. malloc returns a pointer that is suitably aligned for any type, and you should make sure that element 0 of your returned array is also suitably aligned. If int does not have the strictest alignment requirement, it could fail...
Furthermore, as the pointer is not at the beginning of the allocated memory, the array requires a special function for its deallocation. That should probably be documented in a red flashing font, because it would be very unexpected for most C programmers.
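A sketch of how the alignment and deallocation concerns could both be handled. The names lp_alloc, lp_len, lp_free and union arr_header are hypothetical; the approach assumes a C11 compiler, since it relies on max_align_t to pad the header to the strictest fundamental alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Header padded to the strictest fundamental alignment, so the element
 * area that follows it is suitably aligned for any object type. */
union arr_header {
    size_t      len;
    max_align_t align_;
};

/* Allocates len elements of elem_size bytes; the length is stored just
 * before the returned pointer. Returns NULL on failure. */
void *lp_alloc(size_t len, size_t elem_size)
{
    if (elem_size != 0 &&
        len > (SIZE_MAX - sizeof(union arr_header)) / elem_size)
        return NULL;                       /* total size would overflow */
    union arr_header *h = malloc(sizeof *h + len * elem_size);
    if (h == NULL)
        return NULL;
    h->len = len;
    return h + 1;                          /* first element */
}

/* Length stored by lp_alloc(). */
size_t lp_len(const void *arr)
{
    assert(arr != NULL);
    return ((const union arr_header *)arr - 1)->len;
}

/* Must be used instead of free(): the block starts at the header. */
void lp_free(void *arr)
{
    if (arr != NULL)
        free((union arr_header *)arr - 1);
}
```

Storing the length as size_t in a separate header also removes the limit that the length must fit in one array element.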

This technique is not as uncommon as people expect. For example, the stb collection of header-only libraries uses this method to implement a type-safe, vector-like container in C. See https://github.com/nothings/stb/blob/master/stretchy_buffer.h

It would be more idiomatic to do something like:
struct array {
    int *d;
    size_t s;
};
struct array *
create_arr(size_t len)
{
    struct array *a = malloc(sizeof *a);
    if( a ){
        a->d = malloc(len * sizeof *a->d);
        a->s = a->d ? len : 0;
    }
    return a;
}

why use malloc function in c when we can declare arrays using arr[size] ,taking input from the user for size?

why use malloc function when we can write the code in c like this :
int size;
printf("please the size of the array\n");
scanf("%d",&size);
int arr[size];
this eliminates the possibility of assigning garbage value to array size and is also taking the size of the array at run time ...
so why use dynamic memory allocation at all when it can be done like this ?

This notation
int arr[size];
declares a VLA (variable-length array).
The usual implementation allocates them on the stack.
What is wrong with that?
The stack is usually relatively small; on my Linux box it is only 8 MB.
So if you try to run the following code
#include <stdio.h>
const int MAX_BUF=10000000;
int main()
{
    char buf[MAX_BUF];
    int idx;
    for( idx = 0 ; idx < MAX_BUF ; idx++ )
        buf[idx]=10;
}
it will end with a segmentation fault.
TL;DR version
PRO:
VLAs are OK for small allocations. You don't have to worry about freeing memory when leaving the scope.
AGAINST:
They are unsafe for big allocations. You can't tell what size is safe to allocate (think of recursion).

Besides the fact that VLAs may run into problems when their size is too large, there is a much more important issue with them: scope.
A VLA is allocated when the declaration is encountered and deallocated when the scope (the { ... }) is left. This has advantages (no function call is needed for either operation) and disadvantages (you can't return it from a function, and you can't allocate a varying number of objects, e.g. in a loop).
malloc allocates dynamically, so the memory chunk persists after you return from the function you happen to be in, you can allocate with malloc several times (e.g. in a for loop), and you determine exactly when you deallocate (by a call to free).
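To illustrate the lifetime difference, here is a small sketch; make_sequence and make_sequence_demo are invented names. The heap block is created in one function and is still valid in the caller, which a local VLA could not offer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical example: returns a heap array holding 0, 1, ..., n-1.
 * The block outlives this function; the caller owns it and must free() it.
 * Returning the address of a local VLA instead would leave the caller
 * with a dangling pointer. */
int *make_sequence(size_t n)
{
    if (n > SIZE_MAX / sizeof(int))
        return NULL;                  /* byte count would overflow */
    int *seq = malloc(n * sizeof *seq);
    if (seq == NULL)
        return NULL;                  /* may also happen for n == 0 */
    for (size_t i = 0; i < n; i++)
        seq[i] = (int)i;
    return seq;
}

/* Usage: the array is still valid after make_sequence() has returned. */
void make_sequence_demo(void)
{
    int *seq = make_sequence(5);
    if (seq != NULL) {
        assert(seq[4] == 4);
        free(seq);                    /* the caller decides when to release it */
    }
}
```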

Why to not use the following:
int size;
printf("please the size of the array\n");
scanf("%d",&size);
int arr[size];
Insufficient memory. int arr[size]; may exceed resources and this goes undetected (as @Weather Vane notes). Code can detect failure with a NULL check using *alloc().
int *arr = malloc(sizeof *arr * size);
if (arr == NULL && size > 0) Handle_OutOfMemory();
int arr[size]; does not allow an array size of 0. malloc(sizeof *arr * 0); is not a major problem: it may return NULL or a non-NULL pointer on success, and either is easily handled.
Note: For array sizes, size_t is the best type; it is an unsigned integer type that is neither too narrow nor too wide. int arr[size]; is UB if size <= 0. A negative size is also a problem with malloc(sizeof *arr * size), where it is converted to a huge unsigned value. An unvalidated size is not a good idea with either a variable-length array (VLA) or *alloc().
VLAs, required in C99, are only optionally supported by a conforming C11 compiler.
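A sketch of validating a user-supplied count before allocating, along the lines of the points above. The helper alloc_ints and its out_len parameter are hypothetical; the long parameter stands in for whatever signed value was read from the user.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: turns a user-supplied count into a checked allocation.
 * Returns NULL (and sets *out_len to 0) if requested is not positive,
 * if the byte count would overflow, or if malloc fails. */
int *alloc_ints(long requested, size_t *out_len)
{
    assert(out_len != NULL);
    *out_len = 0;
    if (requested <= 0)
        return NULL;
    if ((unsigned long)requested > SIZE_MAX / sizeof(int))
        return NULL;
    size_t n = (size_t)requested;     /* safe: range checked above */
    int *arr = malloc(n * sizeof *arr);
    if (arr != NULL)
        *out_len = n;
    return arr;
}
```

Unlike int arr[size];, every failure mode here is reported to the caller instead of overflowing the stack silently.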

What you wrote is indeed possible nowadays, but if you compile it with g++ it may issue warnings, because VLAs are not part of standard C++.
Another point is that your arr[size] is stored on the stack, while malloc stores data on the heap, which gives you much more space.
Connected with that is probably the main issue: you can change the size of malloc'd arrays with realloc, or with free and another malloc. Your array is there for its whole scope, and you cannot free it early to save space.

how to calculate size of pointer pointed memory?

In one function I have written:
char *ab;
ab=malloc(10);
Then in another function I want to know the size of memory pointed by the ab pointer.
Is there any way that I can know that ab is pointing to 10 chars of memory?

No, you don't have a standard way to do this. You have to pass the size of the pointed-to memory along with the pointer, it's a common solution.
I.e. instead of
void f(char* x)
{
//...
}
use
void f(char *x, size_t length)
{
//....
}
and in your code
char *ab = malloc( 10 );
f( ab, 10 );

It's a deep secret that only free() knows for sure. It's likely in your system, but in a totally implementation dependent manner.
A bit awkward, but if you want to keep everything together:
typedef struct
{ // size of data followed by data (C only trick! NOT for C++)
    int dimension; // number of data elements
    int data[1]; // variable number of data elements
} malloc_int_t;
malloc_int_t *ab;
int dimension = 10;
ab = malloc( sizeof(*ab) + (dimension-1)*sizeof(int) );
ab->dimension = dimension;
ab->data[n] // data access
I've changed the data type to int to make the code a more generic template.
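Since C99 the same layout can be expressed with a flexible array member instead of data[1]. The sketch below is a variation on the code above, not the answer's own version; struct int_block, int_block_new and int_block_get are names made up for it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Size of data followed by data, using a C99 flexible array member. */
struct int_block {
    size_t dimension;  /* number of data elements */
    int    data[];     /* storage follows the header in the same allocation */
};

/* Hypothetical constructor; returns NULL on overflow or allocation failure. */
struct int_block *int_block_new(size_t dimension)
{
    if (dimension > (SIZE_MAX - sizeof(struct int_block)) / sizeof(int))
        return NULL;
    struct int_block *b = malloc(sizeof *b + dimension * sizeof b->data[0]);
    if (b != NULL)
        b->dimension = dimension;
    return b;
}

/* Bounds-checked read, possible because the size travels with the data. */
int int_block_get(const struct int_block *b, size_t i)
{
    assert(b != NULL && i < b->dimension);
    return b->data[i];
}
```

A single free() on the returned pointer releases both the header and the elements.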

You can't (portably anyway). You have to keep track of the size yourself.
Some implementations of malloc may give you an API to access that information, but there is no provision in the standard for this.

The size is what you passed into malloc, you can use a global variable or macro to remember it.

There is no way, you have to store the size of the allocated memory in another variable.

No, unfortunately.
You need to pass the size of the block along with the pointer.

No.
Now, that being said, there are non-portable hacks to do this, but it is not safe to rely upon them.
If you know with 100% certainty that the memory was allocated via malloc(), you may be able to rewind the pointer a few bytes and inspect the 'malloc node' that is used to track which parts of memory have been allocated and which have not. However, I can not stress this enough--do not ever depend upon this.

There is no way to deduce the size of allocated memory from the pointer itself. Since ab is a char *, sizeof(ab) is the same as sizeof(char *), which obviously is not the same as the size of the allocated chunk of memory.
Since you called malloc with the required size, you know what the size is. Pass this number along with the pointer to the function that needs to know the size.
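A short sketch of the difference; sizeof_demo is a made-up name. sizeof is computed from the type of its operand, so it reports the full size of an array object but only the size of the pointer itself for ab.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical demo: what sizeof reports for an array versus a pointer. */
void sizeof_demo(void)
{
    char arr[10];
    char *ab = malloc(10);

    /* sizeof sees the array type, so it reports all 10 bytes. */
    assert(sizeof arr == 10);
    /* sizeof sees only the pointer type, whatever it points to. */
    assert(sizeof ab == sizeof(char *));

    free(ab);  /* free(NULL) is fine if the allocation failed */
}
```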

I had a structure and a char pointer pointing to its memory address. Relating that to your question, I wanted to find the size of the memory the pointer was pointing to, i.e. the size of the structure. So logically, what you do is find the size of the object the pointer points to. This worked for me:
unsigned char * buffer= Library1Structure;
int x=sizeof(Library1Structure);
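So the value of x tells me the size of the memory location the pointer buffer points to. Note that this works only because sizeof is applied to the structure object itself, not to the pointer.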

Code crashes unless I put a printf statement in it

This is a snippet of code from an array library I'm using. It runs fine on Windows, but when I compile with gcc on Linux it crashes in this function. When trying to narrow down the problem, I added a printf statement to it, and the code stopped crashing.
void _arrayCreateSize( void ***array, int capacity )
{
    (*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
    ((ArrayHeader*)(*array))->size = 0;
    ((ArrayHeader*)(*array))->capacity = capacity;
    // printf("Test!\n");
    *(char**)array += sizeof(ArrayHeader);
}
As soon as that printf is taken out it starts crashing on me again. I'm completely baffled as to why it's happening.

The last line in the function is not doing what was intended. The code is obscure to the point of impenetrability.
It appears that the goal is to allocate an array of int, because of the sizeof(int) in the first memory allocation. At the very least, if you are meant to be allocating an array of structure pointers, you need to use sizeof(SomeType *), the size of some pointer type (sizeof(void *) would do). As written, this will fail horribly in a 64-bit environment.
The array is allocated with a structure header (ArrayHeader) followed by the array proper. The returned value is supposed to be the start of the array proper; the ArrayHeader will presumably be found by subtraction from the pointer. This is ugly as sin, and unmaintainable to boot. It can be made to work, but it requires extreme care, and (as Brian Kernighan said) "if you're as clever as possible when you write the code, how are you ever going to debug it?".
Unfortunately, the last line is wrong:
void _arrayCreateSize( void ***array, int capacity )
{
    (*array) = malloc( (capacity * sizeof(int)) + sizeof(ArrayHeader) );
    ((ArrayHeader*)(*array))->size = 0;
    ((ArrayHeader*)(*array))->capacity = capacity;
    // printf("Test!\n");
    *(char**)array += sizeof(ArrayHeader);
}
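The cast is the problem. *(char **)array reinterprets the void ** that array points to as a char * and adds sizeof(ArrayHeader) to it. Because the arithmetic is done on a char *, the pointer does move by sizeof(ArrayHeader) bytes, but modifying a void ** object through an lvalue of type char * is not something the C standard guarantees to work, and it hides what is going on. The adjustment is clearer and safer when it is done on a pointer of the header type and the result is stored back:
*array = (void **)((ArrayHeader *)*array + 1);
or, as noted in the comments and an alternative answer, by keeping an ArrayHeader * to the allocated block and returning the address one header past it.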
I note in passing that the function name should not really start with an underscore. External names starting with an underscore are reserved to the implementation (of the C compiler and library).
The question asks "why does the printf() statement 'fix' things". The answer is because it moves the problem around. You've got a Heisenbug because there is abuse of the allocated memory, and the presence of the printf() manages to alter the behaviour of the code slightly.
Recommendation
Run the program under valgrind. If you don't have it, get it.
Revise the code so that the function checks the return value from malloc(), and so it returns a pointer to a structure for the allocated array.
Use the clearer code outlined in Michael Burr's answer.

Seemingly random crashes that appear or disappear when unrelated printf() statements are added are often a sign of a corrupted heap. The allocator usually stores bookkeeping information about allocated blocks directly in the heap itself. Overwriting that metadata leads to surprising runtime behavior.
A few suggestions:
are you sure that you need void ***?
try replacing your argument to malloc() with 10000. Does it work now?
Moreover, if you just want arrays that store some metadata, your current code is a bad approach. A clean solution would probably use a structure like the following:
struct Array {
    size_t nmemb; // size of an array element
    size_t size; // current size of array
    size_t capacity; // maximum size of array
    void *data; // the array itself
};
Now you can pass an object of type Array to functions that know about the Array type, and Array->data cast to the proper type to everything else. The memory layout might even be the same as in your current approach, but access to the metadata is significantly easier and especially more obvious.
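A sketch of how such a structure might be set up and torn down. The struct is the one from the answer; array_init and array_destroy, and their return convention, are assumptions added for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

struct Array {
    size_t nmemb;     /* size of an array element */
    size_t size;      /* current size of array */
    size_t capacity;  /* maximum size of array */
    void  *data;      /* the array itself */
};

/* Hypothetical constructor. Returns 0 on success, -1 on failure;
 * *a is only written to when the allocation succeeded. */
int array_init(struct Array *a, size_t nmemb, size_t capacity)
{
    assert(a != NULL);
    if (nmemb == 0 || capacity == 0 || capacity > SIZE_MAX / nmemb)
        return -1;
    void *data = malloc(nmemb * capacity);
    if (data == NULL)
        return -1;
    a->nmemb = nmemb;
    a->size = 0;
    a->capacity = capacity;
    a->data = data;
    return 0;
}

/* Releases the storage; the struct can be initialised again afterwards. */
void array_destroy(struct Array *a)
{
    assert(a != NULL);
    free(a->data);
    a->data = NULL;
    a->size = 0;
    a->capacity = 0;
}
```

Code that does not know about struct Array can be handed a.data cast to the element type, as described above.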
Your main audience is the poor guy that has to maintain your code 5 years from now.

Now that Jonathan Leffler has pointed out what the bug was, might I suggest that the function be written in a manner that's a little less puzzling?:
void _arrayCreateSize( void ***array, int capacity )
{
    // allocate a header followed by an appropriately sized array of pointers
    ArrayHeader* p = malloc( sizeof(ArrayHeader) + (capacity * sizeof(void*)));
    p->size = 0;
    p->capacity = capacity;
    *array = (void**)(p+1); // return a pointer to just past the header
                            // (pointing at the array of pointers)
}
Mix in your own desired handling of malloc() failure.
I think this will probably help the next person who needs to look at it.
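For completeness, a sketch of the same layout with malloc() failure handling, plus the matching header lookup and release functions. The ArrayHeader definition is an assumption (the question only shows its size and capacity fields), and array_create, array_header and array_free are hypothetical names rather than part of the library in the question.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed layout: only the two field names appear in the question. */
typedef struct {
    size_t size;
    size_t capacity;
} ArrayHeader;

/* Allocates a header followed by capacity pointers.
 * Returns a pointer just past the header, or NULL on failure. */
void **array_create(size_t capacity)
{
    if (capacity > (SIZE_MAX - sizeof(ArrayHeader)) / sizeof(void *))
        return NULL;                       /* total size would overflow */
    ArrayHeader *p = malloc(sizeof *p + capacity * sizeof(void *));
    if (p == NULL)
        return NULL;
    p->size = 0;
    p->capacity = capacity;
    return (void **)(p + 1);
}

/* Steps back from the element pointer to the header in front of it. */
ArrayHeader *array_header(void **array)
{
    assert(array != NULL);
    return (ArrayHeader *)array - 1;
}

/* The block starts at the header, so that is what must be freed. */
void array_free(void **array)
{
    if (array != NULL)
        free(array_header(array));
}
```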
