I am trying to write a set of functions that support a dynamically allocated array, where a struct contains the array and other metadata. The goal is to return the array to the user, while the struct information can be retrieved through a function. The code seems to work just fine until I get to the function that frees the memory from the heap. For reasons I do not understand, the code fails with a segmentation fault, which would indicate that the variable vec in the free_vector function is not pointing to the correct address. However, I have verified with print statements that it is pointing to the correct address. I am hoping someone can help me understand why the free_vector function is not working, specifically the free call. My code and implementation are shown below.
typedef struct
{
size_t allocated_length;
size_t active_length;
size_t num_bytes;
char *vector;
} Vector;
void *init_vector(size_t num_indices, size_t num_bytes) {
// Allocate memory for Vector struct
Vector *vec = malloc(sizeof(*vec));
vec->active_length = 0;
vec->num_bytes = num_bytes;
// Allocate heap memory for vector
void *ptr = malloc(num_bytes * num_indices);
if (ptr == NULL) {
printf("WARNING: Unable to allocate memory, exiting!\n");
return &vec->vector;
}
vec->allocated_length = num_indices;
vec->vector = ptr;
return &vec->vector;
}
// --------------------------------------------------------------------------------
int push_vector(void *vec, void *elements, size_t num_indices) {
Vector *a = get_vector_data(vec);
if(a->active_length + num_indices > a->allocated_length) {
printf("TRUE\n");
size_t size = (a->allocated_length + num_indices) * 2;
void *ptr = realloc(a->vector, size * a->num_bytes);
if (ptr == NULL) {
printf("WARNING: Unable to allocate memory, exiting!\n");
return 0;
}
a->vector = ptr;
a->allocated_length = size;
}
memcpy((char *)vec + a->active_length * a->num_bytes, elements,
num_indices * a->num_bytes);
a->active_length += num_indices;
return 1;
}
// --------------------------------------------------------------------------------
Vector *get_vector_data(void *vec) {
// - The Vector struct has three size_t variables that precede the vector
// variable. These variables consume 24 bytes of data. The code below
// points backwards in memory by 24 bytes to the beginning of the struct.
char *a = (char *)vec - 24;
return (Vector *)a;
}
// --------------------------------------------------------------------------------
void free_vector(void *vec) {
// Free all Vector struct elements
Vector *a = get_vector_data(vec);
// - This print statement shows that the variable is pointing to the
// correct data.
printf("%d\n", ((int *)vec)[2]);
// The function fails on the next line and I do not know why
free(a->vector);
a->vector = NULL;
a->allocated_length = 0;
a->active_length = 0;
a->num_bytes = 0;
}
int main() {
int *a = init_vector(3, sizeof(int));
int b[3] = {1, 2, 3};
push_vector(a, b, 3);
// The code begins to fail here
free_vector(a);
}
This program suffers from Undefined Behaviour.
The return value from init_vector is of type char **, a pointer-to-pointer-to-char,
return &vec->vector;
converted to void *.
In main, this value is converted to an int *
int *a = init_vector(3, sizeof(int));
This value is then converted back into a void * when passed to push_vector.
In push_vector, this value is cast to a char * in order to perform pointer arithmetic
memcpy((char *)vec + a->active_length * a->num_bytes, elements,
num_indices * a->num_bytes);
where this operation overwrites the original pointer returned by malloc contained in the vector member.
On my system, this attempts to write 12 bytes (three int) to memory starting with the position of the vector member in the Vector structure.
Vector *vec
| &vec->vector
| |
v v
+------+------+------+------+-----+
|size_t|size_t|size_t|char *|?????|
+------+------+------+------+-----+
This overflows, as sizeof (char *) is 8 on my system.
This is the wrong place to write data. The correct place to write data is *(char **) vec - or just a->vector.
If the write does not crash the program directly (UB), this surely results in free being passed a pointer value that was not returned by malloc, calloc, or realloc, or the pointer value NULL.
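To make the fix concrete, here is a minimal sketch of the same idea with the copy aimed at the heap buffer. It assumes the caller holds the Vector * directly instead of a pointer into the struct; the names vec_init, vec_push and vec_free are hypothetical and not part of your code.

```c
#include <stdlib.h>
#include <string.h>

typedef struct {
    size_t allocated_length;
    size_t active_length;
    size_t num_bytes;
    char *vector;
} Vector;

/* Hypothetical variant of init_vector that hands back the Vector itself. */
Vector *vec_init(size_t num_indices, size_t num_bytes) {
    Vector *vec = malloc(sizeof *vec);
    if (vec == NULL)
        return NULL;
    vec->vector = malloc(num_indices * num_bytes);
    if (vec->vector == NULL) {
        free(vec);
        return NULL;
    }
    vec->allocated_length = num_indices;
    vec->active_length = 0;
    vec->num_bytes = num_bytes;
    return vec;
}

/* Returns 1 on success, 0 on allocation failure (old contents stay valid). */
int vec_push(Vector *vec, const void *elements, size_t num_indices) {
    if (vec->active_length + num_indices > vec->allocated_length) {
        size_t size = (vec->active_length + num_indices) * 2;
        void *ptr = realloc(vec->vector, size * vec->num_bytes);
        if (ptr == NULL)
            return 0;
        vec->vector = ptr;
        vec->allocated_length = size;
    }
    /* Destination is the heap buffer, not the address of the pointer member. */
    memcpy(vec->vector + vec->active_length * vec->num_bytes,
           elements, num_indices * vec->num_bytes);
    vec->active_length += num_indices;
    return 1;
}

/* Releases the buffer first, then the struct that owned it. */
void vec_free(Vector *vec) {
    if (vec != NULL) {
        free(vec->vector);
        free(vec);
    }
}
```

With this shape the pointer stored in vector is never overwritten by element data, so the value later passed to free is the one malloc or realloc returned.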
Aside: In free_vector, this value is also cast to an int *
printf("%d\n", ((int *)vec)[2]); /* added a missing semi-colon. */
Additionally, it is unclear if free_vector should free the original allocation, or just the vector member. You do go to lengths to zero-out the structure here.
Still, as is, you have a memory leak - albeit a small one.
void free_vector(void *vec) {
Vector *a = get_vector_data(vec);
/* ... */
free(a); /* This has to happen at some point. */
}
Note, you should be using offsetof to calculate the position of members within a structure. A static offset of 24 assumes two things that may not hold true:
sizeof (size_t) is always 8 (the actual minimum sizeof (size_t) is 2), and
the structure contains no padding to satisfy alignment (this seems likely given the form, but is not strictly guaranteed).
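As a sketch of what that could look like, get_vector_data can be written against offsetof from <stddef.h> so the compiler supplies the real offset. This only replaces the magic number; it assumes the argument really is the address of a vector member inside a Vector.

```c
#include <stddef.h>

typedef struct {
    size_t allocated_length;
    size_t active_length;
    size_t num_bytes;
    char *vector;
} Vector;

/* Recover the enclosing Vector from a pointer to its vector member. */
Vector *get_vector_data(void *member_ptr) {
    return (Vector *)((char *)member_ptr - offsetof(Vector, vector));
}
```

This stays correct if members are reordered or if size_t is not 8 bytes wide on the target.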
The source you linked in the comments uses a flexible array member, not a pointer member, meaning the entirety of the data (allocation sizes and the vector) is stored in contiguous memory. That is why the & operator yields a valid location to copy data to in this implementation.
(Aside: the linked implementation appears to be broken by effectively using sizeof to get the base of the container structure from a pointer to the flexible array member (e.g., &((vector_container *) pointer_to_flexible_member)[-1]), which does not take into account the possibility of trailing padding, which would result in a larger offset than expected.)
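For comparison, a rough sketch of the flexible-array-member layout follows. It is not the linked implementation; FlexVector, flex_init, flex_header and flex_free are names I made up for illustration. The header is recovered with offsetof rather than sizeof, which sidesteps the trailing-padding concern.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    size_t allocated_length;
    size_t active_length;
    size_t num_bytes;
    unsigned char data[];   /* flexible array member: elements follow the header */
} FlexVector;

/* Allocates header and elements in one block; returns a pointer to the elements. */
void *flex_init(size_t num_indices, size_t num_bytes) {
    FlexVector *v = malloc(sizeof *v + num_indices * num_bytes);
    if (v == NULL)
        return NULL;
    v->allocated_length = num_indices;
    v->active_length = 0;
    v->num_bytes = num_bytes;
    return v->data;
}

/* Steps back from the element pointer to the start of the header. */
FlexVector *flex_header(void *data) {
    return (FlexVector *)((char *)data - offsetof(FlexVector, data));
}

/* The whole block came from a single malloc, so a single free releases it. */
void flex_free(void *data) {
    if (data != NULL)
        free(flex_header(data));
}
```

Here the user-facing pointer really does point at element storage, so indexing it as an int * is meaningful. One assumption to keep in mind: the elements start at the offset of data, so element types with stricter alignment than size_t would need extra padding in the header.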
Related
I am working on a series of C functions to allow a user to dynamically build an array. The core of the library resides in the Array struct which contains a pointer variable array that contains the array data, len which contains the length of the array, size, which is the total memory allocation for the array, elem, which contains the memory allocation per indices, and pointer variables name and dtype which contains strings describing the name of the array and the type of the array. For the moment I have constrained the scope so that only int, float, double, and char arrays can be considered.
Thus far I have defined, and individually tested the following functions;
array_mem_alloc which contains code that allocates memory for an array.
init_array which is a wrapper around array_mem_alloc that instantiates an Array struct, determines the data type and returns an Array data type to a user.
append_array which allows a user to dynamically grow an array one index at a time, or add an already defined array.
free_array which frees all memory and resets struct variables
int_array_val which typecasts the data at an index and returns to user. I have versions of this function for all relevant data types, but for this problem I will only use this version.
find_int_array_indices which looks for where a specific integer exists in the array and records the index number into another array which is returned to the user.
For the purposes of testing find_int_array_indices I am calling init_array for a variable titled arr_test and appending it with 7 integers int a[7] = {6, 1, 3, 6, 6, 4, 5}. I pass the Array container arr_test to the find_int_array_indices function and everything works fine, which also returns another Array container titled p. However, when I try to retrieve the integer variables with the int_array_val function it fails, because it does not recognize the variable array->dtype as containing the string "int". However, when I test the container inside of find_int_array_indices and in the main function, the variable does contain the string "int". This tells me that I probably have a pointer error, but I do not see it. Any advice would be very useful. I am wondering if I need to go back to the beginning and define name and dtype as fixed length arrays in the Array struct instead of as pointer variables.
array.h
typedef struct
{
void *array; // Pointer to array
size_t len; // Active length of array
size_t size; // Number of allocated indices
int elem; // Memory consumption per index
char *name; // The array name
char *dtype; // A string representing the datatype
} Array;
void array_mem_alloc(Array *array, size_t num_indices);
Array init_array(char *dtype, size_t num_indices, char *name);
int append_array(Array *array, void *elements, size_t count);
void free_array(Array *array);
int int_array_val(Array *array, int indice);
Array find_int_array_indices(Array *array, int integer);
array.c
void array_mem_alloc(Array *array, size_t num_indices) {
// Determine the total memory allocation and assign to pointer
void *pointer;
pointer = malloc(num_indices * array->elem);
// If memory is full fail gracefully
if (pointer == NULL) {
printf("Unable to allocate memory, exiting.\n");
free(pointer);
exit(0);
}
// Allocate resources and instantiate Array
else {
array->array = pointer;
array->len = 0;
array->size = num_indices;
}
}
// --------------------------------------------------------------------------------
Array init_array(char *dtype, size_t num_indices, char *name) {
// Determine memory blocks based on data type
int size;
if (strcmp(dtype, "float") == 0) size = sizeof(float);
else if (strcmp(dtype, "int") == 0) size = sizeof(int);
else if (strcmp(dtype, "double") == 0) size = sizeof(double);
else if (strcmp(dtype, "char") == 0) size = sizeof(char);
else {
printf("Data type not correctly entered into init_array, exiting program!\n");
exit(0);
}
// Allocate indice size and call array_mem_alloc
Array array;
array.dtype = dtype;
array.elem = size;
array_mem_alloc(&array, num_indices);
array.name = name;
return array;
}
// --------------------------------------------------------------------------------
int append_array(Array *array, void *elements, size_t count) {
// Allocate more memory if necessary
if (array->len + count > array->size) {
size_t size = (array->len + count) * 2;
void *pointer = realloc(array->array, size * array->elem);
// If memory is full return operations
if (pointer == NULL) {
printf("Unable to allocate memory, exiting.\n");
return 0;
}
// Allocate memory to variables and increment array size
array->array = pointer;
array->size = size;
}
// Append variables and increment the array length
memcpy((char *)array->array + array->len * array->elem, elements, count * array->elem);
array->len += count;
return 1;
}
// --------------------------------------------------------------------------------
void free_array(Array *array) {
// Free all memory in the array
free(array->array);
// Reset all variables in the struct
array->array = NULL;
array->size = 0;
array->len = 0;
array->elem = 0;
}
// --------------------------------------------------------------------------------
int int_array_val(Array *array, int indice) {
// Ensure array contains integers
printf("%s\n", array->dtype);
if (strcmp(array->dtype, "int") != 0) {
printf("Function can only return integer values, exiting function!\n");
exit(0);
}
// Cast value to an integer and return
int a = ((int *)array->array)[indice];
return a;
}
Array find_int_array_indices(Array *array, int integer) {
int number = 0;
int input;
for (int i = 0; i < array->len; i++) {
if (integer == int_array_val(array, i)) {
number++;
}
}
char dtype[7] = "int";
char name[9] = "indices";
Array indice_arr = init_array(dtype, number, name);
for (int i = 0; i < array->len; i++) {
input = i;
if (integer == int_array_val(array, i)) {
append_array(&indice_arr, &input, 1);
}
}
return indice_arr;
}
main.c
size_t indices = 10;
char name[6] = "array";
char dtype[7] = "int";
Array arr_test = init_array(dtype, indices, name);
int a[7] = {6, 1, 3, 6, 6, 4, 5};
append_array(&arr_test, a, 7);
Array p = find_int_array_indices(&arr_test, 6);
printf("%s\n", p.dtype); // This shows that p does contain dtype "int"
int d = int_array_val(&p, 0); // This fails in function, because it does not see dtype = "int"???
printf("%d\n", d);
In find_int_array_indices
char dtype[7] = "int";
char name[9] = "indices";
are both local variables, which are invalidated when the function returns. See: Dangling pointer and Lifetime.
init_array uses these values as if they had a lifetime to match its return value
Array array;
array.dtype = dtype;
array.elem = size;
array_mem_alloc(&array, num_indices);
array.name = name;
return array;
which, as a structure type, is a lifetime determined by the context of its caller (return is copy, after all).
find_int_array_indices completes the error when it returns indice_arr to main.
Some options:
Strictly use pointers to strings with static storage duration.
Change your structure definition to include space for these strings (or allocate it), and perform string copies.
Use an enumerated type instead.
Ditch this string-based, type limited paradigm all together by supporting all memory sizes generically (the naming feature remains an issue, though).
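To illustrate the second option, here is a small sketch that stores copies of the strings inside the structure. ArrayLabels, ARRAY_LABEL_MAX, copy_label, make_labels and labels_from_locals are all hypothetical names, and the 32-byte limit is an arbitrary assumption.

```c
#include <string.h>

#define ARRAY_LABEL_MAX 32   /* hypothetical limit, including the terminator */

typedef struct {
    char name[ARRAY_LABEL_MAX];
    char dtype[ARRAY_LABEL_MAX];
} ArrayLabels;

/* Copies src into a fixed-size destination, truncating and always terminating. */
static void copy_label(char *dst, size_t dst_size, const char *src) {
    strncpy(dst, src, dst_size - 1);
    dst[dst_size - 1] = '\0';
}

/* The returned struct owns its own copies of both strings. */
ArrayLabels make_labels(const char *dtype, const char *name) {
    ArrayLabels labels;
    copy_label(labels.dtype, sizeof labels.dtype, dtype);
    copy_label(labels.name, sizeof labels.name, name);
    return labels;
}

/* Mirrors the failing pattern: the inputs are locals, but the copies outlive them. */
ArrayLabels labels_from_locals(void) {
    char dtype[7] = "int";
    char name[9] = "indices";
    return make_labels(dtype, name);
}
```

Because the characters are copied, returning the struct by value carries the strings with it and nothing refers back to the dead locals.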
A rather long-winded continuation, to elaborate on using enumerated types:
The idea is to define a smaller set of acceptable values that your library works with, and making the user more aware of these values. As we can see, you have partially done that using strings but the implementation has some issues, as strings are generally clunky. Some problems with strings:
you have no control over the strings that users of your library use (this leads you to have to exit[1] the program in the event the user enters something unexpected, which is easy to do),
you must account for their potentially large or excess memory consumption,
string comparison is O(N),
strings are generally unsafe in C, requiring more care than other basic constructs when handling them (assignment, comparison, storage).
So instead of using strings ("foo", "bar", "qux" in these examples), we use an enumerated type
enum OBJECT_TYPE {
OBJECT_FOO,
OBJECT_BAR,
OBJECT_QUX
};
which establishes the following:
it is more clear what the acceptable values are
some[2] control over what users enter, via type hinting
comparison is O(1)
handling is the same as any integral type
The structure definition then looks like
typedef struct {
/* ... whatever members are needed for the structure */
size_t something_based_on_type;
enum OBJECT_TYPE type;
char debug_name[MAX_DEBUG_NAME];
} Object;
Nothing can really be done about the name member of your structure. If you want user defined nametags for things, then yes, as stated previously, you need to allocate space for them.
Our initialization function works similarly, but we can[2] take advantage of some properties of integral types.
void object_init(Object *object, enum OBJECT_TYPE type, const char *debug_name) {
/* ... accept other arguments, whatever is needed to initialize */
size_t value_translations[] = { 42, 51, 99 };
object->type = type;
/* while neat, this is somewhat naive, see footnotes */
object->something_based_on_type = value_translations[type];
if (debug_name && strlen(debug_name) < MAX_DEBUG_NAME)
strcpy(object->debug_name, debug_name);
else
*object->debug_name = '\0';
}
Now we want to provide a function that works with our generic data of only type OBJECT_FOO (like your int_array_val). Again, the comparison is much easier to understand.
void object_print_foo(Object *o) {
if (OBJECT_FOO != o->type)
/* handle type mismatch */;
}
Although it would be better to provide a generic object_print function that again branches based on o->type.
A main function for completeness:
int main(void) {
Object a;
object_init(&a, OBJECT_QUX, "object_a");
object_print_foo(&a);
}
This is the general idea of using enumerated types.
With all that said, I think this is not really any better than just handling arbitrary data sizes, risks included. Something like
const void *array_get(Array *array, size_t index) {
if (index >= array->len)
return NULL;
return (char *) array->array + index * array->elem;
}
works, if the user respects the const contract, and uses the correct types (they would need to remember their typing with specifically typed getters too).
Generic data structures in C are a bit of a leap of faith no matter what.
1. So a note on exiting from library code: don't. As a library author, you have no reasonable right to cause user programs to terminate (unless requested, or the user invokes UB outside your control). Delegate upwards, return errors, and let the user exit the program on their own terms, as they may need to perform their own cleanups (or might carry on if the failure is non-critical).
2. C's enumeration type is rather weak. enum are actually just int, and users can enter plain integer values outside the specified ranges. This is akin to invoking undefined behavior from a library's point of view, but we may wish to protect the user anyway.
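Putting both footnotes together, a defensive lookup might look like the sketch below: it range-checks the value and reports failure to the caller instead of exiting. OBJECT_TYPE_COUNT and object_type_value are hypothetical additions, and the parameter is a plain int on purpose so that out-of-range input can be represented and rejected.

```c
#include <stddef.h>

enum OBJECT_TYPE {
    OBJECT_FOO,
    OBJECT_BAR,
    OBJECT_QUX,
    OBJECT_TYPE_COUNT   /* hypothetical sentinel: number of valid types */
};

/* Returns 0 and stores the translated value on success, -1 on bad input. */
int object_type_value(int type, size_t *out) {
    static const size_t value_translations[OBJECT_TYPE_COUNT] = { 42, 51, 99 };
    if (out == NULL || type < 0 || type >= OBJECT_TYPE_COUNT)
        return -1;
    *out = value_translations[type];
    return 0;
}
```

The caller decides what a -1 means for their program, which keeps cleanup and termination on their terms.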
My current concat function:
char* concat(char* a, int a_size,
char* b, int b_size) {
char* c = malloc(a_size + b_size);
memcpy(c, a, a_size);
memcpy(c + a_size, b, b_size);
free(a);
free(b);
return c;
}
But this used extra memory. Is it possible to append two byte arrays using realloc without making extra memory space?
Like:
void append(char* a, int a_size, char* b, int b_size)
...
char* a = malloc(2);
char* b = malloc(2);
void append(a, 2, b, 2);
//The size of a will be 4.
While Jean-François Fabre answered the stated question, I'd like to point out that you can manage such byte arrays better by using a structure:
typedef struct {
size_t max; /* Number of chars allocated for */
size_t len; /* Number of chars in use */
unsigned char *data;
} bytearray;
#define BYTEARRAY_INIT { 0, 0, NULL }
void bytearray_init(bytearray *barray)
{
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
void bytearray_free(bytearray *barray)
{
free(barray->data);
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
To declare an empty byte array, you can use either bytearray myba = BYTEARRAY_INIT; or bytearray myba; bytearray_init(&myba);. The two are equivalent.
When you no longer need the array, call bytearray_free(&myba);. Note that free(NULL) is safe and does nothing, so it is perfectly safe to free a bytearray that you have initialized, but not used.
To append to a bytearray:
int bytearray_append(bytearray *barray, const void *from, const size_t size)
{
if (barray->len + size > barray->max) {
const size_t len = barray->len + size;
size_t max;
void *data;
/* Example policy: */
if (len < 8)
max = 8; /* At least 8 chars, */
else
if (len < 4194304)
max = (3*len) / 2; /* grow by 50% up to 4,194,304 bytes, */
else
max = (len | 2097151) + 2097153 - 24; /* then pad to next multiple of 2,097,152 sans 24 bytes. */
data = realloc(barray->data, max);
if (!data) {
/* Not enough memory available. Old data is still valid. */
return -1;
}
barray->max = max;
barray->data = data;
}
/* Copy appended data; we know there is room now. */
memmove(barray->data + barray->len, from, size);
barray->len += size;
return 0;
}
Since this function can at least theoretically fail to reallocate memory, it will return 0 if successful, and nonzero if it cannot reallocate enough memory.
There is no need for a malloc() call, because realloc(NULL, size) is exactly equivalent to malloc(size).
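As a stripped-down illustration of that equivalence, the sketch below appends to a buffer that starts out as NULL, with no growth policy at all. grow_and_append is a hypothetical helper, not part of the code above.

```c
#include <stdlib.h>
#include <string.h>

/* Appends n bytes to *buf, which may be NULL while *len is 0.
   Returns 0 on success, -1 if memory could not be obtained. */
int grow_and_append(unsigned char **buf, size_t *len, const void *src, size_t n) {
    unsigned char *tmp;
    if (n == 0)
        return 0;                       /* nothing to do, avoid a zero-size realloc */
    tmp = realloc(*buf, *len + n);      /* behaves like malloc when *buf is NULL */
    if (tmp == NULL)
        return -1;                      /* *buf is still valid and unchanged */
    memcpy(tmp + *len, src, n);
    *buf = tmp;
    *len += n;
    return 0;
}
```

The first call allocates, later calls grow, and the same code path handles both.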
The "growth policy" is a very debatable issue. You can just make max = barray->len + size, and be done with it. However, dynamic memory management functions are relatively slow, so in practice, we don't want to call realloc() for every small little addition.
The above policy tries to do something better, but not too aggressive: it always allocates at least 8 characters, even if less is needed. Up to 4,194,304 characters, it allocates 50% extra. Above that, it rounds the allocation size to the next multiple of 2,097,152 and subtracts 24. The reasoning behind this is complex, but it is more for illustration and understanding than anything else; it is definitely NOT "this is best, and this is what you should do too". This policy ensures that each byte array allocates at most 4,194,304 = 2^22 unused characters. However, 2,097,152 = 2^21 is the size of a huge page on AMD64 (x86-64), and is a power-of-two multiple of a native page size on basically all architectures. It is also large enough to switch from so-called sbrk() allocation to memory mapping on basically all architectures that do that. It means that such huge allocations use a separate part of the heap for each, and the unused part is usually just virtual memory, not necessarily backed by any RAM, until accessed. As a result, this policy tends to work quite well for both very short byte arrays, and very long byte arrays, on most architectures.
Of course, if you know (or measure!) the typical size of the byte arrays in typical workloads, you can optimize the growth policy for that, and get even better results.
Finally, it uses memmove() instead of memcpy(), just in case someone wishes to repeat a part of the same byte array: memcpy() only works if the source and target areas do not overlap; memmove() works even in that case.
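A tiny sketch of the overlapping case, using a hypothetical shift_right helper on a plain buffer:

```c
#include <string.h>

/* Moves the first len bytes of buf forward by `by` positions.
   The caller must ensure buf has room for at least len + by bytes. */
void shift_right(char *buf, size_t len, size_t by) {
    /* Source and destination overlap whenever by < len, so memcpy is not allowed here. */
    memmove(buf + by, buf, len);
}
```

One caveat for a growing array: if realloc() moves the block, a from pointer that pointed into the old block is no longer valid, so memmove() by itself does not make appending an array to itself safe once a reallocation is involved.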
When using more advanced data structures, like hash tables, a variant of the above structure is often useful. (That is, this is much better in cases where you have lots of empty byte arrays.)
Instead of having a pointer to the data, the data is part of the structure itself, as a C99 flexible array member:
typedef struct {
size_t max;
size_t len;
unsigned char data[];
} bytearray;
You cannot declare a byte array itself (i.e. bytearray myba; will not work); you always declare a pointer to such a byte array: bytearray *myba = NULL;. A NULL pointer is simply treated the same as an empty byte array.
In particular, to see how many data items such an array has, you use an accessor function (also defined in the same header file as the data structure), rather than myba.len:
static inline size_t bytearray_len(bytearray *const barray)
{
return (barray) ? barray->len : 0;
}
static inline size_t bytearray_max(bytearray *const barray)
{
return (barray) ? barray->max : 0;
}
The (expression) ? (if-true) : (if-false) is a ternary operator. In this case, the first function is exactly equivalent to
static inline size_t bytearray_len(bytearray *const barray)
{
if (barray)
return barray->len;
else
return 0;
}
If you wonder about the bytearray *const barray, remember that pointer declarations are read from right to left, with * as "a pointer to". So, it just means that barray is constant, a pointer to a byte array. That is, we may change the data it points to, but we won't change the pointer itself. Compilers can usually detect such stuff themselves, but it may help; the main point is however to remind us human programmers that the pointer itself is not to be changed. (Such changes would only be visible within the function itself.)
Since such arrays often need to be resized, the resizing is often put into a separate helper function:
bytearray *bytearray_resize(bytearray *const barray, const size_t len)
{
bytearray *temp;
if (!len) {
free(barray);
errno = 0;
return NULL;
}
if (!barray) {
temp = malloc(sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
errno = ENOMEM;
return NULL;
}
temp->max = len;
temp->len = 0;
return temp;
}
if (barray->len > len)
barray->len = len;
if (barray->max == len)
return barray;
temp = realloc(barray, sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
free(barray);
errno = ENOMEM;
return NULL;
}
temp->max = len;
return temp;
}
What does that errno = 0 do in there? The idea is that because resizing/reallocating a byte array may change the pointer, we return the new one. If the allocation fails, we return NULL with errno == ENOMEM, just like malloc()/realloc() do. However, since the desired new length was zero, this saves memory by freeing the old byte array if any, and returns NULL. But since that is not an error, we set errno to zero, so that it is easier for callers to check if an error occurred or not. (If the function returns NULL, check errno. If errno is nonzero, an error occurred; you can use strerror(errno) to get a descriptive error message.)
You probably also noted the sizeof barray->data[0], used even when barray is NULL. This is okay, because sizeof is not a function, but an operator: it does not access the right side at all, it only evaluates to the size of the thing the right side refers to. (You only need to use parentheses when the right side is a type.) This form is nice, because it lets a programmer change the type of the data member, without changing any other code.
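A short sketch of that point, with a hypothetical helper bytearray_alloc_size that computes the allocation size through a pointer that is deliberately NULL:

```c
#include <stddef.h>

typedef struct {
    size_t max;
    size_t len;
    unsigned char data[];
} bytearray;

/* Bytes needed for a bytearray holding len elements. */
size_t bytearray_alloc_size(size_t len) {
    bytearray *none = NULL;   /* never dereferenced: sizeof only inspects the type */
    return sizeof (bytearray) + len * sizeof none->data[0];
}
```

If data were later changed to, say, an array of uint32_t, this function would need no edits.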
To append data to such a byte array, we probably want to be able to specify whether we anticipate further appends to the same array, or whether this is probably the final append, so that only the exact amount of memory needed is allocated. For simplicity, I'll only implement the exact size version here. Note that this function returns a pointer to the (modified) byte array:
bytearray *bytearray_append(bytearray *barray,
const void *from, const size_t size,
int exact)
{
size_t len = bytearray_len(barray) + size;
if (exact) {
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by bytearray_resize(). */
} else
if (bytearray_max(barray) < len) {
/* Apply growth policy; exact is known to be zero in this branch. */
if (len < 8)
len = 8;
else
if (len < 4194304)
len = (3 * len) / 2;
else
len = (len | 2097151) + 2097153 - 24;
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by the bytearray_resize() call */
}
if (size) {
memmove(barray->data + barray->len, from, size);
barray->len += size;
}
return barray;
}
This time, we declared bytearray *barray, because we change where barray points to in the function. If the fourth parameter, exact, is nonzero, then the resulting byte array is exactly the size needed; otherwise the growth policy is applied.
Yes, since realloc will preserve the start of your buffer if the new size is bigger:
char* concat(char* a, size_t a_size,
char* b, size_t b_size) {
char* c = realloc(a, a_size + b_size);
memcpy(c + a_size, b, b_size); // dest is after "a" data, source is b with b_size
free(b);
return c;
}
c may be different from a (if the system cannot resize the original memory block in place to the new size), but if that's the case, the location pointed to by a will already have been freed (you must not free it yourself), and the original data will have been "moved".
My advice is to warn the users of your function that the input buffers must have been allocated with malloc (or calloc/realloc); otherwise the behaviour is undefined and it will likely crash badly.
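If you also want to survive an out-of-memory condition, a variant along these lines checks the realloc result before touching anything. concat_checked is a hypothetical name, and the sketch assumes a_size + b_size is nonzero.

```c
#include <stdlib.h>
#include <string.h>

/* On success: returns the grown buffer (a must no longer be used) and frees b.
   On failure: returns NULL; a and b are untouched and still owned by the caller. */
char *concat_checked(char *a, size_t a_size, char *b, size_t b_size) {
    char *c = realloc(a, a_size + b_size);
    if (c == NULL)
        return NULL;
    memcpy(c + a_size, b, b_size);   /* b's bytes go right after a's */
    free(b);
    return c;
}
```

The caller then tests the return value and only gives up ownership of a and b when it is non-NULL.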
I'm trying to write a function that uses realloc() to extend the array pointed to within an instance of a struct, however I can't seem to get it to work.
The relevant part of my code is:
struct data_t {
int data_size;
uint16_t *data;
};
void extend_data(data_t container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
int len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
container->data_size++;
container->data[container->data_size-1] = value;
len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
Can anybody see what the problem is with this?
Edit
As R. Sahu points out, container is not a pointer in this function - when you said the code "wasn't working", I assumed you meant that you weren't growing your array, but what you've written here won't even compile.
Are you sure you've copied this code correctly? If so, does "not working" mean you're getting a compile-time error, a run-time error, or just unexpected output?
If you've copied the code as written, then the first thing you need to do is change the function prototype to
void extend_data(data_t *container, uint16_t value) {
and make sure you're passing a pointer to your data_t type, otherwise the update won't be reflected in calling code.
Original
In the line
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
sizeof(*(container->data)) evaluates to sizeof (uint16_t). container->data is a pointer to, not an array of, uint16_t; sizeof *container->data gives you the size of the single element it points to, not the number of elements you've allocated. What you want to do is something like the following:
/**
* Don't assign the result of a realloc call back to the original
* pointer - if the call fails, realloc will return NULL and you'll
* lose the reference to your original buffer. Assign the result to
* a temporary, then after making sure the temporary is not NULL,
* assign that back to your original pointer.
*/
uint16_t *tmp = realloc(container->data, sizeof *container->data * (container->data_size + 1) );
if ( tmp )
{
/**
* Only add to container->data and update the value of container->data_size
* if the realloc call succeeded.
*/
container->data = tmp;
container->data[container->data_size++] = value;
}
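Pulled together into a whole function, the pieces above might look like this sketch. It assumes the struct is used as struct data_t (no typedef) and, as a change from your version, returns a status code so the caller can detect a failed allocation.

```c
#include <stdint.h>
#include <stdlib.h>

struct data_t {
    int data_size;
    uint16_t *data;
};

/* Appends value to the array. Returns 0 on success,
   -1 if the reallocation failed (container is left unchanged). */
int extend_data(struct data_t *container, uint16_t value) {
    size_t new_count = (size_t)container->data_size + 1;
    uint16_t *tmp = realloc(container->data, new_count * sizeof *container->data);
    if (tmp == NULL)
        return -1;
    container->data = tmp;
    container->data[container->data_size++] = value;
    return 0;
}
```

The element count lives in data_size; sizeof is only ever used for the size of one element.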
You don't calculate the new size correctly. Consider this:
typedef struct {
size_t size;
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, NULL}
void int_array_resize(int_array *const array,
const size_t newsize)
{
if (!array) {
fprintf(stderr, "int_array_resize(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
if (!newsize) {
free(array->data);
array->data = 0;
array->size = 0;
} else
if (newsize != array->size) {
void *temp;
temp = realloc(array->data, newsize * sizeof array->data[0]);
if (!temp) {
fprintf(stderr, "int_array_resize(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = temp;
array->size = newsize;
}
}
/* int_array my_array = INT_ARRAY_INIT;
is equivalent to
int_array my_array;
int_array_init(&my_array);
*/
void int_array_init(int_array *const array)
{
if (array) {
array->size = 0;
array->data = NULL;
}
}
void int_array_free(int_array *const array)
{
if (array) {
free(array->data);
array->size = 0;
array->data = NULL;
}
}
The key point is newsize * sizeof array->data[0]. This is the number of chars needed for newsize elements of whatever type array->data[0] has. Both malloc() and realloc() take the size in chars.
If you initialize new structures of that type using int_array my_array = INT_ARRAY_INIT; you can just call int_array_resize() to resize it. (realloc(NULL, size) is equivalent to malloc(size); free(NULL) is safe and does nothing.)
The int_array_init() and int_array_free() are just helper functions to initialize and free such arrays.
Personally, whenever I have dynamically resized arrays, I keep both the allocated size (size) and the size used (used):
typedef struct {
size_t size; /* Number of elements allocated for */
size_t used; /* Number of elements used */
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, 0, NULL }
A function that ensures there are at least need elements that can be added is then particularly useful. To avoid unnecessary reallocations, the function implements a policy that calculates the new size to allocate for, as a balance between amount of memory "wasted" (allocated but not used) and number of potentially slow realloc() calls:
void int_array_need(int_array *const array,
const size_t need)
{
size_t size;
void *data;
if (!array) {
fprintf(stderr, "int_array_need(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
/* Large enough already? */
if (array->size >= array->used + need)
return;
/* Start with the minimum size. */
size = array->used + need;
/* Apply growth/reallocation policy. This is mine. */
if (size < 256)
size = (size | 15) + 1;
else
if (size < 2097152)
size = (3 * size) / 2;
else
size = (size | 1048575) + 1048577 - 8;
/* TODO: Verify (size * sizeof array->data[0]) does not overflow. */
data = realloc(array->data, size * sizeof array->data[0]);
if (!data) {
/* Fallback: Try minimum allocation. */
size = array->used + need;
data = realloc(array->data, size * sizeof array->data[0]);
}
if (!data) {
fprintf(stderr, "int_array_need(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = data;
array->size = size;
}
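As a usage sketch, an append operation can be built on top of such a "need" function. The version below is an assumption-laden variant, not the code above: int_array_try_need() and int_array_append() are hypothetical names, the function returns an error code instead of calling exit(), it checks for size overflow, and it uses only a simplified policy (round up to the next multiple of 16).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t size;  /* Number of elements allocated for */
    size_t used;  /* Number of elements used */
    int   *data;
} int_array;

/* Hypothetical variant of int_array_need(): 0 on success, -1 on failure. */
int int_array_try_need(int_array *array, size_t need)
{
    const size_t max_elems = SIZE_MAX / sizeof array->data[0];
    size_t size;
    void  *data;

    assert(array != NULL);

    /* Reject requests whose byte count could not be represented. */
    if (need > max_elems || array->used > max_elems - need)
        return -1;

    size = array->used + need;
    if (array->size >= size)
        return 0;               /* Large enough already */

    /* Simplified policy: round up to the next multiple of 16. */
    if (size <= max_elems - 16)
        size = (size | 15) + 1;

    data = realloc(array->data, size * sizeof array->data[0]);
    if (!data)
        return -1;              /* array->data is still valid and unchanged */

    array->data = data;
    array->size = size;
    return 0;
}

/* Append one value. Returns 0 on success, -1 if out of memory. */
int int_array_append(int_array *array, int value)
{
    if (int_array_try_need(array, 1) != 0)
        return -1;
    array->data[array->used++] = value;
    return 0;
}
```

Returning an error code rather than exiting is a design choice: it lets the caller decide whether running out of memory is fatal.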
There are many opinions on what kind of reallocation policy you should use, but it really depends on the use case.
There are three things in the balance: number of realloc() calls, as they might be "slow"; memory fragmentation if different arrays are grown requiring many realloc() calls; and amount of memory allocated but not used.
My policy above tries to do many things at once. For small allocations (up to 256 elements), it rounds the size up to the next multiple of 16. That is my attempt at a good balance between memory used for small arrays, and not very many realloc() calls.
For larger allocations, 50% is added to the size. This reduces the number of realloc() calls, while keeping the allocated but unused/unneeded memory below 50%.
For really large allocations, when you have 2^21 (2,097,152) elements or more, the size is rounded up to the next multiple of 2^20 (1,048,576), plus another 2^20, less a few elements. This caps the number of allocated but unused elements to about 2^21, or two million elements.
(Why less a few elements? Because it does not harm on any systems, and on certain systems it may help a lot. Some systems, including x86-64 (64-bit Intel/AMD) on certain operating systems and configurations, support large ("huge") pages that can be more efficient in some ways than normal pages. If they are used to satisfy an allocation, I want to avoid the case where an extra large page is allocated just to cater for the few bytes the C library needs internally for the allocation metadata.)
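To make the three ranges concrete, the policy can be pulled out into a function of its own and spot-checked. This is only a sketch: growth_policy() and growth_policy_examples() are hypothetical names, and the arithmetic is copied from int_array_need() above without any overflow checking.

```c
#include <assert.h>
#include <stddef.h>

/* Given the minimum number of elements needed, return the number of
   elements to allocate. No overflow checks: a sketch, not library code. */
size_t growth_policy(size_t minimum)
{
    if (minimum < 256)
        return (minimum | 15) + 1;          /* next multiple of 16 */
    if (minimum < 2097152)
        return (3 * minimum) / 2;           /* add 50% */
    return (minimum | 1048575) + 1048577 - 8;
                                            /* next multiple of 2^20,
                                               plus 2^20, less 8 elements */
}

/* Spot checks, one or two per range. */
void growth_policy_examples(void)
{
    assert(growth_policy(1)       == 16);
    assert(growth_policy(16)      == 32);
    assert(growth_policy(255)     == 256);
    assert(growth_policy(256)     == 384);
    assert(growth_policy(1000)    == 1500);
    assert(growth_policy(2097152) == 4194296);
}
```

Note that an exact multiple of 16 is still bumped to the next one (16 becomes 32), so there is always some slack after a reallocation.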
It appears you aren't using sizeof correctly. In your struct you've defined a uint16_t pointer, not an array, so sizeof applied to that member yields the size of a pointer on your system, not the size of the allocated block. You need to store the size of the allocated memory along with the pointer if you want to resize it accurately. It appears you already have a field for this, data_size. Your example could be fixed like this:
// I was unsure of the typedef-ing happening with data_t so I made it more explicit in this example
typedef struct {
int data_size;
uint16_t* data;
} data_t;
void extend_data(data_t* container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
// CURRENT LENGTH OF DATA
int len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
uint16_t* tmp = realloc(container->data, (container->data_size + 1) * sizeof(uint16_t));
if (tmp) {
// realloc succeeded. On failure it returns NULL and leaves the original block untouched.
// Assigning its result directly to container->data would then lose the only pointer to that block (a memory leak).
container->data = tmp;
container->data_size++;
container->data[container->data_size-1] = value;
} else {
// Handle allocation failure
}
len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
void extend_data(data_t container, ...
In your function, container is not a pointer but the struct itself, passed by value, so you can't use the -> operator.
The reallocated memory will be lost, because you work on a local copy of the passed structure and that copy is discarded when the function returns.
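A small sketch of the difference, using the data_t layout from the answer above; the helper names bump_by_value(), bump_by_pointer() and pass_by_value_demo() are invented for this illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int       data_size;
    uint16_t *data;
} data_t;

/* Receives a copy: the change is lost when the function returns. */
void bump_by_value(data_t container)
{
    container.data_size++;
}

/* Receives the address: the change is visible to the caller. */
void bump_by_pointer(data_t *container)
{
    container->data_size++;
}

void pass_by_value_demo(void)
{
    data_t c = { 0, NULL };

    bump_by_value(c);
    assert(c.data_size == 0);   /* caller's struct is unchanged */

    bump_by_pointer(&c);
    assert(c.data_size == 1);   /* caller's struct was updated */
}
```

The same reasoning applies to the data member: a pointer returned by realloc() and stored in the copy never reaches the caller.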
sizeof(*(container.data)) / sizeof(uint16_t)
It will always be 1, because sizeof(*(uint16_t *)) / sizeof(uint16_t) is always one.
Why: the data member is a pointer to uint16_t, so *data has the type uint16_t.
sizeof is evaluated at compile time, not at run time (variable-length arrays aside), and it does not return the amount of memory allocated by malloc.
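The following sketch shows the three cases side by side; sizeof_demo() is a hypothetical name. It only demonstrates that sizeof reports the full size for a true array, the pointer size for a pointer, and the element size for a dereferenced pointer, which is why the element count of a malloc'd block has to be tracked separately.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

void sizeof_demo(void)
{
    uint16_t  fixed[10];                               /* a real array */
    size_t    count   = 10;                            /* tracked by hand */
    uint16_t *dynamic = malloc(count * sizeof *dynamic);

    (void)fixed;  /* only used inside sizeof, which does not evaluate it */

    /* Array: sizeof knows the whole object. */
    assert(sizeof fixed / sizeof fixed[0] == 10);

    /* Pointer: sizeof yields the size of the pointer itself,
       regardless of how much memory it points to. */
    assert(sizeof dynamic == sizeof(uint16_t *));

    /* Dereferenced pointer: one element, so the ratio is always 1. */
    assert(sizeof *dynamic / sizeof(uint16_t) == 1);

    free(dynamic);                                     /* free(NULL) is fine too */
}
```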
I'm a beginner in C and I'm facing this problem: I created a function based on the fast matrix allocation method (Oliveira and Stewart, "Writing Scientific Software", p. 94) and I want to use it for any data type.
I therefore changed it a bit as follows:
void ** malloc_array2d(size_t m, size_t n){
/* pointer to array of pointers */
void ** pointer;
size_t i;
/* allocate pointer array of length m */
pointer = malloc(m*sizeof(void));
if(pointer == NULL){
return NULL;
}
/* allocate storage for m*n entries */
pointer[0] = malloc(m*n*sizeof(void));
if (pointer[0] == NULL) {
free(pointer);
return NULL;
}
/* set the pointers */
for (i = 1; i < m; i++) {
pointer[i] = pointer[0] + i*n;
}
return pointer;
}
but I get segmentation fault.
The question is: how to allow for memory allocation of different data type, since sizeof(void) is not working (and indeed it returns just 1)?
Any feedback is really appreciated.
Thanks.
void is not the matching type of what pointer references. pointer references void *, not void.
Avoid the mistake in the future by not coding the size of the referenced type, but coding the size of the de-referenced pointer.
// pointer = malloc(m*sizeof(void));
pointer = malloc(sizeof *pointer * m);
For the next allocation, sizeof(void) * m *n is not well defined. Code needs a new approach.
// pointer[0] = malloc(m*n*sizeof(void));
To allocate for various types, pass in the size of the data type.
void ** malloc_array2d(size_t m, size_t n, size_t data_size){
...
unsigned char *p = malloc(data_size * m *n);
...
for (i = 0; i < m; i++) {
pointer[i] = p + i*n*data_size;
}
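Putting the pieces together, a complete version might look like the sketch below. The overflow checks, the free_array2d() helper and the array2d_demo() function are additions of this sketch, not part of the answer above. Each row pointer is converted from void * to the element type before use, rather than casting the whole void ** result.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate an m-by-n array of elements that are data_size bytes each,
   as one contiguous block plus an array of m row pointers. */
void **malloc_array2d(size_t m, size_t n, size_t data_size)
{
    void         **pointer;
    unsigned char *p;
    size_t         i;

    if (m == 0 || n == 0 || data_size == 0)
        return NULL;
    /* Refuse sizes whose byte count would overflow size_t. */
    if (n > SIZE_MAX / data_size || m > SIZE_MAX / (n * data_size))
        return NULL;
    if (m > SIZE_MAX / sizeof *pointer)
        return NULL;

    pointer = malloc(m * sizeof *pointer);   /* m row pointers */
    if (pointer == NULL)
        return NULL;

    p = malloc(m * n * data_size);           /* storage for m*n entries */
    if (p == NULL) {
        free(pointer);
        return NULL;
    }

    for (i = 0; i < m; i++)
        pointer[i] = p + i * n * data_size;  /* byte arithmetic on unsigned char */

    return pointer;
}

void free_array2d(void **pointer)
{
    if (pointer) {
        free(pointer[0]);   /* the contiguous data block */
        free(pointer);      /* the row pointer array */
    }
}

void array2d_demo(void)
{
    void  **rows = malloc_array2d(3, 4, sizeof(double));
    double *flat;
    size_t  i, j;

    if (!rows)
        return;

    for (i = 0; i < 3; i++) {
        double *row = rows[i];               /* void * converts implicitly */
        for (j = 0; j < 4; j++)
            row[j] = 10.0 * (double)i + (double)j;
    }

    flat = rows[0];
    assert(flat[1 * 4 + 2] == 12.0);         /* rows are contiguous in memory */

    free_array2d(rows);
}
```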
sizeof returns the number of bytes a data type occupies: 1 for a byte, 2 for int16, 4 for int32, and so on. You can pass that value as a parameter without any problem, since at the point where you call malloc_array2d you should already know the final data type.
Note that whenever you use malloc_array2d, you should cast the returned pointers to the final data type so that they are interpreted correctly.
First, sizeof(void) is not valid in standard C, because void is an incomplete type. GCC accepts it as an extension and evaluates it to 1, which is the same as sizeof(char), the smallest size any type can have; int, float and so on take more bytes. A size of 1 is not what you need here, so pass the element size to malloc() explicitly instead of relying on sizeof(void).
Using what I have learned here: How to use realloc in a function in C, I wrote this program.
int data_length; // Keeps track of length of the dynamic array.
int n; // Keeps track of the number of elements in dynamic array.
void add(int x, int data[], int** test)
{
n++;
if (n > data_length)
{
data_length++;
*test = realloc(*test, data_length * sizeof (int));
}
data[n-1] = x;
}
int main(void)
{
int *data = malloc(2 * sizeof *data);
data_length = 2; // Set the initial values.
n = 0;
add(0,data,&data);
add(1,data,&data);
add(2,data,&data);
return 0;
}
The goal of the program is to have a dynamic array data that I can keep adding values to. When I try to add a value to data, if it is full, the length of the array is increased by using realloc.
Question
This program compiles and does not crash when run. However, printing out data[0],data[1],data[2] gives 0,1,0. The number 2 was not added to the array.
Is this due to my wrong use of realloc?
Additional Info
This program will be used later on with a varying number of "add" and possibly a "remove" function. Also, I know realloc should be checked to see if it failed (is NULL) but that has been left out here for simplicity.
I am still learning and experimenting with C. Thanks for your patience.
The problem is how you use data: it still points to the old array's address. When realloc has to move the block, the old area is freed, so the next statement accesses an invalid address, which is undefined behavior.
Also you don't need to use this data pointer. test is sufficient.
(*test)[n-1] = x;
You don't need to pass data twice to add.
You could code
void add(int x, int** ptr)
{
n++;
int *data = *ptr;
if (n > data_length) {
data_length++;
*ptr = data = realloc(data, data_length * sizeof (int));
if (!data)
perror("realloc failed"), exit(EXIT_FAILURE);
}
data [n-1] = x;
}
but that is very inefficient, you should call realloc only once in a while. You could for instance have
data_length = 3*data_length/2 + 5;
*ptr = data = realloc(data, data_length * sizeof (int));
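One way to combine these suggestions is to bundle the pointer, the element count and the capacity into a single struct, so there is only one name for the memory. The sketch below is an assumption of mine rather than the code from the answers: int_vec and int_vec_add() are hypothetical names, the growth rule is the 3*n/2 + 5 formula shown above, and overflow of the capacity computation is not checked.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container replacing the globals data_length and n. */
typedef struct {
    int   *data;
    size_t n;          /* Number of elements in use */
    size_t capacity;   /* Number of elements allocated */
} int_vec;

/* Append x. Returns 0 on success, -1 if realloc fails
   (in which case the vector is left unchanged). */
int int_vec_add(int_vec *v, int x)
{
    assert(v != NULL);

    if (v->n == v->capacity) {
        size_t new_capacity = 3 * v->capacity / 2 + 5;
        int   *tmp = realloc(v->data, new_capacity * sizeof *tmp);
        if (!tmp)
            return -1;          /* v->data still points to the old block */
        v->data = tmp;          /* the only pointer to the block is updated */
        v->capacity = new_capacity;
    }

    v->data[v->n++] = x;
    return 0;
}
```

Starting from int_vec v = { NULL, 0, 0 }; three calls int_vec_add(&v, 0), int_vec_add(&v, 1), int_vec_add(&v, 2) leave v.data[2] equal to 2, and free(v.data) releases the memory at the end.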
Let's take a look at the POSIX realloc specification.
The description says:
If the new size of the memory object would require movement of the object, the space for the previous instantiation of the object is freed.
The return value (emphasis added) mentions:
Upon successful completion with a size not equal to 0, realloc() returns a pointer to the (possibly moved) allocated space.
You can check to see if the pointer changes.
int *old;
old = *test;
*test = realloc(*test, data_length * sizeof(int));
if (*test != old)
printf("Pointer changed from %p to %p\n", (void *)old, (void *)*test);
This possible change can interact badly because your code refers to the "same" memory by two different names, data and *test. If *test changes, data still points to the old chunk of memory.
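A common way to avoid the stale-alias problem is to remember positions as indices instead of pointers, and to go through the one up-to-date pointer every time. The sketch below assumes nothing beyond the standard library; stale_pointer_demo() is a made-up name.

```c
#include <assert.h>
#include <stdlib.h>

/* Returns the value read back after growing the buffer, or -1 if an
   allocation fails. */
int stale_pointer_demo(void)
{
    int   *buf = malloc(2 * sizeof *buf);
    int   *tmp;
    size_t first = 0;           /* an index stays meaningful after realloc */
    int    value;

    if (!buf)
        return -1;
    buf[first] = 42;

    tmp = realloc(buf, 1000 * sizeof *tmp);
    if (!tmp) {
        free(buf);              /* on failure the old block is still ours */
        return -1;
    }
    buf = tmp;                  /* any copy of the old pointer is now stale */

    assert(buf[first] == 42);   /* contents are preserved across the move */
    value = buf[first];

    free(buf);
    return value;
}
```

In the program from the question this corresponds to writing through (*test)[n-1] inside add() and reading through data only after add() has returned and updated it.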