I'm trying to write a function that uses realloc() to extend the array pointed to within an instance of a struct, but I can't seem to get it to work.
The relevant part of my code is:
struct data_t {
int data_size;
uint16_t *data;
};
void extend_data(data_t container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
int len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
container->data_size++;
container->data[container->data_size-1] = value;
len_data = sizeof(*(container->data)) / sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
Can anybody see what the problem is with this?
Edit
As R. Sahu points out, container is not a pointer in this function - when you said the code "wasn't working", I assumed you meant that you weren't growing your array, but what you've written here won't even compile.
Are you sure you've copied this code correctly? If so, does "not working" mean you're getting a compile-time error, a run-time error, or just unexpected output?
If you've copied the code as written, then the first thing you need to do is change the function prototype to
void extend_data(data_t *container, uint16_t value) {
and make sure you're passing a pointer to your data_t type, otherwise the update won't be reflected in calling code.
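To illustrate why the pointer matters, here is a minimal sketch. The typedef and the two helper names (bump_by_value, bump_by_pointer) are assumptions made for the example, not part of the original code:

```c
#include <stdint.h>
#include <stdlib.h>

/* Same members as the struct in the question; the typedef is an assumption. */
typedef struct {
    int data_size;
    uint16_t *data;
} data_t;

/* Receives a copy of the struct: the increment is lost on return. */
void bump_by_value(data_t container)
{
    container.data_size++;
}

/* Receives the address of the caller's struct: the increment is visible. */
void bump_by_pointer(data_t *container)
{
    container->data_size++;
}
```

After calling bump_by_value(c) the caller's c.data_size is unchanged; after bump_by_pointer(&c) it has grown by one. The same applies to the data pointer that realloc() may change.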
Original
In the line
container->data = realloc(container->data, sizeof(*(container->data))+sizeof(uint16_t));
sizeof(*(container->data)) evaluates to sizeof (uint16_t). container->data is a pointer to, not an array of, uint16_t; sizeof gives you the size of a single pointed-to element (or, applied to container->data itself, the size of the pointer object), never the number of elements you've allocated. What you want to do is something like the following:
/**
* Don't assign the result of a realloc call back to the original
* pointer - if the call fails, realloc will return NULL and you'll
* lose the reference to your original buffer. Assign the result to
* a temporary, then after making sure the temporary is not NULL,
* assign that back to your original pointer.
*/
uint16_t *tmp = realloc(container->data, sizeof *container->data * (container->data_size + 1) );
if ( tmp )
{
/**
* Only add to container->data and update the value of container->data_size
* if the realloc call succeeded.
*/
container->data = tmp;
container->data[container->data_size++] = value;
}
You don't calculate the new size correctly. Consider this:
typedef struct {
size_t size;
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, NULL}
void int_array_resize(int_array *const array,
const size_t newsize)
{
if (!array) {
fprintf(stderr, "int_array_resize(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
if (!newsize) {
free(array->data);
array->data = 0;
array->size = 0;
} else
if (newsize != array->size) {
void *temp;
temp = realloc(array->data, newsize * sizeof array->data[0]);
if (!temp) {
fprintf(stderr, "int_array_resize(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = temp;
array->size = newsize;
}
}
/* int_array my_array = INT_ARRAY_INIT;
is equivalent to
int_array my_array;
int_array_init(&my_array);
*/
void int_array_init(int_array *const array)
{
if (array) {
array->size = 0;
array->data = NULL;
}
}
void int_array_free(int_array *const array)
{
if (array) {
free(array->data);
array->size = 0;
array->data = NULL;
}
}
The key point is newsize * sizeof array->data[0]. This is the number of chars needed for newsize elements of whatever type array->data[0] has. Both malloc() and realloc() take the size in chars.
If you initialize new structures of that type using int_array my_array = INT_ARRAY_INIT; you can just call int_array_resize() to resize it. (realloc(NULL, size) is equivalent to malloc(size); free(NULL) is safe and does nothing.)
The int_array_init() and int_array_free() are just helper functions to initialize and free such arrays.
Personally, whenever I have dynamically resized arrays, I keep both the allocated size (size) and the size used (used):
typedef struct {
size_t size; /* Number of elements allocated for */
size_t used; /* Number of elements used */
int *data;
} int_array;
#define INT_ARRAY_INIT { 0, 0, NULL }
A function that ensures there are at least need elements that can be added is then particularly useful. To avoid unnecessary reallocations, the function implements a policy that calculates the new size to allocate for, as a balance between amount of memory "wasted" (allocated but not used) and number of potentially slow realloc() calls:
void int_array_need(int_array *const array,
const size_t need)
{
size_t size;
void *data;
if (!array) {
fprintf(stderr, "int_array_need(): NULL int_array.\n");
exit(EXIT_FAILURE);
}
/* Large enough already? */
if (array->size >= array->used + need)
return;
/* Start with the minimum size. */
size = array->used + need;
/* Apply growth/reallocation policy. This is mine. */
if (size < 256)
size = (size | 15) + 1;
else
if (size < 2097152)
size = (3 * size) / 2;
else
size = (size | 1048575) + 1048577 - 8;
/* TODO: Verify (size * sizeof array->data[0]) does not overflow. */
data = realloc(array->data, size * sizeof array->data[0]);
if (!data) {
/* Fallback: Try minimum allocation. */
size = array->used + need;
data = realloc(array->data, size * sizeof array->data[0]);
}
if (!data) {
fprintf(stderr, "int_array_need(): Out of memory.\n");
exit(EXIT_FAILURE);
}
array->data = data;
array->size = size;
}
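The TODO in int_array_need() about overflow could be handled with a check along these lines. This is only a sketch; size_mul_overflows is a hypothetical helper name, and the caller would treat a true result like an out-of-memory condition:

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if count * elem_size cannot be represented in a size_t,
   0 otherwise. A zero elem_size never overflows. */
int size_mul_overflows(size_t count, size_t elem_size)
{
    return elem_size != 0 && count > SIZE_MAX / elem_size;
}
```

It would be called as size_mul_overflows(size, sizeof array->data[0]) just before the realloc() call.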
There are many opinions on what kind of reallocation policy you should use, but it really depends on the use case.
There are three things in the balance: number of realloc() calls, as they might be "slow"; memory fragmentation if different arrays are grown requiring many realloc() calls; and amount of memory allocated but not used.
My policy above tries to do many things at once. For small allocations (up to 256 elements), it rounds the size up to the next multiple of 16. That is my attempt at a good balance between memory used for small arrays, and not very many realloc() calls.
For larger allocations, 50% is added to the size. This reduces the number of realloc() calls, while keeping the allocated but unused/unneeded memory below 50%.
For really large allocations, when you have 2^21 elements or more, the size is rounded up to the next multiple of 2^20, less a few elements. This caps the number of allocated but unused elements to about 2^21, or two million elements.
(Why less a few elements? Because it does not harm on any systems, and on certain systems it may help a lot. Some systems, including x86-64 (64-bit Intel/AMD) on certain operating systems and configurations, support large ("huge") pages that can be more efficient in some ways than normal pages. If they are used to satisfy an allocation, I want to avoid the case where an extra large page is allocated just to cater for the few bytes the C library needs internally for the allocation metadata.)
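To make the three ranges easier to check, the policy can be isolated into a pure function. This is a sketch that copies the arithmetic from int_array_need(); the name int_array_policy is an assumption:

```c
#include <stddef.h>

/* Growth policy from int_array_need(), isolated.
   'size' is the minimum number of elements required. */
size_t int_array_policy(size_t size)
{
    if (size < 256)
        return (size | 15) + 1;        /* next multiple of 16 above size */
    else if (size < 2097152)
        return (3 * size) / 2;         /* add 50% */
    else
        /* next multiple of 2^20, plus 2^20, minus 8 elements */
        return (size | 1048575) + 1048577 - 8;
}
```

For example, a minimum of 1 element yields 16, 1000 yields 1500, and 2097152 yields 4194296.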
It appears you aren't using sizeof correctly. In your struct you've defined a uint16_t pointer, not an array. The size of the uint16_t* data type is the size of a pointer on your system. You need to store the size of the allocated memory along with the pointer if you want to be able to accurately resize it. It appears you already have a field for this, data_size. Your example could be fixed as follows:
// I was unsure of the typedef-ing happening with data_t so I made it more explicit in this example
typedef struct {
int data_size;
uint16_t* data;
} data_t;
void extend_data(data_t* container, uint16_t value) {
// adds an additional uint16_t to the array of DATA, updates its internal
// variables, and initialises the new uint to VALUE.
// CURRENT LENGTH OF DATA
int len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
uint16_t* tmp = realloc(container->data, (container->data_size + 1) * sizeof(uint16_t));
if (tmp) {
// realloc returns NULL on failure and leaves the original block untouched.
// Assigning its result straight to container->data would overwrite the only pointer to that block and leak it.
container->data = tmp;
container->data_size++;
container->data[container->data_size-1] = value;
} else {
// Handle allocation failure
}
len_data = container->data_size * sizeof(uint16_t);
printf("LENGTH OF DATA: %d\n", len_data);
printf("data_size: %d\n", container->data_size);
return;
}
void extend_data(data_t container, ...
In your function, container is not a pointer but the struct itself, passed by value, so you can't use the -> operator.
The reallocated memory will also be lost: you work on a local copy of the passed structure, and that copy is discarded when the function returns.
sizeof(*(container.data)) / sizeof(uint16_t)
This will always be 1, because sizeof(*(uint16_t *)) / sizeof(uint16_t) is always one.
Why: the data member is a pointer to uint16_t, so *data has the type uint16_t.
sizeof is evaluated at compile time, not at run time, and it does not return the amount of memory allocated by malloc.
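A small sketch of the difference between sizeof on a pointer's target and sizeof on a true array. The function names are made up for the illustration:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* sizeof applied to the pointed-to object: the result does not depend
   on how many elements were requested from malloc(). */
size_t sizeof_pointee(size_t count)
{
    uint16_t *p = malloc(count * sizeof *p);
    size_t result = sizeof *p;   /* compile-time constant: sizeof(uint16_t) */
    free(p);                     /* free(NULL) is safe if malloc failed */
    return result;
}

/* For a real array the element count can be derived from sizeof. */
size_t array_elems(void)
{
    uint16_t arr[8];
    return sizeof arr / sizeof arr[0];   /* 8 */
}
```

sizeof_pointee(1) and sizeof_pointee(1000) return the same value, which is why the length has to be tracked separately in data_size.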
Related
I have code like so
#ifndef hashtable
#define hashtable
// define the maxmium size
#define INITIAL_SIZE 5
#define LOAD_FACTOR 0.7
typedef struct hashtable
{
int keyArray[INITIAL_SIZE];
// 1d array of strings with maximum length 100 (plus '\0 character')
char valueArray[INITIAL_SIZE][100 + 1];
bool isActiveArray[INITIAL_SIZE]; // for deleting elements
int count;
int capacity;
double loadFactor;
// true: linear probing, false: quadratic probing
bool collisionHandler;
} table;
#endif
in hashtable.h file
in which I define a hashtable with a key array and value array and so on.
I am confused on how I could resize the hashtable, because whenever creating a new struct in order to resize, I fall into the problem that my INITIAL_SIZE cannot be changed, especially in a #define statement, although I want to make a new table that would have a capacity of 2*INITIAL_SIZE and so on ...
Here is my code for initTable(), where I set up the table, in case it is helpful:
void initTable(table* p, int size, double loadFactor, bool collisionHandler) {
// constructor
p->count = 0;
p->capacity = size;
p->loadFactor = loadFactor;
p->collisionHandler = collisionHandler;
memset( p->keyArray, 0, sizeof p->keyArray );
memset( p->valueArray, 0, sizeof p->valueArray );
memset( p->isActiveArray, 0, sizeof p->isActiveArray );
}
How can I resize the array, open to any suggestions even if removing INITIAL_SIZE entirely
Thanks for the help,
pew
Macros (defines) are not variables; the preprocessor replaces them with their values before compilation, so every INITIAL_SIZE in your code is replaced by 5.
An array declared as a struct member has a fixed size that is set at compile time, wherever the struct itself is stored. You can't change the size of an array inside a structure, so you need to keep the arrays outside the structure and store pointers to them, like this:
typedef struct hashtable
{
int *keyArray;
char *valueArray;
bool *isActiveArray;
int count;
int capacity;
double loadFactor;
bool collisionHandler;
} table;
Then you will need to dynamically allocate the arrays with malloc. The malloc function takes the number of bytes to allocate as its argument and returns a pointer to the allocated area (on the heap).
For example:
table p;
int size = 5;
p.keyArray = malloc(size * sizeof(int)); // sizeof is a C operator that yields the size of a type in bytes
if (p.keyArray == NULL) // it is good practice to check malloc's return value
exit(1);
// p.keyArray now points to an array of 5 ints; you can use it like any other array
When you don't need the array anymore, you must free the allocated memory with the free function:
free(p.keyArray);
Finally, here is the full code to resize an array:
void resize(int **array, int old_size, int new_size)
{
int *new_array = malloc(new_size * sizeof(int)); // allocate the new area
if (!new_array) // protect the malloc
exit(1);
memcpy(new_array, *array, old_size * sizeof(int)); // copy the content from the old area to the new one
free(*array); // free the old area
*array = new_array; // and change the pointer of the old area
}
or with realloc:
void resize(int **array, int new_size)
{
*array = realloc(*array, new_size * sizeof(int));
if (!*array)
exit(1);
}
Edit:
As mentioned by Neil, the realloc method is better because it lets the allocator optimize internally: if the area next to the initial block is free and large enough, it can simply extend the block and avoid copying the array's contents. I just wanted to show you the underlying malloc logic with the first version.
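Putting the pieces together for the hash table, a minimal sketch could look like this. The names dyn_table, dyn_table_alloc, dyn_table_free and VALUE_LEN are assumptions for the example. It also uses a pointer to rows of 101 chars for the values, which differs from the plain char * shown earlier, so that values[i] still behaves like one string slot:

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define VALUE_LEN 100

typedef struct {
    int *keys;
    char (*values)[VALUE_LEN + 1];   /* pointer to rows of 101 chars */
    bool *active;
    int count;
    int capacity;
} dyn_table;

/* Allocates zeroed arrays for 'capacity' slots. Returns 0 on success,
   -1 on failure (in which case nothing stays allocated). */
int dyn_table_alloc(dyn_table *t, int capacity)
{
    if (capacity <= 0)
        return -1;
    t->keys   = calloc((size_t)capacity, sizeof *t->keys);
    t->values = calloc((size_t)capacity, sizeof *t->values);
    t->active = calloc((size_t)capacity, sizeof *t->active);
    t->count = 0;
    t->capacity = capacity;
    if (!t->keys || !t->values || !t->active) {
        free(t->keys);
        free(t->values);
        free(t->active);
        t->keys = NULL;
        t->values = NULL;
        t->active = NULL;
        t->capacity = 0;
        return -1;
    }
    return 0;
}

/* Releases the arrays and resets the table to an empty state. */
void dyn_table_free(dyn_table *t)
{
    free(t->keys);
    free(t->values);
    free(t->active);
    t->keys = NULL;
    t->values = NULL;
    t->active = NULL;
    t->count = 0;
    t->capacity = 0;
}
```

Note that if the slot index is computed from the capacity (for example key % capacity), growing the table likely means allocating a larger table and re-inserting every active entry, rather than just calling realloc() on each array.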
@jgiron42's answer, which uses malloc to reallocate and int, char, and bool pointers for the arrays in hashtable.h, is almost perfect. In my code, the only change needed to make it fully work was the byte count when copying into a new int key array: memcpy(new_keys, *keys, old_size * sizeof(int)); rather than memcpy(new_bools, *bools, old_size);, which only works for the char and bool arrays because their elements are (typically) one byte each.
I am trying to write a set of functions that will support a dynamically allocated array where a struct contains the array and other metadata. The goal is to return the function to the user, and the struct information can be called from a function. The code seems to work just fine until I get to the function to free the memory from heap. For reasons I do not understand, the code fails with a segmentation fault, which would indicate that the variable vec in the free_vector function is not pointing to the correct address. However, I have verified with print statements that it is pointing to the correct address. I am hoping someone can help me understand why the free_vector function is not working, specifically the free command. My code and implementation is shown below.
typedef struct
{
size_t allocated_length;
size_t active_length;
size_t num_bytes;
char *vector;
} Vector;
void *init_vector(size_t num_indices, size_t num_bytes) {
// Allocate memory for Vector struct
Vector *vec = malloc(sizeof(*vec));
vec->active_length = 0;
vec->num_bytes = num_bytes;
// Allocate heap memory for vector
void *ptr = malloc(num_bytes * num_indices);
if (ptr == NULL) {
printf("WARNING: Unable to allocate memory, exiting!\n");
return &vec->vector;
}
vec->allocated_length = num_indices;
vec->vector = ptr;
return &vec->vector;
}
// --------------------------------------------------------------------------------
int push_vector(void *vec, void *elements, size_t num_indices) {
Vector *a = get_vector_data(vec);
if(a->active_length + num_indices > a->allocated_length) {
printf("TRUE\n");
size_t size = (a->allocated_length + num_indices) * 2;
void *ptr = realloc(a->vector, size * a->num_bytes);
if (ptr == NULL) {
printf("WARNING: Unable to allocate memory, exiting!\n");
return 0;
}
a->vector = ptr;
a->allocated_length = size;
}
memcpy((char *)vec + a->active_length * a->num_bytes, elements,
num_indices * a->num_bytes);
a->active_length += num_indices;
return 1;
}
// --------------------------------------------------------------------------------
Vector *get_vector_data(void *vec) {
// - The Vector struct has three size_t variables that precede the vector
// variable. These variables consume 24 bytes of data. The code below
// points backwards in memory by 24 bytes to the beginning of the struct.
char *a = (char *)vec - 24;
return (Vector *)a;
}
// --------------------------------------------------------------------------------
void free_vector(void *vec) {
// Free all Vector struct elements
Vector *a = get_vector_data(vec);
// - This print statement shows that the variable is pointing to the
// correct data.
printf("%d\n" ((int *)vec)[2]);
// The function fails on the next line and I do not know why
free(a->vector);
a->vector = NULL;
a->allocated_length = 0;
a->active_length = 0;
a->num_bytes = 0;
}
int main() {
int *a = init_vector(3, sizeof(int));
int b[3] = {1, 2, 3};
push_vector(a, b, 3);
// The code begins to fails here
free_vector(a);
}
This program suffers from Undefined Behaviour.
The return value from init_vector is of type char **, a pointer-to-pointer-to-char,
return &vec->vector;
converted to void *.
In main, this value is converted to an int *
int *a = init_vector(3, sizeof(int));
This value is then converted back into a void * when passed to push_vector.
In push_vector, this value is cast to a char * in order to perform pointer arithmetic
memcpy((char *)vec + a->active_length * a->num_bytes, elements,
num_indices * a->num_bytes);
where this operation overwrites the original pointer returned by malloc contained in the vector member.
On my system, this attempts to write 12 bytes (three int) to memory starting with the position of the vector member in the Vector structure.
Vector *vec
| &vec->vector
| |
v v
+------+------+------+------+-----+
|size_t|size_t|size_t|char *|?????|
+------+------+------+------+-----+
This overflows, as sizeof (char *) is 8 on my system.
This is the wrong place to write data. The correct place to write data is *(char **) vec - or just a->vector.
If the write does not crash the program directly (UB), this surely results in free being passed a pointer value that was not returned by malloc, calloc, or realloc, or the pointer value NULL.
Aside: In free_vector, this value is also cast to an int *
printf("%d\n", ((int *)vec)[2]); /* added the missing comma. */
Additionally, it is unclear if free_vector should free the original allocation, or just the vector member. You do go to lengths to zero-out the structure here.
Still, as is, you have a memory leak - albeit a small one.
void free_vector(void *vec) {
Vector *a = get_vector_data(vec);
/* ... */
free(a); /* This has to happen at some point. */
}
Note, you should be using offsetof to calculate the position of members within a structure. A static offset of 24 assumes two things that may not hold true:
sizeof (size_t) is always 8 (actual minimum sizeof (size_t) is 2), and
the structure contains no padding to satisfy alignment (this seems likely given the form, but not strictly true).
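A sketch of the offsetof approach, using the Vector layout from the question. The helper name vector_from_member is an assumption; it only recovers the enclosing struct and does not address the separate problem of where the data is written:

```c
#include <stddef.h>

typedef struct {
    size_t allocated_length;
    size_t active_length;
    size_t num_bytes;
    char *vector;
} Vector;

/* Given the address of the 'vector' member, recover the address of the
   enclosing Vector without hard-coding a byte offset. */
Vector *vector_from_member(void *member_addr)
{
    return (Vector *)((char *)member_addr - offsetof(Vector, vector));
}
```

Because offsetof is computed by the compiler for the actual layout, this keeps working if sizeof (size_t) is not 8 or if padding is present.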
The source you linked in the comments uses a flexible array member, not a pointer member, meaning the entirety of the data (allocation sizes and the vector) is stored in contiguous memory. That is why the & operator yields a valid location to copy data to in this implementation.
(Aside: the linked implementation appears to be broken by effectively using sizeof to get the base of the container structure from a pointer to the flexible array member (e.g., &((vector_container *) pointer_to_flexible_member)[-1]), which does not take into account the possibility of trailing padding, which would result in a larger offset than expected.)
My current concat function:
char* concat(char* a, int a_size,
char* b, int b_size) {
char* c = malloc(a_size + b_size);
memcpy(c, a, a_size);
memcpy(c + a_size, b, b_size);
free(a);
free(b);
return c;
}
But this uses extra memory. Is it possible to append two byte arrays using realloc without allocating extra memory?
Like:
void append(char* a, int a_size, char* b, int b_size)
...
char* a = malloc(2);
char* b = malloc(2);
void append(a, 2, b, 2);
//The size of a will be 4.
While Jean-François Fabre answered the stated question, I'd like to point out that you can manage such byte arrays better by using a structure:
typedef struct {
size_t max; /* Number of chars allocated for */
size_t len; /* Number of chars in use */
unsigned char *data;
} bytearray;
#define BYTEARRAY_INIT { 0, 0, NULL }
void bytearray_init(bytearray *barray)
{
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
void bytearray_free(bytearray *barray)
{
free(barray->data);
barray->max = 0;
barray->len = 0;
barray->data = NULL;
}
To declare an empty byte array, you can use either bytearray myba = BYTEARRAY_INIT; or bytearray myba; bytearray_init(&myba);. The two are equivalent.
When you no longer need the array, call bytearray_free(&myba);. Note that free(NULL) is safe and does nothing, so it is perfectly safe to free a bytearray that you have initialized, but not used.
To append to a bytearray:
int bytearray_append(bytearray *barray, const void *from, const size_t size)
{
if (barray->len + size > barray->max) {
const size_t len = barray->len + size;
size_t max;
void *data;
/* Example policy: */
if (len < 8)
max = 8; /* At least 8 chars, */
else
if (len < 4194304)
max = (3*len) / 2; /* grow by 50% up to 4,194,304 bytes, */
else
max = (len | 2097151) + 2097153 - 24; /* then pad to next multiple of 2,097,152 sans 24 bytes. */
data = realloc(barray->data, max);
if (!data) {
/* Not enough memory available. Old data is still valid. */
return -1;
}
barray->max = max;
barray->data = data;
}
/* Copy appended data; we know there is room now. */
memmove(barray->data + barray->len, from, size);
barray->len += size;
return 0;
}
Since this function can at least theoretically fail to reallocate memory, it will return 0 if successful, and nonzero if it cannot reallocate enough memory.
There is no need for a malloc() call, because realloc(NULL, size) is exactly equivalent to malloc(size).
The "growth policy" is a very debatable issue. You can just make max = barray->len + size, and be done with it. However, dynamic memory management functions are relatively slow, so in practice, we don't want to call realloc() for every small little addition.
The above policy tries to do something better, but not too aggressive: it always allocates at least 8 characters, even if less is needed. Up to 4,194,304 characters, it allocates 50% extra. Above that, it rounds the allocation size up to the next multiple of 2,097,152 and subtracts 24. The reasoning behind this is complex, but it is more for illustration and understanding than anything else; it is definitely NOT "this is best, and this is what you should do too". This policy ensures that each byte array allocates at most 4,194,304 = 2^22 unused characters. However, 2,097,152 = 2^21 is the size of a huge page on AMD64 (x86-64), and is a power-of-two multiple of a native page size on basically all architectures. It is also large enough to switch from so-called sbrk() allocation to memory mapping on basically all architectures that do that. It means that such huge allocations use a separate part of the heap for each, and the unused part is usually just virtual memory, not necessarily backed by any RAM, until accessed. As a result, this policy tends to work quite well for both very short byte arrays, and very long byte arrays, on most architectures.
Of course, if you know (or measure!) the typical size of the byte arrays in typical workloads, you can optimize the growth policy for that, and get even better results.
Finally, it uses memmove() instead of memcpy(), just in case someone wishes to repeat a part of the same byte array: memcpy() only works if the source and target areas do not overlap; memmove() works even in that case.
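A minimal sketch of the overlapping case. The function name repeat_prefix is made up for the illustration, and the caller must guarantee that buf has room for at + n bytes:

```c
#include <string.h>

/* Copies the first n bytes of buf to offset 'at' within the same buffer.
   Source and destination may overlap, so memmove() is required;
   memcpy() would have undefined behaviour here. */
void repeat_prefix(char *buf, size_t at, size_t n)
{
    memmove(buf + at, buf, n);
}
```

With a buffer holding "abcdef", calling repeat_prefix(buf, 2, 4) leaves "ababcd": the four source bytes are copied as they were before the call, even though the destination overlaps them.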
When using more advanced data structures, like hash tables, a variant of the above structure is often useful. (That is, this is much better in cases where you have lots of empty byte arrays.)
Instead of having a pointer to the data, the data is part of the structure itself, as a C99 flexible array member:
typedef struct {
size_t max;
size_t len;
unsigned char data[];
} bytearray;
You cannot declare a byte array itself (i.e. bytearray myba; will not work); you always declare a pointer to such a byte array: bytearray *myba = NULL;. The pointer being NULL is just treated the same as an empty byte array.
In particular, to see how many data items such an array has, you use an accessor function (also defined in the same header file as the data structure), rather than myba.len:
static inline size_t bytearray_len(bytearray *const barray)
{
return (barray) ? barray->len : 0;
}
static inline size_t bytearray_max(bytearray *const barray)
{
return (barray) ? barray->max : 0;
}
The (expression) ? (if-true) : (if-false) is a ternary operator. In this case, the first function is exactly equivalent to
static inline size_t bytearray_len(bytearray *const barray)
{
if (barray)
return barray->len;
else
return 0;
}
If you wonder about the bytearray *const barray, remember that pointer declarations are read from right to left, with * as "a pointer to". So, it just means that barray is constant, a pointer to a byte array. That is, we may change the data it points to, but we won't change the pointer itself. Compilers can usually detect such stuff themselves, but it may help; the main point is however to remind us human programmers that the pointer itself is not to be changed. (Such changes would only be visible within the function itself.)
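The two placements of const can be compared in a short sketch. Both function names are hypothetical:

```c
#include <stddef.h>

/* 'int *const p': p itself cannot be reassigned, but *p can be written. */
void store_through(int *const p, const int value)
{
    *p = value;          /* allowed: the pointed-to int is not const */
    /* p = NULL; */      /* would not compile: p is a const pointer */
}

/* 'const int *p': *p cannot be written through p, but p can be changed. */
int sum_ints(const int *p, size_t n)
{
    int total = 0;
    while (n--)
        total += *p++;   /* allowed: p itself is not const */
    return total;
}
```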
Since such arrays often need to be resized, the resizing is often put into a separate helper function:
bytearray *bytearray_resize(bytearray *const barray, const size_t len)
{
bytearray *temp;
if (!len) {
free(barray);
errno = 0;
return NULL;
}
if (!barray) {
temp = malloc(sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
errno = ENOMEM;
return NULL;
}
temp->max = len;
temp->len = 0;
return temp;
}
if (barray->len > len)
barray->len = len;
if (barray->max == len)
return barray;
temp = realloc(barray, sizeof (bytearray) + len * sizeof barray->data[0]);
if (!temp) {
free(barray);
errno = ENOMEM;
return NULL;
}
temp->max = len;
return temp;
}
What does that errno = 0 do in there? The idea is that because resizing/reallocating a byte array may change the pointer, we return the new one. If the allocation fails, we return NULL with errno == ENOMEM, just like malloc()/realloc() do. However, since the desired new length was zero, this saves memory by freeing the old byte array if any, and returns NULL. But since that is not an error, we set errno to zero, so that it is easier for callers to check if an error occurred or not. (If the function returns NULL, check errno. If errno is nonzero, an error occurred; you can use strerror(errno) to get a descriptive error message.)
You probably also noted the sizeof barray->data[0], used even when barray is NULL. This is okay, because sizeof is not a function, but an operator: it does not access the right side at all, it only evaluates to the size of the thing the right side refers to. (You only need to use parentheses when the right side is a type.) This form is nice, because it lets a programmer change the type of the data member, without changing any other code.
To append data to such a byte array, we probably want to be able to specify whether we anticipate further appends to the same array, or whether this is probably the final append, so that only the exact amount of memory needed is allocated. The exact parameter of the function below selects between the two. Note that this function returns a pointer to the (modified) byte array:
bytearray *bytearray_append(bytearray *barray,
const void *from, const size_t size,
int exact)
{
size_t len = bytearray_len(barray) + size;
if (exact) {
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by bytearray_resize(). */
} else
if (bytearray_max(barray) < len) {
if (!exact) {
/* Apply growth policy */
if (len < 8)
len = 8;
else
if (len < 4194304)
len = (3 * len) / 2;
else
len = (len | 2097151) + 2097153 - 24;
}
barray = bytearray_resize(barray, len);
if (!barray)
return NULL; /* errno already set by the bytearray_resize() call */
}
if (size) {
memmove(barray->data + barray->len, from, size);
barray->len += size;
}
return barray;
}
This time, we declared bytearray *barray, because we change where barray points to in the function. If the fourth parameter, exact, is nonzero, then the resulting byte array is exactly the size needed; otherwise the growth policy is applied.
Yes, since realloc will preserve the start of your buffer if the new size is bigger:
char* concat(char* a, size_t a_size,
char* b, size_t b_size) {
char* c = realloc(a, a_size + b_size);
if (c == NULL)
return NULL; // realloc failed: a and b are untouched and still owned by the caller
memcpy(c + a_size, b, b_size); // dest is after "a" data, source is b with b_size
free(b);
return c;
}
c may be different from a (if the original memory block cannot be resized in-place contiguously to the new size by the system) but if that's the case, the location pointed by a will be freed (you must not free it), and the original data will be "moved".
My advice is to warn the users of your function that the input buffers must be allocated using malloc, else it will crash badly.
I have a C struct:
typedef struct {
Dataset *datasets;
int nDatasets;
char *group_name;
enum groupType type;
} DatasetGroup;
It has a constructor function like this:
DatasetGroup * new_DatasetGroup(char *group_name, enum groupType type, enum returnCode *ret)
{
DatasetGroup *dg;
dg = (DatasetGroup *) malloc(sizeof(DatasetGroup));
if (dg == NULL)
{
*ret = EMEMORY_ERROR;
}
// Allocate space for a few datasets
dg->datasets = malloc(sizeof(Dataset) * INCREMENT);
if (dg->datasets == NULL)
{
*ret = EMEMORY_ERROR;
}
dg->group_name= malloc(sizeof(char) * strlen(group_name));
strcpy(dg->group_name, group_name);
dg->type = type;
groupCount++;
return dg;
}
I want to dynamically create an array of these structs. Whats the best way to do this?
So far I have something like:
DatasetGroup * make_array(){
DatasetGroup *dg_array;
// Allocate space for a few groups
dg_array = (DatasetGroup *) malloc(sizeof(DatasetGroup) * INCREMENT);
return dg_array;
}
void add_group_to_array(DatasetGroup *dg_array, ...){
// Add a datasetgroup
DatasetGroup *dg = new_DatasetGroup(...);
// groupCount - 1 as the count is incremented when the group is created, so will always be one ahead of the array index we want to assign to
dg_array[groupCount - 1] = dg;
if (groupCount % INCREMENT == 0)
{
//Grow the array
dg_array = realloc(dg_array, sizeof(DatasetGroup) * (groupCount + INCREMENT));
}
}
But this doesnt seem right....
any ideas?
A few suggestions:
1. You have groupCount being incremented by the constructor function of the struct. This means you can only have one array of the struct that uses your array function. I would recommend having the array be responsible for managing the count.
2. To that effect, if you want a managed array, I would create a struct for it that keeps the pointer to the array, the number of objects, and the size of the array (i.e. the maximum number of structs it can currently hold).
3. If you keep proper track of how many elements you have and the size of the array, you can replace groupCount % INCREMENT == 0 with something like groupCount == arraySize, which is a lot more intuitive in my opinion.
4. You can avoid the second malloc in the constructor altogether by having the array be an array of the elements instead of an array of pointers. The constructor can then just initialize the struct members instead of allocating memory. If you are doing this a lot, you will avoid a lot of memory fragmentation.
5. Finally, while this depends on your application, I usually recommend that when you realloc you do not grow by a constant but by a multiple of the current array size. If, say, you double the array size, you only have to do about log_2 n reallocs, with n being the final array size, and you waste at most half of the memory (memory is generally cheap; like I said, it depends on the application). If that wastes too much memory, you can use a factor of, say, 1.5. If you want a more detailed explanation of this, I recommend this Joel on Software article; the part about realloc is about 2/3 of the way down.
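To put rough numbers on the last suggestion, this sketch counts how many grow steps each strategy needs. Both function names are made up for the illustration, and overflow of the running capacity is ignored:

```c
#include <stddef.h>

/* Number of grow steps needed to reach 'target' slots when the
   capacity doubles each time. */
size_t reallocs_doubling(size_t start, size_t target)
{
    size_t calls = 0;
    if (start == 0)
        start = 1;           /* doubling zero would never terminate */
    while (start < target) {
        start *= 2;
        calls++;
    }
    return calls;
}

/* Number of grow steps when the capacity grows by a fixed 'step'. */
size_t reallocs_constant(size_t start, size_t step, size_t target)
{
    size_t calls = 0;
    if (step == 0)
        return 0;            /* cannot grow at all */
    while (start < target) {
        start += step;
        calls++;
    }
    return calls;
}
```

Starting from 20 slots, reaching 100,000,000 slots takes 23 doublings, whereas growing by a constant 20 slots takes 49 steps just to reach 1,000.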
Update:
A few other things:
dg = (DatasetGroup *) malloc(sizeof(DatasetGroup));
if (dg == NULL)
{
ret = EMEMORY_ERROR;
}
// Allocate space for a few datasets
dg->datasets = malloc(sizeof(Dataset) * INCREMENT);
As previously pointed out, this is very bad, as you will use dg even if it is NULL. You probably want to exit right after detecting the error.
Furthermore, you are setting ret, but ret is passed by value, so the change will not be visible to the caller. Instead you probably want to pass a pointer and dereference it.
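A reduced sketch of both points: returning early on failure and reporting the status through a pointer. The type group_t, the function new_group and the enum values are hypothetical stand-ins for DatasetGroup and its constructor. The sketch also allocates strlen + 1 bytes for the name, since strcpy writes the terminating '\0' as well:

```c
#include <stdlib.h>
#include <string.h>

enum return_code { RC_OK = 0, EMEMORY_ERROR = 1 };   /* hypothetical values */

typedef struct {
    char *group_name;
} group_t;   /* reduced stand-in for DatasetGroup */

/* Returns a new group, or NULL on failure. *ret is always set, so the
   caller can tell the two cases apart. */
group_t *new_group(const char *group_name, enum return_code *ret)
{
    group_t *g = malloc(sizeof *g);
    if (g == NULL) {
        *ret = EMEMORY_ERROR;
        return NULL;                 /* do not touch g after this point */
    }
    g->group_name = malloc(strlen(group_name) + 1);   /* +1 for '\0' */
    if (g->group_name == NULL) {
        free(g);                     /* undo the first allocation */
        *ret = EMEMORY_ERROR;
        return NULL;
    }
    strcpy(g->group_name, group_name);
    *ret = RC_OK;
    return g;
}
```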
Update 2: Can I give an example, sure, quick not so much ;-D.
Consider the following code (I apologize if there are any mistakes, still half asleep):
#include <stdio.h>
#include <stdlib.h>
#define LESS_MALLOCS
#define MAX_COUNT 100000000
typedef struct _foo_t
{
int bar1;
int bar2;
} foo_t;
void foo_init(foo_t *foo, int bar1, int bar2)
{
foo->bar1 = bar1;
foo->bar2 = bar2;
}
foo_t* new_foo(int bar1, int bar2)
{
foo_t *foo = malloc(sizeof(foo_t));
if(foo == NULL) {
return NULL;
}
foo->bar1 = bar1;
foo->bar2 = bar2;
return foo;
}
typedef struct _foo_array_t
{
#ifdef LESS_MALLOCS
foo_t *array;
#else
foo_t **array;
#endif
int count;
int length;
} foo_array_t;
void foo_array_init(foo_array_t* foo_array, int size) {
foo_array->count = 0;
#ifdef LESS_MALLOCS
foo_array->array = malloc(sizeof(foo_t) * size);
#else
foo_array->array = malloc(sizeof(foo_t*) * size);
#endif
foo_array->length = size;
}
int foo_array_add(foo_array_t* foo_array, int bar1, int bar2)
{
if(foo_array->count == foo_array->length) {
#ifdef LESS_MALLOCS
size_t new_size = sizeof(foo_t) * foo_array->length * 2;
#else
size_t new_size = sizeof(foo_t*) * foo_array->length * 2;
#endif
void* tmp = realloc(foo_array->array, new_size);
if(tmp == NULL) {
return -1;
}
foo_array->array = tmp;
foo_array->length *= 2;
}
#ifdef LESS_MALLOCS
foo_init(&(foo_array->array[foo_array->count++]), bar1, bar2);
#else
foo_array->array[foo_array->count] = new_foo(bar1, bar2);
if(foo_array->array[foo_array->count] == NULL) {
return -1;
}
foo_array->count++;
#endif
return foo_array->count;
}
int main()
{
int i;
foo_array_t foo_array;
foo_array_init(&foo_array, 20);
for(i = 0; i < MAX_COUNT; i++) {
if(foo_array_add(&foo_array, i, i+1) != (i+1)) {
fprintf(stderr, "Failed to add element %d\n", i);
return EXIT_FAILURE;
}
}
printf("Added all elements\n");
return EXIT_SUCCESS;
}
There is a struct (foo_t) with two members (bar1 and bar2) and another struct that is an array wrapper (foo_array_t). foo_array_t keeps track of the current size of the array and the number of elements in the array. It has an add-element function (foo_array_add). Note that there is both a foo_init and a new_foo: foo_init takes a pointer to a foo_t, while new_foo does not and instead returns a pointer. So foo_init assumes the memory has already been allocated in some way (heap, stack, or whatever, it doesn't matter), while new_foo allocates memory from the heap. There is also a preprocessor macro called LESS_MALLOCS. This changes the definition of the array member of foo_array_t, the size of the initial array allocation, the size during reallocation, and whether foo_init or new_foo is used. The array and its size have to change to reflect whether a pointer or the actual element is stored in the array. With LESS_MALLOCS defined the code follows my suggestion for number 4; when it is not defined, it is more similar to your code. Finally, main contains a simple micro-benchmark. The results are the following:
[missimer#asus-laptop tmp]$ gcc temp.c # Compile with LESS_MALLOCS defined
[missimer#asus-laptop tmp]$ time ./a.out
Added all elements
real 0m1.747s
user 0m1.384s
sys 0m0.357s
[missimer#asus-laptop tmp]$ gcc temp.c # Compile with LESS_MALLOCS not defined
[missimer#asus-laptop tmp]$ time ./a.out
Added all elements
real 0m9.360s
user 0m4.804s
sys 0m1.968s
Not that time is the best way to measure a benchmark but in this case I think the results speak for themselves. Also, when you allocate an array of elements instead of an array of pointers and then allocate the elements separately you reduce the number of places you have to check for errors. Of course everything has trade-offs, if for example the struct was very large and you wanted to move elements around in the array you would be doing a lot of memcpy-ing as opposed to just moving a pointer around in your approach.
Also, I would recommend against this:
dg_array = realloc(dg_array, sizeof(DatasetGroup) * (groupCount + INCREMENT));
As you lose the value of the original pointer if realloc fails and returns NULL. Also, like your previous ret, you should pass a pointer instead of the value, as you are not changing the value for the caller, just for the callee, which then exits, so it has no real effect. Finally, I noticed you changed your function definition to take a pointer to ret, but you need to dereference that pointer when you use it; you should be getting compiler warnings (perhaps even errors) with what you currently have.
You could do two things: either dynamically create an array of struct pointers and then call your new function to create N datagroups, or dynamically request memory for N structures at once, which means your N structures are contiguously allocated.
Datagroup **parry = malloc(sizeof(Datagroup *) * N);
for (int i = 0; i < N; i++){
parry[i] = //yourconstructor
}
Or
//allocate N empty structures
Datagroup *contarr = calloc(N, sizeof(Datagroup));
The second method might need a different initialization routine than your constructor, as the memory is already allocated.
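For the second method, something along these lines could work. The fields of Datagroup and both function names are made up for illustration, since the real struct isn't shown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; substitute the real members. */
typedef struct { int id; int member_count; } Datagroup;

/* In-place initializer: the memory already exists, so nothing is allocated. */
void datagroup_init(Datagroup *dg, int id)
{
    assert(dg != NULL);
    dg->id = id;
    dg->member_count = 0;
}

/* One allocation for all n structures, then initialize each slot in place. */
Datagroup *datagroup_array_new(size_t n)
{
    Datagroup *arr = calloc(n, sizeof *arr);
    if (arr == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        datagroup_init(&arr[i], (int)i);
    return arr;
}
```

A single free(arr) releases the whole block, whereas the pointer-array version needs one free per element plus one for the pointer array itself.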
It is more than a funny question. :-)
I wish to initialize an array in C, but instead of zeroing out the array with calloc, I want to set all elements to one. Is there a single function that does just that?
I have used my question above to search in google, no answer. Hope you can help me out! FYI, I am first year CS student just starting to program in C.
There isn't a standard C memory allocation function that allows you to specify a value other than 0 that the allocated memory is initialized to.
You could easily enough write a cover function to do the job:
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memset */
void *set_alloc(size_t nbytes, char value)
{
void *space = malloc(nbytes);
if (space != 0)
memset(space, value, nbytes);
return space;
}
Note that this assumes you want to set each byte to the same value. If you have a more complex initialization requirement, you'll need a more complex function. For example:
void *set_alloc2(size_t nelems, size_t elemsize, void *initializer)
{
void *space = malloc(nelems * elemsize);
if (space != 0)
{
for (size_t i = 0; i < nelems; i++)
memmove((char *)space + i * elemsize, initializer, elemsize);
}
return space;
}
Example usage:
struct Anonymous
{
double d;
int i;
short s;
char t[2];
};
struct Anonymous a = { 3.14159, 23, -19, "A" };
struct Anonymous *b = set_alloc2(20, sizeof(struct Anonymous), &a);
memset is there for you:
memset(array, value, length);
There is no such function. You can implement it yourself with a combination of malloc() and either memset() (for character data) or a for loop (for other integer data).
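For an int array specifically, the for-loop version might look like the sketch below (alloc_ints_filled is just a name I picked). memset is not suitable here because it writes the value into every byte: memset(a, 1, n * sizeof(int)) would make each 4-byte int 0x01010101 rather than 1.

```c
#include <assert.h>
#include <stdlib.h>

/* Allocate n ints and set every element to `value`.
 * Returns NULL if the allocation fails; the caller frees the result. */
int *alloc_ints_filled(size_t n, int value)
{
    assert(n > 0);   /* malloc(0) may legitimately return NULL */

    int *a = malloc(n * sizeof *a);
    if (a != NULL) {
        for (size_t i = 0; i < n; i++)
            a[i] = value;
    }
    return a;
}
```

Usage would be int *ones = alloc_ints_filled(100, 1); followed by a NULL check and, eventually, free(ones).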
The impetus for the calloc() function's existence (vs. malloc() + memset()) is that it can be a nice performance optimization in some cases. If you're allocating a lot of data, the OS might be able to give you a range of virtual addresses that are already initialized to zero, which saves you the extra cost of manually writing out 0's into that memory range. This can be a large performance gain because you don't need to page all of those pages in until you actually use them.
Under the hood, calloc() might look something like this:
void *calloc(size_t count, size_t size)
{
// Error checking omitted for expository purposes
size_t total_size = count * size;
if (total_size < SOME_THRESHOLD) // e.g. the OS's page size (typically 4 KB)
{
// For small allocations, allocate from normal malloc pool
void *mem = malloc(total_size);
memset(mem, 0, total_size);
return mem;
}
else
{
// For large allocations, allocate directly from the OS, already zeroed (!)
return mmap(NULL, total_size, PROT_READ|PROT_WRITE, MAP_ANON|MAP_PRIVATE, -1, 0);
// Or on Windows, use VirtualAlloc()
}
}