C malloc use case - realloc versus pre-computation - c

I want to create an array of structs on the heap from another data structure. Say there are N total elements to traverse, and (N-x) pointers (computed_elements) will be added to the array.
My naive strategy for this is to create an array (temp_array) size N on the stack and traverse the data structure, keeping track of how many elements need to be added to the array, and adding them to temp_array when I encounter them. Once I've finished, I malloc(computed_elements) and populate this array with the temp_array.
This is suboptimal because the second loop is unnecessary. However, I am weighing this against the tradeoff of constantly reallocating memory every iteration. Some rough code to clarify:
void *temp_array[N];
int count = 0;
for (int i = 0; i < N; i++) {
if (check(arr[i])) {
temp_array[count] = arr[i];
count++;
}
}
void *results = malloc(count * sizeof(MyStruct));
for (int i = 0; i < count; i++) {
results[i] = temp_array[i];
}
return results;
Thoughts would be appreciated.

One common strategy is to try to estimate the number of elements you're going to need (not a close estimate, more of a "On the order of..." type estimate). Malloc that amount of memory, and when you get "close" to that limit ("close" also being up for interpretation), realloc some more. Personally, I typically double the array when I get close to filling it.
-EDIT-
Here is the "ten minute version". (I've ensured that it builds and doesn't segfault)
Obviously I've omitted things like checking for the success of malloc/realloc, zeroing memory, etc...
#include <stdlib.h>
#include <stdbool.h>
#include <string.h> /* for the "malloc only version" (see below) */
/* Assume 500 elements estimated*/
#define ESTIMATED_NUMBER_OF_RECORDS 500
/* "MAX" number of elements that the original question seems to be bound by */
#define N 10000
/* Included only to allow for test compilation */
typedef struct
{
int foo;
int bar;
} MyStruct;
/* Included only to allow for test compilation */
MyStruct arr[N] = { 0 };
/* Included only to allow for test compilation */
bool check(MyStruct valueToCheck)
{
bool baz = true;
/* ... */
return baz;
}
int main (int argc, char** argv)
{
int idx = 0;
int actualRecordCount = 0;
int allocatedSize = 0;
MyStruct *tempPointer = NULL;
MyStruct *results = malloc(ESTIMATED_NUMBER_OF_RECORDS * sizeof(MyStruct));
allocatedSize = ESTIMATED_NUMBER_OF_RECORDS;
for (idx = 0; idx < N; idx++)
{
/* Ensure that we're not about to walk off the current array */
if (actualRecordCount == (allocatedSize))
{
allocatedSize *= 2;
/* "malloc only version"
* If you want to avoid realloc and just malloc everything...
*/
/*
tempPointer = malloc(allocatedSize);
memcpy(tempPointer, results, allocatedSize);
free(results);
results = tempPointer;
*/
/* Using realloc... */
tempPointer = realloc(results, allocatedSize);
results = tempPointer;
}
/* Check validity or original array element */
if (check(arr[idx]))
{
results[actualRecordCount] = arr[idx];
actualRecordCount++;
}
}
if (results != NULL)
{
free(results);
}
return 0;
}

One possibility is malloc for the size N, then run your loop, then realloc for size N-x. Memory fragmentation may result for small x.

The best re-usable code very often is scalable. Unless obligated to use small sizes, assume code will grow in subsequent applications and need to be reasonable efficient for large N. You do want to re-use good code.
realloc() an array by a factor of 4 or so as the need arises. Arrays may also shrink - no need to have a bloated array laying around. I've used grow by factor of 4 at intervals 1,4,16,64... and shrink intervals at 2,8,32,128... By having grow/shrink intervals apart from each other, it avoid lots of actively should N waver around an interval.
Even in small ways, like using size_t vs. int. Sure, with sizes like 1000, it makes no difference, but with code re-use, an application may push the limit: size_t is better for array indexes.
void *temp_array[N];
for (int i = 0; i < N; i++) {
void *temp_array[N];
for (size_t i = 0; i < N; i++) {

Related

Error with `realloc()` with arraylist-like struct with two arrays

As part of a personal project, I'm trying to create a dynamic array of 2-tuples that show a) the line in a program and b) the number of bytecode tokens associated with that line. I've implemented this as a struct of arrays:
typedef struct{
int count; // Number of elements
int capacity; // total capacity of arraylist
int* lines;
int* lineCount;
}
this is based on the example from the codebase, as such:
int count;
int capacity;
uint8_t* bytes;
My problem comes from re-allocation - I have several helper functions/macros for growing and re-allocating the array lists memory - here particularly the macro GROW_ARRAY and reallocate(), as described below. When I try and re-allocate lines, it works fine, but I get a segmentation fault and realloc(): invalid old size several times when I attempt to reallocate lineCount after it
I'm using the code base from Bob Nystrom's Crafting Interpreters, especially this first part here https://craftinginterpreters.com/chunks-of-bytecode.html#challenges. Most of the code comes from there, albeit tinkered with some of having added
Mostly, I've added a lot of checks and been running this with all the debug features in gcc I can find. Notably, realloc(): invalid old size has stop appearing as I've tinkered with the code some.
EDIT: Added main function that should reproduce behavior
int main() {
LineArray lines;
// Initialize to 0 / NULL
initLineArray(&lines);
updateLineArray(&lines, 0, 1);
}
// the structure I'm using
typedef struct {
int capacity;
int count;
int* lines;
int* lineCount;
} LineArray;
/* Note LineArray has already been initialized earlier with
capacity=0;
count=0;
lines=NULL;
lineCount=NULL;
*/
void updateLineArray(LineArray* array, int line, int count) {
// IF line in `lines` -- update it
int index = containsLine(array, line);
if (index != -1) { // IF Index is not Error Code
// I think I fixed a bug here?
array->lineCount[index] += count;
return;
}
//ELSE -- add line to end (naturally appends); then increment
else {
//Check to see if array would be overgrown
if (array->capacity < array->count + 1) {
//IF yes, regrow array
int old_capacity = array->capacity;
array->capacity = GROW_CAPACITY(old_capacity);
// Reallocate arrays.
array->lines = GROW_ARRAY(array->lines, int, old_capacity,
array->capacity);
array->lineCount = GROW_ARRAY(array->lineCount, int, old_capacity,
array->capacity);
}
// Properly update the lines
array->lines[array->count] = line;
array->lineCount[array->count] = count;
array->count++;
return;
}
}
//The memory management functions/macros I'm using here
#define GROW_CAPACITY(capacity) \
((capacity) < 8 ? 8 : (capacity) * 2)
#define GROW_ARRAY(previous, type, oldCount, count) \
(type*) reallocate(previous, sizeof(type) * (oldCount), \
sizeof(type) * (count))
void* reallocate(void* previous, size_t oldSize, size_t newSize) {
// If size is null, erase it and get null_pointer
if (newSize == 0) {
free(previous);
return NULL;
}
// reallocate the data into a new size
// is Oldsize is zero :: malloc(data, newSize)
return realloc(previous, newSize);
}

dynamically allocate space for varying size structures array

I'm facing a problem that seems complicated to me so i'll be very grateful to who can help me.
I'm sort of trying to replicate the array of string behavior only with structures
so instead of having for example
char **mys={"hello","this is a long message"};
I'm trying to have an array with structures (each structure containing an array of variable length). I'm not quite sure if I exposed my problem very well so i hope the code is going to indicate what i'm trying to do:
my two structures :
typedef struct
{
int *iTab;
int iTail;
}strDynArr;
typedef struct
{
strDynArr **iTab;
int iTailStruct;
int iTail;
}strDynDynArr;
All the functions related to those two structures :
void initArray(strDynArr *ar)
{
ar->iTab=malloc(sizeof(int));
ar->iTail=1;
}
void pushArray(int iElement,strDynArr *ar)
{
ar->iTab[ar->iTail-1]=iElement;
realloc(ar->iTab,(ar->iTail++)*sizeof(int));
}
void freeDynArray(strDynArr *ar)
{
free(*ar->iTab);
}
void initDynDynArray(strDynDynArr *ar)
{
ar->iTab=malloc(sizeof(int));
ar->iTail=1;
ar->iTailStruct=1;
}
//the problem
void pushDynDynArray(strDynArr *daElement,strDynDynArr *ar)
{
//ar->iTab[ar->iTail-1]=daElement;
ar->iTailStruct+=(daElement->iTail+1);
realloc(ar->iTab,(ar->iTailStruct)*sizeof(int));
ar->iTab[ar->iTailStruct-(daElement->iTail+1)]=daElement;
//realloc(ar->iTab,(ar->iTail++)*sizeof(int));
}
void freeDynDynDynArray(strDynDynArr *ar)
{
free(*ar->iTab);
}
And the one function where i'm stuck is pushDynDynArray so far i've attempted two things either just using the pointer that points on à structure strDynArr but i don't know how space is managed at all so i tryed to allocate the size of the all the structures contained in the array of the structure strDynDynArr.
So what i would like to know is how to allocate space for my array (iTab) for a structure of type strDynDynArr.
In other words what is the way for me to store several structures containing for example :
//content of structure 1
int *iTab={2,3,4};
int iTail=3;
//content of structure 2
int *iTab={5,8,9,10,54,8,2};
int iTail=7;
All help is welcome and thanks à lot for it !
For simplicity, lets call each array of "strings" a table (of arrays), and each "string" an array of items. In your case, the item type seems to be int:
typedef int item;
typedef struct {
size_t size; /* Number of items allocated for */
size_t used; /* Number of items in item[] */
item *item;
} array;
typedef struct {
size_t size; /* Number of array (pointers) allocated for */
size_t used; /* Number of arrays in array[] */
array *array;
} table;
It is common to define initializer macros and functions for such types:
#define ARRAY_INIT { 0, 0, NULL }
#define TABLE_INIT { 0, 0, NULL }
static inline void array_init(array *ref)
{
if (!ref) {
fprintf(stderr, "array_init(): NULL array!\n");
exit(EXIT_FAILURE);
}
ref->size = 0;
ref->used = 0;
ref->item = NULL;
}
static inline void table_init(table *ref)
{
if (!ref) {
fprintf(stderr, "table_init(): NULL table!\n");
exit(EXIT_FAILURE);
}
ref->size = 0;
ref->used = 0;
ref->array = NULL;
}
Also, free functions are obviously useful:
static inline void array_free(array *ref)
{
if (!ref) {
fprintf(stderr, "array_free(): NULL array!\n");
exit(EXIT_FAILURE);
}
free(ref->item); /* Note: free(NULL) is safe; it does nothing. */
ref->size = 0;
ref->used = 0;
ref->item = NULL;
}
static inline void table_free(table *ref)
{
size_t i;
if (!ref) {
fprintf(stderr, "table_free(): NULL table!\n");
exit(EXIT_FAILURE);
}
i = ref->size;
while (i-->0)
array_free(ref->array + i); /* array_free(&(ref->array[i])); */
free(ref->array);
ref->size = 0;
ref->used = 0;
ref->array = NULL;
}
When freeing a table, we want to free all (possible) arrays individually, then the memory used by the pointers. Note that one could assume that only used arrays are in use; however, I wanted the above to be thorough, and free all size arrays allocated for.
Both the init and free functions above take a pointer to an array or table. They should never be NULL. (That is, the item or array member in the structure may well be NULL; it's just that you should never call e.g. array_init(NULL); or table_free(NULL).)
Let us implement a function to push and pop individual ints from a solitary array (that may or may not be part of a table):
void array_push(array *ref, const item value)
{
if (!ref) {
fprintf(stderr, "array_push(): NULL array!\n");
exit(EXIT_FAILURE);
}
/* Need to grow the array? */
if (ref->used >= ref->size) {
size_t size; /* Minimum is ref->used + 1 */
void *temp;
/* Dynamic growth policy. Pulled from a hat. */
if (ref->used < 64)
size = 64; /* Minimum 64 elements */
else
if (ref->used < 1048576)
size = (3*ref->used) / 2; /* Grow by 50% up to 1048576 */
else
size = (ref->used | 1048575) + 1048577; /* Grow to next multiple of 1048576 */
temp = realloc(ref->item, size * sizeof ref->item[0]);
if (!temp)
return -1; /* Out of memory */
ref->item = temp;
ref->size = size;
}
/* Append value to array. */
ref->item[ref->used++] = value;
}
and a corresponding pop operation:
item array_pop(array *ref)
{
if (!ref) {
fprintf(stderr, "array_pop(): NULL array!\n");
exit(EXIT_FAILURE);
}
if (ref->used < 1) {
fprintf(stderr, "array_pop(): Array is already empty; nothing to pop.\n");
exit(EXIT_FAILURE);
}
/* Since ref->used is the *number* of items in the array,
we want to decrement it first, to get the last item in array. */
return ref->item[--ref->used];
}
In your program, you can use an array for example thus:
array scores = ARRAY_INIT;
array_push(&scores, 5);
array_push(&scores, 2);
printf("%d\n", array_pop(&scores)); /* Will print 2 */
printf("%d\n", array_pop(&scores)); /* Will print 5 */
array_free(&scores);
The line array scores = ARRAY_INIT; both declares and initializes the array (to an empty array). You could also equivalently use array scores; array_init(&scores);.
The resize or growth policy in array_push() is roughly along the lines I'd personally recommend, although the actual numerical values are pulled from a hat, and you may wish to adjust them. The idea is that there is a minimum number of items allocated for (say, 64). For bigger arrays, we increase the size fractionally, so that when the array is large, the size increase is also large. Many people use 100% increase in size (doubling the size, i.e. size = 2 * ref->used;), but I like 50% increase better (multiplying the size by one and one half, size = (3 * ref->used) / 2;). For huge arrays, we don't want to waste potentially lots of memory, so we allocate in fixed-size (but huge) chunks instead. (There is no need or real benefit to align the huge size to some multiple like I did; I just like it that way. And the more complicated code ensures you need to understand it and edit it, rather than submitting it raw as yours; otherwise, you'll instructor will catch you cheating.)
Pushing a single value to the last array in the table is now simple to implement:
void table_push_value(table *ref, const item value)
{
size_t i;
if (!ref) {
fprintf(stderr, "table_push_value(): NULL table!\n");
exit(EXIT_FAILURE);
}
/* Ensure there is at least one array. */
if (!ref->size < 1) {
/* Empty table: ref->size = 0, and ref->used must be 0 too. */
const size_t size = 1; /* Allocate for exactly one array. */
void *temp;
temp = realloc(ref->array, size * sizeof ref->array[0]);
if (!temp) {
fprintf(stderr, "table_push_value(): Out of memory.\n");
exit(EXIT_FAILURE);
}
ref->size = size;
ref->array = temp;
for (i = 0; i < size; i++)
array_init(ref->array + i); /* array_init(&(ref->array[i])); */
}
if (ref->used > 0)
i = ref->used - 1; /* Last array in table */
else
i = 0; /* Table is empty, so use first array */
array_push(ref->array + i, value); /* array_push(&(ref->array[i])); */
}
This time, you need special logic for an empty table, for both allocating the description for an array, as well as where to push.
Pop is simpler:
item table_pop_value(table *ref)
{
size_t i;
if (!ref) {
fprintf(stderr, "table_pop_value(): NULL table!\n");
exit(EXIT_FAILURE);
}
i = ref->used;
/* Find the last array with items in it, and pop from it. */
while (i-- > 0)
if (ref->array[i].used > 0) {
return array_pop(ref->array + i); /* array_pop(&(ref->array[i])); */
fprintf(stderr, "table_pop_value(): Empty table, no items to pop!\n");
exit(EXIT_FAILURE);
}
To push an entire array to a table (pushing the array of items in it, not making a copy of the items) is pretty simple, but we do need to implement a reallocation/growth policy again:
void table_push_array(table *ref, array *one)
{
if (!ref) {
fprintf(stderr, "table_push_array(): NULL table!\n");
exit(EXIT_FAILURE);
}
if (!one) {
fprintf(stderr, "table_push_array(): NULL array!\n");
exit(EXIT_FAILURE);
}
if (ref->used >= ref->size) {
size_t size, i;
void *temp;
if (ref->used < 1)
size = 1; /* Minimum size is 1 */
else
size = (ref->used | 7) + 9; /* Next multiple of 8 */
temp = realloc(ref->array, size * sizeof ref->array[0]);
if (!temp) {
fprintf(stderr, "table_push_array(): Out of memory.\n");
exit(EXIT_FAILURE);
}
ref->array = temp;
for (i = ref->size; i < size; i++)
array_init(ref->array + i); /* array_init(&(ref->array[i])); */
ref->size = size;
}
ref->array[ref->used] = *one; /* "shallow copy" */
ref->used++;
}
The corresponding pop operation should be pretty obvious by now:
array *table_pop_array(table *ref)
{
array retval = ARRAY_INIT;
if (!ref) {
fprintf(stderr, "table_pop_array(): NULL table!\n");
exit(EXIT_FAILURE);
}
if (ref->used < 1) {
fprintf(stderr, "table_pop_array(): Table is empty, no arrays to pop!\n");
exit(EXIT_FAILURE);
}
/* Decrement the used count, so it refers to the last array in the table. */
ref->used--;
/* Shallow copy the array. */
retval = ref->array[ref->used];
/* Init, but do not free, the array in the table. */
array_init(ref->array + ref->used); /* array_init(&(ref->array[ref->used)); */
/* Return the array. */
return retval;
}
There is a "trick" in table_pop_array() above, that you should understand. While we cannot return pointers to local variables, we can return structures. In the above case, the structure describes the array, and the pointer in it does not refer to a local variable, but to a dynamically allocated array of items. Structure types can be assigned just as normal scalar types (like int or double); it is basically the same as if you assigned each member separately.
Overall, you should notice I have not used a single malloc() call. This is because realloc(NULL, size) is equivalent to malloc(size), and simply initializing unused pointers to NULL makes everything simpler.
When a table is grown (reallocated), we do need to initialize all the new arrays, because of the above realloc() use pattern.
The above approach does not preclude direct access to specific arrays in the table, or specific items in an array. If you intend to implement such functions, two helper functions similar to
void table_need_arrays(table *ref, const size_t size);
void array_need_items(array *ref, const size_t size);
that ensure that the table has room for at least size arrays, and an array has room for at least size items. They are also useful when pushing multiple items or arrays consecutively, as then one can do e.g. table_need_arrays(&mytable, mytable.used + 10); to ensure there is room for additional 10 arrays in the table.
All throughout the functions above, you can see notation name_of_array + index, and a comment with corresponding &(name_of_array[index]). This is because the two notations are equivalent: pointer to the index'th element in name_of_array.
I didn't bother to compile-check the above code, so there might be typos hidden in there. (This too is intentional, because I want you to understand the code, and not just copy it and use it as your own without understanding any of the details.) However, the logic is sound. So, if you find a typo or issue, let me know in a comment, and I shall fix.

C pthread accessing an array from multiple threads

I have a problem for accessing an array from several threads. I have written a struct which gathers all informations needed for the job I want to do.
The structure is defined like this:
struct thread_blk
{
size_t th_blk_count;
size_t th_blk_size;
size_t th_blk_current;
size_t* th_blk_i_start;
void* data;
pthread_t* th_id;
ThreadBlockAdaptive th_blk_adapt;
};
The idea is to fill an array from multiple threads, each one working on a delimited field of memory of an array.
The th_blk_count field represents the amount of block that has to
be treated,
The th_blk_size field represents the size of a block,
The th_blk_current field represents the processed block (they are
listed from 0 to n),
The th_blk_i_start is an array which contains indexes of the array
that has to be filled.
Just a single function applied to the thread_blk struct is not working properly:
int startAllThreadBlock(struct thread_blk* th_blk, worker_func f)
{
int res = 0;
for(size_t i = 0; i < th_blk->th_blk_count; ++i)
{
res |= pthread_create(th_blk->th_id + i, NULL, f, th_blk);
th_blk->th_blk_current++;
}
return res;
}
In fact, the th_blk_current field is not incremented properly. I used it to retrieve the th_blk_i_start indexes which serve as intervals. As a result, my worker (shown bellow) is processing the same indexes of the double array.
Here is the function I use in the startAllThreadBlock function:
void* parallel_for(void* th_blk_void)
{
ThreadBlock th_blk = (ThreadBlock)th_blk_void;
size_t i = getThreadBlockStartIndex(th_blk, getThreadBlockCurrentIndex(th_blk));
printf(
"Running thread %p\n"
" -Start index %zu\n\n",
pthread_self(),
i
);
if(getThreadBlockCurrentIndex(th_blk) == (getThreadBlockCount(th_blk) - 1))
{
for(; i < MAX; ++i)
{
result[i] = tan(atan((double)i));
}
}
else
{
size_t threshold = getThreadBlockStartIndex(th_blk, getThreadBlockCurrentIndex(th_blk) + 1);
for(; i < threshold; ++i)
{
result[i] = tan(atan((double)i));
}
}
return NULL;
}
ThreadBlock is just a typedef over a thread_blk*; result is the array of double.
I am pretty sure that the problem lies around the startAllThreadBlock (if I use a 1 second sleep everything run as expected). But I don't know how to fix it.
Does someone have an idea?
Thanks for your answers.
Update
Placing the incrementation in the worker solved the problem. But I think it is not safe though, for the reason Some programmer dude mentioned.
void* parallel_for(void* th_blk_void)
{
ThreadBlock th_blk = (ThreadBlock)th_blk_void;
size_t i = getThreadBlockStartIndex(th_blk, getThreadBlockCurrentIndex(th_blk));
size_t n;
if(getThreadBlockCurrentIndex(th_blk) == (getThreadBlockCount(th_blk) - 1))
{
n = MAX;
}
else
{
n = getThreadBlockStartIndex(th_blk, getThreadBlockCurrentIndex(th_blk) + 1);
}
incThreadBlockCurrent(th_blk);
printf(
"Running thread %p\n"
" -Start index %zu\n\n",
pthread_self(),
i
);
for(; i < n; ++i)
{
result[i] = tan(atan((double)i));
}
return NULL;
}
It would do it with a mutex on th_blk_current no?
I think the problem here is that you think the thread gets passed a copy of the structure. It doesn't, it gets a pointer. All the threads get a pointer to the same structure. So any changes to the structure will affect all threads.
You need to come up with a way to pass individual data to the individual threads. For example a thread-specific structure containing only the thread-specific data, and you dynamically allocate an instance of that structure to pass to the thread.

Dynamic array allocated, but cannot use it

I would love to use this code as a dynamic array. Unfortunately, I cannot figure out why the program does not use the allocated memory. Is there something wrong with the parameters of the AddToArray function, maybe?
It is possible to copy and paste this code directly into the IDE and compile, to take a look at the output. The memory seems to be allocated but not used?
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
typedef struct {
float x;
float y;
} DATA;
int AddToArray (DATA item,
DATA **the_array,
int *num_elements,
int *num_allocated)
{
if(*num_elements == *num_allocated)
{ // Are more refs required?
// Feel free to change the initial number of refs
// and the rate at which refs are allocated.
if (*num_allocated == 0)
{
*num_allocated = 3; // Start off with 3 refs
}
else
{
*num_allocated *= 2;
}// Double the number of refs allocated
// Make the reallocation transactional by using a temporary variable first
void *_tmp = realloc(*the_array, (*num_allocated * sizeof(DATA)));
// If the reallocation didn't go so well, inform the user and bail out
if (!_tmp)
{
printf("ERROR: Couldn't realloc memory!\n");
return(-1);
}
// Things are looking good so far, so let's set the
*the_array = (DATA*)_tmp;
}
(*the_array)[*num_elements] = item;
*num_elements++;
return *num_elements;
}
int main()
{
DATA *the_array = NULL;
int num_elements = 0; // To keep track of the number of elements used
int num_allocated = 0; // This is essentially how large the array is
// Some data that we can play with
float numbers1[4] = {124.3,23423.4, 23.4, 5.3};
float numbers2[4] = { 42, 33, 15, 74 };
int i;
// Populate!
for (i = 0; i < 4; i++)
{
DATA temp;
temp.x = numbers1[i];
temp.y = numbers2[i];
if (AddToArray(temp, &the_array, &num_elements, &num_allocated) == -1)
return 1; // we'll want to bail out of the program.
}
for (i = 0; i < 4; i++)
{
printf("(x:%f,y:%f)\n", the_array[i].x, the_array[i].y);
}
printf("\n%d allocated, %d used\n", num_allocated, num_elements);
// Deallocate!
free(the_array);
// All done.
return 0;
}
In your code, you need to change
*num_elements++;
to
(*num_elements)++;
because, without the explicit parenthesis, the ++ is having higher precedence over * operator. What you want is to increment the value stored in the address, not the other way around.
Check about operator precedence here.
try
(*num_elements)++;
in function
AddToArray()
The bug in your code is that the pointer is incremented, not the value.
..
(*the_array)[*num_elements] = item;
(*num_elements)++;
..
Is this some kind of programming assignment? The code can be improved in many areas here.
There are lots of good algorithms for this kind of problem that are well written and optimized, I suggest that you some research in that area.

Pointer Conventions with: Array of pointers to certain elements

This question is about the best practices to handle this pointer problem I've dug myself into.
I have an array of structures that is dynamically generated in a function that reads a csv.
int init_from_csv(instance **instances,char *path) {
... open file, get line count
*instances = (instance*) malloc( (size_t) sizeof(instance) * line_count );
... parse and set values of all instances
return count_of_valid_instances_read;
}
// in main()
instance *instances;
int ins_len = init_from_csv(&instances, "some/path/file.csv");
Now, I have to perform functions on this raw data, split it, and perform the same functions again on the splits. This data set can be fairly large so I do not want to duplicate the instances, I just want an array of pointers to structs that are in the split.
instance **split = (instance**) malloc (sizeof(instance*) * split_len_max);
int split_function(instance *instances, ins_len, instances **split){
int i, c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (some_criteria_is_true) {
split[c++] = &instances[i];
}
return c;
}
Now my question what would be the best practice or most readable way to perform a function on both the array of structs and the array of pointers? For a simple example count_data().
int count_data (intances **ins, ins_len, float crit) {
int i,c;
c = 0;
for (i = 0; i < ins_len; i++) {
if ins[i]->data > crit) {
++c;
}
}
return c;
}
// code smell-o-vision going off by now
int c1 = count_data (split, ins_len, 0.05); // works
int c2 = count_data (&instances, ins_len, 0.05); // obviously seg faults
I could make my init_from_csv malloc an array of pointers to instances, and then malloc my array of instances. I want to learn how a seasoned c programmer would handle this sort of thing though before I start changing a bunch of code.
This might seem a bit grungey, but if you really want to pass that instances** pointer around and want it to work for both the main data set and the splits, you really need to make an array of pointers for the main data set too. Here's one way you could do it...
size_t i, mem_reqd;
instance **list_seg, *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = (sizeof(instance*) + sizeof(instance)) * line_count;
list_seg = (instance**) malloc( mem_reqd );
data_seg = (instance*) &list_seg[line_count];
/* Index into the data segment */
for( i = 0; i < line_count; i++ ) {
list_seg[i] = &data_seg[i];
}
*instances = list_seg;
Now you can always operate on an array of instance* pointers, whether it's your main list or a split. I know you didn't want to use extra memory, but if your instance struct is not trivially small, then allocating an extra pointer for each instance to prevent confusing code duplication is a good idea.
When you're done with your main instance list, you can do this:
void free_instances( instance** instances )
{
free( instances );
}
I would be tempted to implement this as a struct:
struct instance_list {
instance ** data;
size_t length;
int owner;
};
That way, you can return this from your functions in a nicer way:
instance_list* alloc_list( size_t length, int owner )
{
size_t i, mem_reqd;
instance_list *list;
instance *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = sizeof(instance_list) + sizeof(instance*) * length;
if( owner ) mem_reqd += sizeof(instance) * length;
list = (instance_list*) malloc( mem_reqd );
list->data = (instance**) &list[1];
list->length = length;
list->owner = owner;
/* Index the list */
if( owner ) {
data_seg = (instance*) &list->data[line_count];
for( i = 0; i < line_count; i++ ) {
list->data[i] = &data_seg[i];
}
}
return list;
}
void free_list( instance_list * list )
{
free(list);
}
void erase_list( instance_list * list )
{
if( list->owner ) return;
memset((void*)list->data, 0, sizeof(instance*) * list->length);
}
Now, your function that loads from CSV doesn't have to focus on the details of creating this monster, so it can simply do the task it's supposed to do. You can now return lists from other functions, whether they contain the data or simply point into other lists.
instance_list* load_from_csv( char *path )
{
/* get line count... */
instance_list *list = alloc_list( line_count, 1 );
/* parse csv ... */
return list;
}
etc... Well, you get the idea. No guarantees this code will compile or work, but it should be close. I think it's important, whenever you're doing something with arrays that's even slightly more complicated than just a simple array, it's useful to make that tiny extra effort to encapsulate it. This is the major data structure you'll be working with for your analysis or whatever, so it makes sense to give it a little bit of stature in that it has its own data type.
I dunno, was that overkill? =)

Resources