Cannot free memory in a function - c

I'm writing code that reads graphs from a text file for later processing. One function builds the graph in memory; a second function calls the first and then operates on the graph. The problem is that I allocate memory in the first function but don't know where I should free it: freeing it in the first function crashes the program, while in the second function the compiler says there is no such struct.
struct Graph* createGraph(struct edge edges[], int wxk, int l)
{
// allocate memory for the graph data structure
//struct Graph* graph = (struct Graph*)malloc(sizeof(struct Graph));
struct Graph* graph = malloc(sizeof *graph);
graph->head = malloc( l * sizeof *(graph->head) );
// initialize head pointer for all vertices
for ( int i = 0; i < wxk; i++ ) {
graph->head[i] = NULL;
}
// add edges to the directed graph one by one
for ( int i = 0; i < l; i++ )
{
// get the source and destination vertex
int src = edges[i].src;
int dest = edges[i].dest;
double weight = edges[i].weight;
// allocate new node of adjacency list from `src` to `dest`
struct node* newNode = malloc(sizeof *(newNode) );
struct node* newNode2 = malloc( sizeof *(newNode2));
newNode->dest = dest;
newNode->weight = weight;
newNode->next = NULL;
if( graph->head[src] == NULL ) {
graph->head[src] = newNode;
} else {
for( newNode2 = graph->head[src]; newNode2->next != NULL; newNode2 = newNode2->next )
;
newNode2->next = newNode;
}
struct node* newNode3 = malloc( sizeof *(newNode3) );
struct node* newNode4 = malloc( sizeof *(newNode4) );
newNode3->dest = src;
newNode3->weight = weight;
newNode3->next = NULL;
if( graph->head[dest] == NULL ) {
graph->head[dest] = newNode3;
} else {
for( newNode4 = graph->head[dest]; newNode4->next != NULL; newNode4 = newNode4->next )
;
newNode4->next = newNode3;
}
}
return graph;
}
This is the first function, in which I allocate memory for newNode, newNode2, newNode3 and newNode4. When I free this memory at the end of the function, the program crashes later.
void check_graph( char *plik)
{
FILE *in = fopen( plik, "r");
struct edge *edges = readfromfile(in);
int l = getl();
int wxk = getwxk();
struct Graph *graph = createGraph( edges, wxk, l);
struct FIFO queue;
short int *visited = malloc ( wxk * sizeof (int));
for( int i = 0; i < wxk; i++)
{
visited[i] = 0;
}
queue.vertices = (int *) malloc( wxk * sizeof(int) );
queue.front = 0;
queue.end = 0;
add_to_queue( &queue, 0);
visited[0] = 1;
while( queue.front != queue.end)
{
int current_vertex = del_from_queue( &queue);
struct node *tmp = graph->head[current_vertex];
while( tmp != NULL)
{
int adjVertex = tmp->dest;
if( visited[adjVertex] == 0)
{
visited[adjVertex] = 1;
add_to_queue( &queue, adjVertex);
}
tmp = tmp->next;
}
}
free(queue.vertices); // free the memory
free(visited);
free(edges);
for( int i = 0; i < wxk; i++ )
free( graph->head[i] );
free(graph->head);
free(graph);
}
If I try to free the memory from the first function here, the compiler says that the variable names are undeclared.

Short answer
Freeing memory should be handled in separate functions that destroy a specific object, one for (adjacency) lists and one for graphs (which calls the adjacency list destroying function). The adjacency list destructor should iterate over a list, freeing nodes as it visits them (note the nodes are freed using the destructor's own local variables, not the newNodeI variables in the graph constructor). The graph destructor would be called from check_graph. Note that this parallels how creation is handled in the code.
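The two destructors described above could look roughly like the following. This is a minimal sketch, not the asker's actual code: the struct definitions are assumptions inferred from how createGraph uses them, and the names List_destroy and Graph_destroy are illustrative.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed definitions, inferred from how the question's code uses them. */
struct node {
    int dest;
    double weight;
    struct node *next;
};

struct Graph {
    struct node **head; /* one adjacency list per vertex */
};

/* Frees every node of one adjacency list, using only local variables. */
void List_destroy(struct node *list)
{
    while (list != NULL) {
        struct node *next = list->next; /* save the link before freeing */
        free(list);
        list = next;
    }
}

/* Frees all adjacency lists, then the head array, then the graph itself.
 * nVertices must be the number of entries in graph->head (wxk in the question). */
void Graph_destroy(struct Graph *graph, int nVertices)
{
    if (graph == NULL)
        return;
    for (int i = 0; i < nVertices; i++)
        List_destroy(graph->head[i]);
    free(graph->head);
    free(graph);
}
```

In check_graph, the loop that frees graph->head[i] (which releases only the first node of each list) and the two calls after it would then collapse into a single Graph_destroy(graph, wxk). Separately, newNode2 and newNode4 in createGraph are only used as list cursors, so the malloc calls that initialize them leak and can simply be dropped.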
Longer answer
The program would greatly benefit from following some fundamental design principles. In particular, break up the functions into more basic operations, each of which performs a single task (akin to the Single Responsibility Principle from OOP). When considering the sub-tasks of a task, they should be at the same level and in the same domain (more on this later). Additionally, functions shouldn't be overlong. Repeated code is a candidate for abstraction into a separate function (Don't Repeat Yourself), as long as it is conceptually a single task. Though the program may not be explicitly object-oriented, some OO conventions can be usefully applied. Variable names should be descriptive.
Start thinking about function names. The sample has createGraph and check_graph, a mix of naming conventions. This isn't inherently wrong, but conventions should only be mixed when each one signals something different and is used in a different part of the program. One C convention for naming methods in an OO manner is to use PascalCase (upper camel case) for class names and camelCase for method names (as is often done in C++), and to connect the two with an underscore (e.g. ClassName_methodName). Extending this, the underscore indicates going down in scope, so nested class methods would be named Outer_Inner_methodName. Alternatives include using camelCase for class names, snake case for everything, or snake case with a double underscore for scope (e.g. outer_class__inner_class__method_name). "Private" methods can be indicated with a leading underscore, though note that C reserves file-scope identifiers that begin with an underscore for the implementation, so a different marker (such as a trailing underscore) is the safer choice in production code.
The check_graph function performs the following sub-tasks:
opens a file
causes edges to be read from file
causes a Graph object to be created
allocates space for a member field of a queue (queue.vertices)
traverses the graph breadth-first
examines queue members to determine when it's empty
destroys the queue member, edges, and Graph
This mixes different levels of tasks (e.g. causing a Graph object to be created (which happens in a different function) but destroying the object itself; creating a part of the queue) and domains (e.g. file I/O, memory management, and graph algorithms), resulting in multiple responsibilities. Reading objects from files should be handled by a component whose responsibility it is to bridge I/O and object creation. Destroying the graph object should be handled by a separate function, a counterpart to createGraph (or Graph_create, if you use the convention above). This in particular should resolve the issue in question. Queue manipulation should be farmed out to queue functions, encapsulating the operations and data.
The majority of the lines in check_graph are concerned with the breadth-first traversal of the graph. This could be the basis for a function that implements the BFS algorithm, taking a callback that's called for each vertex as it's visited. check_graph would then call the BFS function.
A sketch of a refactored version:
typedef void (*Graph_visitor_t)(Graph_t *graph, int iVertex, void *additional);
/**
* Breadth-first traversal of a graph
*
* visit: a callback, invoked for each vertex when visited
* pAdditional: additional data passed along to the `visit` function
*/
void Graph_bfs(Graph_t *graph, Graph_visitor_t visit, void *pAdditional) {
// TODO: detect & handle memory errors
bool *visited = calloc(graph->nVertices, sizeof(*visited)); // calloc takes (count, size)
IntQueue_t *queue = IntQueue_create(graph->nVertices);
visit(graph, 0, pAdditional);
visited[0] = 1;
IntQueue_push(queue, 0);
while (! IntQueue_empty(queue)) {
int current_vertex = IntQueue_pop(queue);
/* much the same as the original `check_graph` (only
* add a call to `visit`)
*/
// ...
}
IntQueue_destroy(queue);
free(visited);
}
void _Graph_countVisited(Graph_t *graph, int iVertex, void *additional) {
int *pnVisited = additional; // signature must match Graph_visitor_t, so recover the counter here
++(*pnVisited);
}
// Demonstrates how to use Graph_bfs (check_graph would be similar).
bool Graph_isConnected(Graph_t *graph) {
int nVisited = 0;
Graph_bfs(graph, &_Graph_countVisited, &nVisited);
return nVisited == graph->nVertices;
}
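The IntQueue_* functions that Graph_bfs relies on are not shown in the answer. One possible minimal version is sketched below; the struct layout and all function bodies are assumptions made for illustration. It is a fixed-capacity queue, which is enough for BFS because each vertex is pushed at most once.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical fixed-capacity FIFO of ints. */
typedef struct IntQueue {
    int *items;
    size_t capacity;
    size_t front; /* index of the next item to pop */
    size_t end;   /* index one past the last pushed item */
} IntQueue_t;

IntQueue_t *IntQueue_create(size_t capacity)
{
    IntQueue_t *queue = malloc(sizeof *queue);
    if (queue == NULL)
        return NULL;
    queue->items = malloc(capacity * sizeof *queue->items);
    if (queue->items == NULL && capacity > 0) {
        free(queue);
        return NULL;
    }
    queue->capacity = capacity;
    queue->front = 0;
    queue->end = 0;
    return queue;
}

bool IntQueue_empty(const IntQueue_t *queue)
{
    return queue->front == queue->end;
}

/* Returns false when the queue is full instead of writing out of bounds. */
bool IntQueue_push(IntQueue_t *queue, int value)
{
    if (queue->end == queue->capacity)
        return false;
    queue->items[queue->end++] = value;
    return true;
}

/* Popping from an empty queue is a programming error, hence the assert. */
int IntQueue_pop(IntQueue_t *queue)
{
    assert(!IntQueue_empty(queue));
    return queue->items[queue->front++];
}

void IntQueue_destroy(IntQueue_t *queue)
{
    if (queue == NULL)
        return;
    free(queue->items);
    free(queue);
}
```

Keeping front, end and the buffer behind these functions means the traversal code never touches the queue's fields directly, which is the encapsulation the answer argues for.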
createGraph performs the following sub-tasks:
allocates the graph object & members
allocates the adjacency list nodes
traverses adjacency lists
adds nodes to adjacency lists
Again, some of these tasks are at different levels and should be farmed out (e.g. adjacency list manipulation). The code that manipulates the adjacency list within the loop is also repetitive, and is a great candidate for being moved to another function.
Many of the variable names (e.g. l, wxk, newNode3) aren't very descriptive, leading to some bugs. For example, in createGraph, graph->head is allocated to hold l entries, but wxk entries are accessed when initializing it (in this case, the better fix is to use calloc instead of manually initializing all entries to NULL). If these variables were named more descriptively, e.g. nVertices and nEdges (I'm guessing as to purpose), the bug would be more obvious and likely wouldn't have occurred in the first place.
void _Graph_addAdjacency(Graph_t *graph, int from, int to, double weight) {
Node_t *newNode = List_Node_create(to, weight);
if (graph->head[from] == NULL ) {
graph->head[from] = newNode;
} else {
List_append(graph->head[from], newNode);
}
}
void _Graph_addEdge(Graph_t *graph, Edge_t *edge) {
// each edge is stored in both directions
_Graph_addAdjacency(graph, edge->src, edge->dest, edge->weight);
_Graph_addAdjacency(graph, edge->dest, edge->src, edge->weight);
}
Graph_t* Graph_create(Edge_t edges[], int nEdges, int nVertices) {
//// allocation
// TODO: detect & handle memory errors
Graph_t *graph = malloc(sizeof *graph);
graph->head = calloc(nVertices, sizeof *(graph->head)); // calloc takes (count, size)
graph->nVertices = nVertices;
//// initialization
// add edges to the directed graph one by one
for (int i = 0; i < nEdges; i++) {
// TODO: add error detection
_Graph_addEdge(graph, &edges[i]);
}
return graph;
}
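The list helpers List_Node_create and List_append used by _Graph_addAdjacency are left undefined in the sketch above. A possible version follows; the Node_t layout is an assumption based on the question's struct node, and the bodies are illustrative only.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed node layout, mirroring the question's struct node. */
typedef struct Node {
    int dest;
    double weight;
    struct Node *next;
} Node_t;

/* Allocates and initializes one adjacency list node; returns NULL on failure. */
Node_t *List_Node_create(int dest, double weight)
{
    Node_t *node = malloc(sizeof *node);
    if (node == NULL)
        return NULL;
    node->dest = dest;
    node->weight = weight;
    node->next = NULL;
    return node;
}

/* Appends node to the end of a non-empty list. */
void List_append(Node_t *list, Node_t *node)
{
    assert(list != NULL);
    while (list->next != NULL)
        list = list->next;
    list->next = node;
}
```

Walking to the tail on every append costs time proportional to the list length. If the order of neighbours does not matter, inserting at the head of the list is a constant-time alternative.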
Rounding out the example are functions to read the graph from the file (Graph_readFromPath) and to tie it all together (main in this example, though in a larger program it wouldn't be the main function).
Graph_t* Graph_readFromPath(const char *fName) {
FILE *in = fopen(fName, "r");
int nVertices = Count_readFromFile(in);
int nEdges = Count_readFromFile(in);
Edge_t *edges = Edges_readFromFile(in, nEdges);
fclose(in);
Graph_t* graph = Graph_create(edges, nEdges, nVertices);
free(edges);
return graph;
}
int main(int argc, char **argv) {
if (argc < 2) {
fprintf(stderr, "No input file given.");
return 1;
}
const char *fName = argv[1];
if (access(fName, R_OK)) {
fprintf(stderr, "Error reading input file '%s': %s\n", fName, strerror(errno));
return 1;
}
Graph_t *graph = Graph_readFromPath(fName);
if (! Graph_isConnected(graph)) {
// ...
}
Graph_destroy(graph);
return 0;
}

Related

Accessing structure within structure times 3?

I have an assignment in C and I have trouble accessing different members within my structs(some levels deep). I understand the basic principles, but I kinda lose it somewhere. I have 3 structures, with the top one containing an array of the second, which in turn contains an array of the third. My current issue is using malloc the correct way. Here is some of my code. I would appreciate any kind of information or tip, because i still have a long way to go and as you can see the structures are kinda complicated.
.h file
typedef struct user {
char* userID;
int wallet;
bitCoinList userBC; //Also a list
senderTransList userSendList; //Yes it has lists too..
receiverTransList userReceiveList;
}user;
typedef struct bucket {
struct bucket* next;
user** users;
}bucket;
typedef struct hashtable {
unsigned int hashSize;
unsigned int bucketSize;
bucket** buckets;
}hashtable;
Here is my function for creating and initializing the hashtable..I get the error when I try to access users with HT->buckets->users (request for member users in something not a structure or a union)
.c file
// Creation and Initialization of HashTable
hashtable* createInit(unsigned int HTSize,unsigned int buckSize){
hashtable* HT = (hashtable*)malloc(sizeof(hashtable));
if(HT==NULL) {
printf("Error in hashtable memory allocation... \n");
return NULL;
}
HT->hashSize=HTSize;
HT->bucketSize=buckSize;
HT->buckets = malloc(HTSize * sizeof(HT->buckets));
if(HT->buckets==NULL) {
printf("Error in Buckets memory allocation... \n");
return NULL;
}
HT->buckets->users = malloc(buckSize * sizeof(HT->buckets->users));
if(HT->buckets->users==NULL) {
printf("Error in Users memory allocation... \n");
return NULL;
}
for(int i=0; i <HTSize; i++){
HT->buckets[i] = malloc(sizeof(bucket));
HT->buckets[i]->next = NULL;
if(HT->buckets[i]==NULL) {
printf("Error in Bucket %d memory allocation... \n",i);
return NULL;
}
for(int j=0; j <buckSize; j++){
HT->buckets[i]->users[j] = malloc(sizeof(user));
if(HT->buckets[i]==NULL) {
printf("Error in User %d memory allocation... \n",i);
return NULL;
}
}
}
return HT;
}
Because buckets is pointer to pointer type you need to:
(*(HT-> buckets)) ->users = ....
or
HT-> buckets[0] ->users = .... // or any other index depending of the program logic
or (for the n-th pointer)
(*(HT-> buckets + n)) ->users = ....
or
HT-> buckets[n] ->users = .... // or any other index depending of the program logic
This is only the syntax answer; I have not analyzed the program logic.
At least one problem: wrong size allocation.
Allocate to the size of the data pointed to by HT->buckets, not to the size of the pointer.
Avoid mistakes. The below idiom is easy to code to, review and maintain.
ptr = malloc(sizeof *ptr * n);
// HT->buckets = malloc(HTSize * sizeof(HT->buckets));
HT->buckets = malloc(HTSize * sizeof *(HT->buckets));
// HT->buckets->users = malloc(buckSize * sizeof(HT->buckets->users));
HT->buckets[i]->users = malloc(buckSize * sizeof *(HT->buckets[i]->users)); // must index a bucket, and only after HT->buckets[i] has been allocated
// HT->buckets[i] = malloc(sizeof(bucket));
HT->buckets[i] = malloc(sizeof *(HT->buckets[i]));
// HT->buckets[i]->users[j] = malloc(sizeof(user));
HT->buckets[i]->users[j] = malloc(sizeof *(HT->buckets[i]->users[j]));
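Putting both answers together (index the bucket array before using ->, and size each allocation by what the pointer points to), the whole function might be restructured as below. This is a sketch under assumptions: the user struct is reduced to two members because the list types are not shown in the question, destroyTable is a hypothetical helper added for cleanup on failure, and both sizes are assumed to be nonzero.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the question's user struct; the list members are omitted. */
typedef struct user { char *userID; int wallet; } user;
typedef struct bucket { struct bucket *next; user **users; } bucket;
typedef struct hashtable {
    unsigned int hashSize;
    unsigned int bucketSize;
    bucket **buckets;
} hashtable;

/* Hypothetical teardown helper; also safe on a partially built table. */
void destroyTable(hashtable *HT)
{
    if (HT == NULL)
        return;
    for (unsigned int i = 0; HT->buckets != NULL && i < HT->hashSize; i++) {
        bucket *b = HT->buckets[i];
        if (b == NULL)
            continue;
        for (unsigned int j = 0; b->users != NULL && j < HT->bucketSize; j++)
            free(b->users[j]);
        free(b->users);
        free(b);
    }
    free(HT->buckets);
    free(HT);
}

/* Assumes HTSize > 0 and buckSize > 0. Returns NULL on allocation failure. */
hashtable *createInit(unsigned int HTSize, unsigned int buckSize)
{
    hashtable *HT = malloc(sizeof *HT);
    if (HT == NULL)
        return NULL;
    HT->hashSize = HTSize;
    HT->bucketSize = buckSize;
    HT->buckets = malloc(HTSize * sizeof *HT->buckets);
    if (HT->buckets == NULL) {
        free(HT);
        return NULL;
    }
    for (unsigned int i = 0; i < HTSize; i++)
        HT->buckets[i] = NULL; /* lets destroyTable run at any point */

    for (unsigned int i = 0; i < HTSize; i++) {
        bucket *b = malloc(sizeof *b);
        if (b == NULL) { destroyTable(HT); return NULL; }
        HT->buckets[i] = b;
        b->next = NULL;
        b->users = malloc(buckSize * sizeof *b->users);
        if (b->users == NULL) { destroyTable(HT); return NULL; }
        for (unsigned int j = 0; j < buckSize; j++)
            b->users[j] = NULL;
        for (unsigned int j = 0; j < buckSize; j++) {
            user *u = malloc(sizeof *u);
            if (u == NULL) { destroyTable(HT); return NULL; }
            u->userID = NULL;
            u->wallet = 0;
            b->users[j] = u;
        }
    }
    return HT;
}
```

Compared with the original, each bucket is checked for NULL before it is dereferenced, the users array is allocated per bucket inside the loop, and every failure path releases what was already allocated.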

Returning an array of structs from a recursive huffman tree C

I have a class assignment to return an array of struct Symbol from a Huffman tree.
The function getSL takes a Huffman tree (only) and returns an array of Symbol.
Each slot in the array contains a char from a leaf of the tree and the
length of its code (how many branches are crossed to reach the leaf).
My main problem is how to advance the array counter so that earlier entries are not overwritten.
Thank you.
typedef struct HNode {
char chr;
struct HNode *left, *right;
} HNode;
typedef struct {
char chr;
int counter;
}Symbol;
This is what I have so far.
Symbol * getSL(HNode *root) {
if (root->left == NULL && root->right == NULL) {
Symbol* b = (Symbol*)malloc(100);
b->counter=0;
b->chr = root->chr;
return b;
}
Symbol* a = (Symbol*)malloc(100);
if (root->left != NULL) {
a= getSL(root->left);
a->counter++;
}
if (root->right != NULL) {
a= getSL(root->right);
a->counter++;
}
return a;
}
Apart from the malloc problem (see the comments already), you have a fundamental problem: You allocate a new struct, but then replace it with the one returned from the recursive call. So you lose the one created before (actually, memory leaking!).
Easiest variant would now be converting your Symbol to linked list nodes; then you simply could do:
Symbol* lastLeafFound; // probably a function parameter!
if(!(root->left || root->right))
{
// leaf found:
Symbol* a = (Symbol*)malloc(sizeof(Symbol));
a->chr = root->chr;
a->counter = /* ... */;
a->next = NULL;
lastLeafFound->next = a;
// you might return a now as last leaf found, using it in the next recursive call
}
Sure, above code is incomplete, but should give you the idea...
If you cannot modify your struct, then you need to create an array and pass it on to every new recursive call (prefer not to use global variables instead):
void doGetSL
(
HNode* root,
Symbol** symbols, // your array to be used
unsigned int* count, // number of symbols contained so far
unsigned int* capacity // maximum possible symbols
)
Passing all data as pointers allows the function to modify them as needed and they are still available from outside...
Symbol* getSL(HNode* root)
{
if(!root)
return NULL;
unsigned int count = 0;
unsigned int capacity = 128;
// allocate a whole array:
Symbol* array = malloc(capacity*sizeof(Symbol));
if(array) // malloc could fail...
{
doGetSL(root, &array, &count, &capacity);
// as you cannot return the number of leaves together with
// the array itself, you will need a sentinel:
array[count].chr = 0;
// obvious enough, I'd say, alternatively you could
// set counter to 0 or -1 (or set both chr and counter)
}
return array;
}
doGetSL will now use above set up "infrastructure":
{
if(!(root->left || root->right))
{
if(*count == *capacity)
{
// no memory left -> we need a larger array!
// store in separate variables:
unsigned int c = *capacity * 2;
Symbol* s = realloc(*symbols, c * sizeof(Symbol)); // reallocate the array itself, not the pointer to it
// now we can check, if reallocation was successful
// (on failure, s will be NULL!!!):
if(s)
{
// OK, we can use them...
*symbols = s; // <- need a pointer for (pointer to pointer)!
*capacity = c;
}
else
{
// re-allocation failed!
// -> need appropriate error handling!
}
}
(*symbols)[*count].chr = root->chr;
(*symbols)[*count].counter = /*...*/;
++*count;
}
else
{
if(root->left)
{
doGetSL(root->left, symbols, count, capacity);
}
if(root->right)
{
doGetSL(root->right, symbols, count, capacity);
}
}
}
One thing still omitted: setting the counter. That is fairly easy: add another parameter to doGetSL indicating the current depth and increment it at each level of recursion; you can then just assign this value when a leaf is reached.
You can further improve above variant (especially readability), if you introduce a new struct:
struct SLData
{
Symbol* symbols; // your array to be used
unsigned int count; // number of symbols contained so far
unsigned int capacity; // maximum possible symbols
};
and pass this one instead of the three pointers:
doGetSL(HNode*, struct SLData*, unsigned int depth);
struct SLData data =
{
.symbols = malloc(128 * sizeof(Symbol)),
.count = 0,
.capacity = 128,
};
if(data.symbols)
doGetSL(root, &data, 0); // again passed as pointer!
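For reference, the pieces of this answer can be assembled into one complete version. The following is a sketch under a few assumptions that go beyond the answer: doGetSL returns an error code, the depth is passed as depth + 1 to the children (so a tree consisting of a single leaf gets length 0), one slot is always kept free for the sentinel, and the sentinel is marked by counter == -1 rather than chr == 0 so that a '\0' symbol stays representable.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct HNode {
    char chr;
    struct HNode *left, *right;
} HNode;

typedef struct {
    char chr;
    int counter;
} Symbol;

struct SLData {
    Symbol *symbols;       /* growing array of results */
    unsigned int count;    /* number of symbols stored so far */
    unsigned int capacity; /* number of slots allocated */
};

/* Collects all leaves below root. Returns 0 on success, -1 if realloc fails. */
static int doGetSL(const HNode *root, struct SLData *data, int depth)
{
    if (!(root->left || root->right)) {
        /* Grow early enough that one slot stays free for the sentinel. */
        if (data->count + 1 >= data->capacity) {
            unsigned int c = data->capacity * 2;
            Symbol *s = realloc(data->symbols, c * sizeof *s);
            if (!s)
                return -1; /* data->symbols is still valid; the caller frees it */
            data->symbols = s;
            data->capacity = c;
        }
        data->symbols[data->count].chr = root->chr;
        data->symbols[data->count].counter = depth;
        ++data->count;
        return 0;
    }
    if (root->left && doGetSL(root->left, data, depth + 1) != 0)
        return -1;
    if (root->right && doGetSL(root->right, data, depth + 1) != 0)
        return -1;
    return 0;
}

/* Returns a malloc'ed array terminated by an entry with counter == -1,
 * or NULL on failure. The caller frees the array. */
Symbol *getSL(HNode *root)
{
    if (!root)
        return NULL;
    struct SLData data = { .symbols = NULL, .count = 0, .capacity = 4 };
    data.symbols = malloc(data.capacity * sizeof *data.symbols);
    if (!data.symbols)
        return NULL;
    if (doGetSL(root, &data, 0) != 0) {
        free(data.symbols);
        return NULL;
    }
    data.symbols[data.count].chr = 0;
    data.symbols[data.count].counter = -1; /* sentinel */
    return data.symbols;
}
```

A caller can then iterate with for (Symbol *s = list; s->counter != -1; ++s) and release the array with a single free. The small initial capacity of 4 is chosen only so that the growth path is exercised; 128 as in the answer works equally well.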

graph implementation with adjacency lists in C

I just started learning C and, as a self-learning exercise, I am implementing data structures and algorithms in C. Right now I am working on a graph, and this is its data structure representation.
typedef int graphElementT;
typedef struct graphCDT *graphADT;
typedef struct vertexTag
{
graphElementT element;
int visited;
struct edgeTag *edges;
struct vertexTag *next;
} vertexT;
typedef struct edgeTag
{
int weight;
vertexT *connectsTo;
struct edgeTag *next;
} edgeT;
typedef struct graphCDT
{
vertexT *vertices;
} graphCDT;
To this graph I added a addVertex function.
int addVertex(graphADT graph, graphElementT value)
{
vertexT *new = malloc(sizeof(*new));
vertexT *vert;
new->element = value;
new->visited = 0;
new->edges = NULL;
new->next = NULL;
int i = 0;
for(vert=graph->vertices; vert->next != NULL; vert=vert->next)
{
if(vert->element == value)
{
printf("already exists\n");
return 0;
}
}
vert->next = new;
//free(new);
printf("\ninserted %d\n", vert->element);
return 1;
}
This works fine except for three things.
If the newly added vertex is the same as the last vertex in the list, the check fails to see it. To prevent this I changed the for loop's limiting condition to vert != NULL, but that gives a seg fault.
If I try to free the temporarily allocated pointer, it resets the memory pointed to by the pointer, and this adds an infinite loop at the end of the vertex list. Is there no way to free the pointer without writing over the memory it points to? Or is it not really needed to free the pointer?
Also, would destroying the graph mean destroying every edge and vertex? Or is there a better approach?
Also, if this data structure for a graph is not a good one and there are better implementations, I would appreciate that being pointed out.
1
If you change the limiting condition to vert != NULL and the loop ends with vert == NULL (i.e. the vertex to be added isn't present), then the next statement executed is:
vert->next = new;
That dereferences the NULL vert pointer, hence the seg fault.
To also check the last element against the vertex to be added while preventing the seg fault, do this:
for(vert=graph->vertices; vert->next != NULL; vert=vert->next)
{
if(vert->element == value)
{
printf("already exists\n");
return 0;
}
}
if(vert->element == value)
{
printf("already exists\n");
return 0;
}
vert->next = new;
2
The temporary "new" pointer holds the memory allocated for the vertex you added. It must NOT be freed there, as freeing it would delete the vertex you just added.
3
Yes, destroying the graph essentially means freeing every edge and every vertex.
Using linked lists for the adjacency lists of a graph is a common and reasonable approach, although in C++ you could use a vector of vectors to implement the same thing.
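To make point 3 concrete, a destructor for this representation could walk the vertex list and, for each vertex, its edge list. The sketch below reuses the question's type definitions; the function name destroyGraph is an assumption. Edges only point at vertices they do not own, so connectsTo is never freed through an edge.

```c
#include <assert.h>
#include <stdlib.h>

typedef int graphElementT;
typedef struct graphCDT *graphADT;

typedef struct vertexTag {
    graphElementT element;
    int visited;
    struct edgeTag *edges;
    struct vertexTag *next;
} vertexT;

typedef struct edgeTag {
    int weight;
    vertexT *connectsTo; /* not owned by the edge */
    struct edgeTag *next;
} edgeT;

typedef struct graphCDT {
    vertexT *vertices;
} graphCDT;

/* Frees every edge, every vertex, and finally the graph object itself. */
void destroyGraph(graphADT graph)
{
    if (graph == NULL)
        return;
    vertexT *vert = graph->vertices;
    while (vert != NULL) {
        edgeT *edge = vert->edges;
        while (edge != NULL) {
            edgeT *nextEdge = edge->next; /* save the link before freeing */
            free(edge);
            edge = nextEdge;
        }
        vertexT *nextVert = vert->next;
        free(vert);
        vert = nextVert;
    }
    free(graph);
}
```

In both loops the next pointer is read before the current element is freed, since reading a member of freed memory is undefined behaviour.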
Here's a working addVertex function that you can use.
I am keeping the original declarations as it is.
I have added a main () to which you can give command line arguments to test.
int addVertex(graphADT graph, graphElementT value)
{
vertexT *tmpvert , *vert ;
vert=graph->vertices ;
/*check to see whether we really need to create a new vertex*/
tmpvert = vert;
while(tmpvert != NULL)
{
/* You can put a debug printf here to check what's in the graph:
* printf("tmpvert->elem=%d ", tmpvert->element);
*/
vert = tmpvert;
if(tmpvert->element == value)
return 0;
tmpvert=tmpvert->next ;
}
/*If we are here , then we HAVE to allocate memory and add to our graph.*/
tmpvert = (vertexT*)malloc(sizeof(vertexT));
if ( NULL == tmpvert )
return 0; /* malloc failure */
tmpvert->element = value;
tmpvert->visited = 0;
tmpvert->edges = NULL;
tmpvert->next = NULL;
if ( NULL == vert )
graph->vertices = tmpvert; /* Notice that I don't use vert = tmpvert */
else
vert->next = tmpvert; /*putting stuff in next is fine */
return 1;
/* Dont try printing vert->element here ..vert will be NULL first time */
/*return code for success is normally 0 others are error.
*That way you can have your printfs and error code
*handling outside this function.But its ok for a test code here */
}
Now for the main () snippet for testing :
int main (int argc , char* argv[]) {
graphADT graph ;
graph =(graphADT) malloc ( sizeof(struct graphCDT) );
graph->vertices = NULL;
while ( --argc >0)
{
int value = atoi(argv[argc]);
addVertex(graph,value);
}
}

C Input for Directed Graphs

We are told our input file would be a simple list of numbers:
1 3 4
2 3
3 4
4 1 2
Where the first number is the source node, and the following numbers are its adjacent nodes.
I am trying to figure out how to best store this.
I wanted to firstly initialize a "graph", an array that contains all these nodes.
Then upon reading the file, line by line, I would store the root node into the graph array, and then update the node's outlist (adjacent nodes) with the following numbers until we reach the end of the line, repeating this for each line until EOF.
However I'm struggling on how to initialize the graph, do I just assume a certain size and realloc() once the size is hit? Do I read the file first and count the number of lines to find out the size, then re-read the file to store the nodes? Is there any other way?
Here is the code for my data structures:
int initialize (Graph *mygraph, int MaxSize) {
mygraph->MaxSize = MaxSize;
mygraph->table = (Node *)malloc(sizeof(Node) * MaxSize);
return 0;
}
int insert_node (Graph *mygraph, int n, char *name) {
mygraph->table[n].name = strdup(name);
mygraph->table[n].outdegree = 0;
return 0;
}
int insert_link (Graph *mygraph, int source, int target) {
List *newList = (List *)malloc(sizeof(List));
newList->index = target;
newList->next = mygraph->table[source].outlist;
mygraph->table[source].outlist = newList;
return 0;
}
So upon reading the file,
I initialize the graph.
I read the first number, store it as a new graph node.
I read the next numbers until hitting "\n", and store these as graph links to the above root node.
I do this for each line until hitting EOF.
As you can see I have no idea what the "MaxSize" until the whole file is read.
Thanks!
I'm rather new to C so sorry if I've done anything silly.
You could start with an initial guess for MaxSize (e.g. 8) and grow your data when needed (perhaps by graph->MaxSize += graph->MaxSize/2), either using realloc or by malloc-ing a bigger new chunk, copying the older chunk into it, then free-ing the older chunk. Don't forget to check the result of every malloc, calloc or realloc call; they can (rarely) fail.
Note that I don't know how your Graph and Node types are declared.
I am guessing you have declared something like
typedef struct node_st Node;
typedef struct graph_st Graph;
struct node_st {
char*name; // strdup-ed
unsigned outdegree;
};
struct graph_st {
unsigned MaxSize;
Node* table; //calloc-ed, of allocated size MaxSize
};
So for example your insert_node function might be
void insert_node (Graph *mygraph, int n, char *name) {
assert (mygraph != NULL);
assert (n >= 0);
assert (name != NULL && *name != (char)0);
unsigned maxsize = mygraph->MaxSize;
if (maxsize <= n) {
unsigned newmaxsize = n + maxsize/2 + 1;
Node* newtable = calloc (newmaxsize, sizeof(Node));
if (!newtable)
perror("growing table in graph"), exit(EXIT_FAILURE);
for (unsigned i=0; i<maxsize; i++)
newtable[i] = mygraph->table[i];
free (mygraph->table);
mygraph->table = newtable;
mygraph->MaxSize = newmaxsize;
}
mygraph->table[n].name = strdup(name);
mygraph->table[n].outdegree = 0;
}
You probably don't need insert_node to return a value (a meaningful return value wouldn't always be 0). So I made it a void returning function (i.e. a "procedure" or "routine").
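The grow-on-demand idea also answers the original question of how to read the file in a single pass. The following is a rough sketch, not the asker's code: the type layouts, the helper names ensure_size, add_link and read_graph, and the 1024-byte line limit are all assumptions. It uses realloc instead of the calloc-and-copy approach shown above and initializes the new slots by hand.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Assumed layouts; the question does not show its declarations. */
typedef struct List { int index; struct List *next; } List;
typedef struct Node { unsigned outdegree; List *outlist; } Node;
typedef struct Graph { unsigned MaxSize; Node *table; } Graph;

/* Grows the table so that index n is valid. Returns 0 on success. */
static int ensure_size(Graph *g, unsigned n)
{
    if (n < g->MaxSize)
        return 0;
    unsigned newsize = n + g->MaxSize / 2 + 1;
    Node *t = realloc(g->table, newsize * sizeof *t);
    if (t == NULL)
        return -1; /* the old table is still valid */
    for (unsigned i = g->MaxSize; i < newsize; i++) {
        t[i].outdegree = 0;
        t[i].outlist = NULL;
    }
    g->table = t;
    g->MaxSize = newsize;
    return 0;
}

/* Prepends target to source's outlist, growing the table if needed. */
static int add_link(Graph *g, unsigned source, unsigned target)
{
    if (ensure_size(g, source) != 0 || ensure_size(g, target) != 0)
        return -1;
    List *link = malloc(sizeof *link);
    if (link == NULL)
        return -1;
    link->index = (int)target;
    link->next = g->table[source].outlist;
    g->table[source].outlist = link;
    g->table[source].outdegree++;
    return 0;
}

/* Reads "source target target ..." lines until EOF. Returns 0 on success. */
int read_graph(FILE *in, Graph *g)
{
    char line[1024]; /* assumes no input line is longer than this */
    g->MaxSize = 0;
    g->table = NULL;
    while (fgets(line, sizeof line, in) != NULL) {
        char *p = line, *end;
        long source = strtol(p, &end, 10);
        if (end == p)
            continue; /* blank or non-numeric line */
        if (source < 0 || ensure_size(g, (unsigned)source) != 0)
            return -1;
        for (p = end; ; p = end) {
            long target = strtol(p, &end, 10);
            if (end == p)
                break; /* no more numbers on this line */
            if (target < 0 || add_link(g, (unsigned)source, (unsigned)target) != 0)
                return -1;
        }
    }
    return 0;
}
```

Because the table grows whenever a larger index shows up, there is no need to count lines first or to read the file twice. Node numbers are used directly as table indices here, which fits the small integers in the sample input but would waste memory for sparse or very large identifiers.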

Pointer Conventions with: Array of pointers to certain elements

This question is about the best practices to handle this pointer problem I've dug myself into.
I have an array of structures that is dynamically generated in a function that reads a csv.
int init_from_csv(instance **instances,char *path) {
... open file, get line count
*instances = (instance*) malloc( (size_t) sizeof(instance) * line_count );
... parse and set values of all instances
return count_of_valid_instances_read;
}
// in main()
instance *instances;
int ins_len = init_from_csv(&instances, "some/path/file.csv");
Now, I have to perform functions on this raw data, split it, and perform the same functions again on the splits. This data set can be fairly large so I do not want to duplicate the instances, I just want an array of pointers to structs that are in the split.
instance **split = (instance**) malloc (sizeof(instance*) * split_len_max);
int split_function(instance *instances, int ins_len, instance **split){
int i, c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (some_criteria_is_true) {
split[c++] = &instances[i];
}
}
return c;
}
Now my question: what would be the best practice or most readable way to perform a function on both the array of structs and the array of pointers? For a simple example, count_data().
int count_data (instance **ins, int ins_len, float crit) {
int i,c;
c = 0;
for (i = 0; i < ins_len; i++) {
if (ins[i]->data > crit) {
++c;
}
}
return c;
}
// code smell-o-vision going off by now
int c1 = count_data (split, ins_len, 0.05); // works
int c2 = count_data (&instances, ins_len, 0.05); // obviously seg faults
I could make my init_from_csv malloc an array of pointers to instances, and then malloc my array of instances. I want to learn how a seasoned c programmer would handle this sort of thing though before I start changing a bunch of code.
This might seem a bit grungey, but if you really want to pass that instances** pointer around and want it to work for both the main data set and the splits, you really need to make an array of pointers for the main data set too. Here's one way you could do it...
size_t i, mem_reqd;
instance **list_seg, *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = (sizeof(instance*) + sizeof(instance)) * line_count;
list_seg = (instance**) malloc( mem_reqd );
data_seg = (instance*) &list_seg[line_count];
/* Index into the data segment */
for( i = 0; i < line_count; i++ ) {
list_seg[i] = &data_seg[i];
}
*instances = list_seg;
Now you can always operate on an array of instance* pointers, whether it's your main list or a split. I know you didn't want to use extra memory, but if your instance struct is not trivially small, then allocating an extra pointer for each instance to prevent confusing code duplication is a good idea.
When you're done with your main instance list, you can do this:
void free_instances( instance** instances )
{
free( instances );
}
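As a concrete illustration of the single-block layout, the sketch below allocates the pointer list and the data together and then runs the same count_data on the main list and on a split. The instance struct with a single float member and the function name alloc_instances are assumptions made for the example. The layout also assumes that instance needs no stricter alignment than a pointer does.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed minimal instance type; the real struct is not shown in the question. */
typedef struct instance {
    float data;
} instance;

/* Allocates line_count pointers followed by line_count instances in one block
 * and points each list entry at its instance. Release with a single free(). */
instance **alloc_instances(size_t line_count)
{
    size_t mem_reqd = (sizeof(instance *) + sizeof(instance)) * line_count;
    instance **list_seg = malloc(mem_reqd);
    if (list_seg == NULL)
        return NULL;
    /* The data segment starts right after the last pointer slot. */
    instance *data_seg = (instance *)(void *)(list_seg + line_count);
    for (size_t i = 0; i < line_count; i++)
        list_seg[i] = &data_seg[i];
    return list_seg;
}

/* Works on any array of instance pointers: the main list or a split. */
int count_data(instance **ins, int ins_len, float crit)
{
    int c = 0;
    for (int i = 0; i < ins_len; i++) {
        if (ins[i]->data > crit)
            ++c;
    }
    return c;
}
```

A split is then simply another array of instance pointers whose entries are copied from the main list (split[n++] = all[i]), so no instance is duplicated and count_data needs only one signature.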
I would be tempted to implement this as a struct:
typedef struct instance_list {
instance ** data;
size_t length;
int owner;
} instance_list;
That way, you can return this from your functions in a nicer way:
instance_list* alloc_list( size_t length, int owner )
{
size_t i, mem_reqd;
instance_list *list;
instance *data_seg;
/* Allocate list and data segments in one large block */
mem_reqd = sizeof(instance_list) + sizeof(instance*) * length;
if( owner ) mem_reqd += sizeof(instance) * length;
list = (instance_list*) malloc( mem_reqd );
list->data = (instance**) &list[1];
list->length = length;
list->owner = owner;
/* Index the list */
if( owner ) {
data_seg = (instance*) &list->data[length];
for( i = 0; i < length; i++ ) {
list->data[i] = &data_seg[i];
}
}
return list;
}
void free_list( instance_list * list )
{
free(list);
}
void erase_list( instance_list * list )
{
if( list->owner ) return;
memset((void*)list->data, 0, sizeof(instance*) * list->length);
}
Now, your function that loads from CSV doesn't have to focus on the details of creating this monster, so it can simply do the task it's supposed to do. You can now return lists from other functions, whether they contain the data or simply point into other lists.
instance_list* load_from_csv( char *path )
{
/* get line count... */
instance_list *list = alloc_list( line_count, 1 );
/* parse csv ... */
return list;
}
etc... Well, you get the idea. No guarantees this code will compile or work, but it should be close. I think it's important, whenever you're doing something with arrays that's even slightly more complicated than just a simple array, it's useful to make that tiny extra effort to encapsulate it. This is the major data structure you'll be working with for your analysis or whatever, so it makes sense to give it a little bit of stature in that it has its own data type.
I dunno, was that overkill? =)

Resources