Recursive struct and malloc() - c

I have a recursive struct which is:
typedef struct dict dict;
struct dict {
dict *children[M];
list *words[M];
};
Initialized this way:
dict *d = malloc(sizeof(dict));
bzero(d, sizeof(dict));
I would like to know what exactly bzero() does here, and how I can malloc() the children recursively.
Edit: This is how I would like to be able to malloc() the children and words:
void dict_insert(dict *d, char *signature, unsigned int current_letter, char *w) {
int occur;
occur = (int) signature[current_letter];
if (current_letter == LAST_LETTER) {
printf("word found : %s!\n",w);
list_print(d->words[occur]);
char *new;
new = malloc(strlen(w) + 1);
strcpy(new, w);
list_append(d->words[occur],new);
list_print(d->words[occur]);
}
else {
d = d->children[occur];
dict_insert(d,signature,current_letter+1,w);
}
}

bzero(3) sets a block of memory to zero. It's equivalent to calling memset(3) with a second argument of 0. In this case, it sets all of the member pointers to all-bits-zero, which is a null pointer on all mainstream platforms. bzero is deprecated (POSIX marked it as legacy and later removed it), so you should replace uses of it with memset. Alternatively, you can just call calloc(3) instead of malloc, which zeroes the returned memory for you on success.
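As a small sketch of the two zeroing options, assuming M is 26 and that list is an incomplete type defined elsewhere (both are assumptions, since the question doesn't show them):

```c
#include <stdlib.h>
#include <string.h>

#define M 26               /* assumption: one slot per letter */

typedef struct list list;  /* assumption: defined elsewhere; only pointers are used here */

typedef struct dict dict;
struct dict {
    dict *children[M];
    list *words[M];
};

/* malloc + memset: the portable replacement for malloc + bzero */
dict *dict_new_memset(void)
{
    dict *d = malloc(sizeof *d);
    if (d != NULL)
        memset(d, 0, sizeof *d);   /* same effect as bzero(d, sizeof *d) */
    return d;
}

/* calloc allocates and zeroes in a single call */
dict *dict_new_calloc(void)
{
    return calloc(1, sizeof(dict));
}
```

Either function returns a dict whose children and words slots all compare equal to NULL on common platforms; the caller releases it with free().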
You don't need casts on either call. In C, a void * converts implicitly to any other object pointer type, and any object pointer type converts implicitly to void *. malloc returns a void *, so you can assign its result to your dict *d variable without a cast. Similarly, the first parameter of bzero is a void *, so you can pass it your d variable directly without a cast.
To understand recursion, you must first understand recursion. Make sure you have an appropriate base case if you want to avoid allocating memory infinitely.

In general, when you are unsure what the compiler is generating for you, it is a good idea to print the size of the struct with printf (sizeof yields a size_t, so use %zu). Here, the size of dict should be 2 * M * the size of a pointer. bzero fills that whole dict with zeros; in other words, all M elements of both the children and words arrays will be zero.
To initialize the structure, I recommend creating a function that takes a pointer to a dict, mallocs each child, and then calls itself to initialize that child:
void init_dict(dict* d)
{
int i;
for (i = 0; i < M; i++)
{
d->children[i] = malloc(sizeof(dict));
init_dict(d->children[i]);
/* initialize the words elements, too */
}
}
+1 to you if you can see why this code won't work as is. (Hint: it has an infinite recursion bug and needs a rule that tells it how deep the children tree needs to be so it can stop recursing.)
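One way to fix it, sketched under the assumption that the caller passes the desired depth explicitly (the depth parameter and the free_dict helper are illustrative additions, and M is set to 4 only to keep the tree small), is to stop at depth 0 and to set every pointer before recursing, so that a partially built tree can still be freed:

```c
#include <stdlib.h>

#define M 4                /* assumption: small fan-out so the tree stays small */

typedef struct list list;  /* assumption: defined elsewhere */
typedef struct dict dict;
struct dict {
    dict *children[M];
    list *words[M];
};

/* Builds `depth` levels of children below d. Returns 0 on success,
   -1 if an allocation failed (the partial tree is still freeable). */
int init_dict(dict *d, unsigned depth)
{
    int i;
    for (i = 0; i < M; i++) {
        d->words[i] = NULL;          /* initialize the words elements, too */
        d->children[i] = NULL;
    }
    if (depth == 0)                  /* base case: leaves have no children */
        return 0;
    for (i = 0; i < M; i++) {
        d->children[i] = malloc(sizeof(dict));
        if (d->children[i] == NULL || init_dict(d->children[i], depth - 1) != 0)
            return -1;
    }
    return 0;
}

/* Frees d and everything below it; NULL children are skipped. */
void free_dict(dict *d)
{
    int i;
    if (d == NULL)
        return;
    for (i = 0; i < M; i++)
        free_dict(d->children[i]);
    free(d);
}
```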

bzero just zeros the memory: bzero(addr, size) is essentially equivalent to memset(addr, 0, size). As to why you'd use it, from what I've seen, about half the time it's used it's just because somebody thought zeroing the memory seemed like a good idea, even though it didn't really accomplish anything. In this case, the effect is to set some pointers to NULL (though that's not entirely portable, because C doesn't guarantee that a null pointer is represented as all-bits-zero).
To allocate recursively, you'd basically just keep track of the current depth and allocate child nodes until you reach the desired depth. Code along these lines would do the job:
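void alloc_tree(dict **root, size_t depth) {
    int i;
    if (depth == 0) {
        *root = NULL;          /* base case: stop recursing */
        return;
    }
    *root = malloc(sizeof(**root));
    if (*root == NULL)         /* allocation failed: leave this subtree empty */
        return;
    for (i = 0; i < M; i++) {
        (*root)->words[i] = NULL;
        alloc_tree((*root)->children + i, depth - 1);
    }
}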
I should add that I can't quite imagine doing recursive allocation like this though. In a typical case, you insert data, and allocate new nodes as needed to hold the data. The exact details of that will vary depending on whether (and if so how) you're keeping the tree balanced. For a multi-way tree like this, it's fairly common to use some B-tree variant, in which case the code I've given above won't normally apply at all -- with a B-tree, you fill a node, and when it's reached its limit, you split it in half and promote the middle item to the parent node. You allocate a new node when this reaches the top of the tree, and the root node is already full.
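Allocating on demand during insertion might look roughly like this. It's a sketch, assuming signatures consist only of the letters 'a' to 'z'; the linked-list type, list_push and dict_free are hypothetical stand-ins for the question's list and list_append, not their real implementations:

```c
#include <stdlib.h>
#include <string.h>

#define M 26   /* assumption: signatures use only 'a'..'z' */

/* Hypothetical stand-in for the question's list type: a linked list of words. */
typedef struct list {
    char *word;
    struct list *next;
} list;

typedef struct dict dict;
struct dict {
    dict *children[M];
    list *words[M];
};

/* Prepends a copy of w to *head; returns 0 on success, -1 on failure. */
static int list_push(list **head, const char *w)
{
    list *node = malloc(sizeof *node);
    if (node == NULL)
        return -1;
    node->word = malloc(strlen(w) + 1);
    if (node->word == NULL) {
        free(node);
        return -1;
    }
    strcpy(node->word, w);
    node->next = *head;
    *head = node;
    return 0;
}

/* Recursive insert that allocates a child node only when it is first needed. */
int dict_insert(dict *d, const char *signature, size_t current_letter, const char *w)
{
    int occur = signature[current_letter] - 'a';
    if (occur < 0 || occur >= M)
        return -1;                                   /* not 'a'..'z', or empty signature */
    if (signature[current_letter + 1] == '\0')       /* base case: last letter */
        return list_push(&d->words[occur], w);
    if (d->children[occur] == NULL) {
        d->children[occur] = calloc(1, sizeof(dict)); /* zeroed: all slots NULL */
        if (d->children[occur] == NULL)
            return -1;
    }
    return dict_insert(d->children[occur], signature, current_letter + 1, w);
}

/* Frees every word list and child node, then d itself. */
void dict_free(dict *d)
{
    int i;
    if (d == NULL)
        return;
    for (i = 0; i < M; i++) {
        list *p = d->words[i];
        while (p != NULL) {
            list *next = p->next;
            free(p->word);
            free(p);
            p = next;
        }
        dict_free(d->children[i]);
    }
    free(d);
}
```

The root itself would be created with calloc(1, sizeof(dict)), and child nodes then exist only along paths that actually hold words.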

Related

How can I make a pool with pointers in C?

I'm making my library, and just when I thought I understood pointer syntax, I got confused, searched the web, and got even more confused.
Basically, I want to make a pool. Here is what I want to do; the following points must be respected:
When I add an object to the pool, the pointers in the current array are copied into a new array of pointers with one more slot (to hold the new object).
The new array is pointed to by "objects" in my foo structure.
The old array is freed.
When I call the cleanup function, all the objects in the pool are freed.
How should I define my structure ?
typedef struct {
int n;
(???)objects
} foo;
foo *the_pool;
here's the code to manage my pool :
void myc_pool_init ()
{
the_pool = (???)malloc(sizeof(???));
the_pool->n = 0;
the_pool->objects = NULL;
}
void myc_push_in_pool (void* object)
{
if (object != NULL) {
int i;
(???)new_pointers;
the_pool->n++;
new_pointers = (???)malloc(sizeof(???)*the_pool->n);
for (i = 0; i < the_pool->n - 1; ++i) {
new_pointers[i] = (the_pool->objects)[i]; // that doesn't work (as I'm not sure how to handle it)
}
new_array[i] = object;
free(the_pool->objects);
the_pool->objects = new_array; // that must be wrong
}
}
void myc_pool_cleanup ()
{
int i;
for (i = 0; i < the_pool->n; ++i)
free((the_pool->objects)[i]); // as in myc_push_in_pool, it doesn't work
free(the_pool->objects);
free(the_pool);
}
Note: the types of the objects added to the pool are not known in advance, so I should handle all pointers as void *.
Any feedback would be very welcome.
A straight answer to your question would be: use void *. This type is very powerful as it allows you to put any kind of pointer in your pool. However, it's up to you to do the correct casts when retrieving a void * pointer from your pool.
Your struct would look like this:
typedef struct {
    int n;
    void **objects;
} foo;
foo *the_pool;
As in, an array of pointers.
Your malloc:
new_pointers = (void **)malloc(sizeof(void *)*the_pool->n);
There is a performance issue here. You could allocate an array with some spare capacity and reallocate only when it runs out of room (or, equivalently, when the load factor, the number of slots used divided by the allocated size, reaches a predefined threshold).
Also, instead of allocating a new pointer each time you add something to your pool, you could just use realloc (http://www.cplusplus.com/reference/cstdlib/realloc/)
void **tmp = realloc(the_pool->objects, the_pool->n * sizeof(void *)); /* on failure, the old block is still valid */
if (tmp != NULL) the_pool->objects = tmp;
realloc tries to grow the existing allocation in place, without needing to copy anything. Only if it cannot extend the block contiguously does it allocate a new area, copy the contents over, and free the old one.
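Combining both suggestions (spare capacity plus realloc) might look like the sketch below. The capacity field and the names pool_push and pool_cleanup are assumptions for illustration, and the stored objects are assumed to come from malloc, since pool_cleanup frees them:

```c
#include <stdlib.h>

/* Hypothetical pool layout: n slots used out of capacity slots allocated. */
typedef struct {
    size_t n;
    size_t capacity;
    void **objects;
} foo;

/* Appends object, doubling the array when it is full so realloc runs rarely.
   Returns 0 on success, -1 on failure (the pool is left unchanged). */
int pool_push(foo *pool, void *object)
{
    if (pool->n == pool->capacity) {
        size_t new_cap = pool->capacity ? pool->capacity * 2 : 4;
        void **tmp = realloc(pool->objects, new_cap * sizeof *tmp);
        if (tmp == NULL)
            return -1;   /* old block is still valid and still owned by the pool */
        pool->objects = tmp;
        pool->capacity = new_cap;
    }
    pool->objects[pool->n++] = object;
    return 0;
}

/* Frees every stored object, then the pointer array itself. */
void pool_cleanup(foo *pool)
{
    size_t i;
    for (i = 0; i < pool->n; i++)
        free(pool->objects[i]);
    free(pool->objects);
    pool->objects = NULL;
    pool->n = pool->capacity = 0;
}
```

Because the capacity doubles, pushing n objects triggers only about log2(n) calls to realloc. The sketch does not guard against new_cap * sizeof *tmp overflowing, which a production version should.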
Firstly, you already answered your "What should the type of foo.objects be?" question: void *objects; (malloc already returns void *). Your struct needs to store the item size too (size_t item_size;), and n should probably also be a size_t.
typedef struct {
size_t item_count;
size_t item_size;
void *objects;
} foo;
foo *the_pool;
You could use a home-grown loop, but I'd consider memcpy a more convenient way to copy your old items to your new space, and the new item to its new space.
Dereferencing a void * is a constraint violation, as is pointer arithmetic on a void *, so new_pointers will need to be a different type. You need a type that points to objects of the right size. You could use an array of the right number of unsigned char, like so:
// new_pointers is a pointer to an array of the_pool->item_size unsigned chars (a C99 variably modified type).
// Allocate room for the old items plus one new item; check new_pointers for NULL before copying.
unsigned char (*new_pointers)[the_pool->item_size] = malloc((the_pool->item_count + 1) * sizeof *new_pointers);
// copy the old items
memcpy(new_pointers, the_pool->objects, the_pool->item_count * sizeof *new_pointers);
// copy the new item into the extra slot at the end
memcpy(new_pointers + the_pool->item_count, object, sizeof *new_pointers);
Remember, free() is only for pointers returned by malloc() (or calloc()/realloc()), and there should be a one-to-one correspondence: each malloc() should be matched by exactly one free(). Look at how you malloc: new_pointers = malloc(sizeof(???)*the_pool->n); ... With every item copied into that single block, what makes you think you need a loop (in myc_pool_cleanup) to free each item, when you can free them all in one fell swoop?
You could use realloc, but you otherwise seem to be handling malloc/memcpy/free *in myc_push_in_pool* flawlessly. Lots of people tend to mess up when writing realloc code.
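For this store-by-value design, a complete push and cleanup could be sketched as follows. pool_push_copy and pool_cleanup_copy are hypothetical names, and the byte arithmetic uses unsigned char * instead of the pointer-to-VLA type, so it doesn't depend on variable-length array support:

```c
#include <stdlib.h>
#include <string.h>

/* Pool that stores copies of fixed-size items back to back in one block. */
typedef struct {
    size_t item_count;
    size_t item_size;
    void *objects;
} foo;

/* Copies item_size bytes from object into the pool; returns 0 or -1. */
int pool_push_copy(foo *pool, const void *object)
{
    size_t old_bytes = pool->item_count * pool->item_size;
    unsigned char *new_block = malloc(old_bytes + pool->item_size);
    if (new_block == NULL)
        return -1;                                          /* pool unchanged */
    if (old_bytes > 0)
        memcpy(new_block, pool->objects, old_bytes);        /* old items */
    memcpy(new_block + old_bytes, object, pool->item_size); /* new item */
    free(pool->objects);
    pool->objects = new_block;
    pool->item_count++;
    return 0;
}

/* All items live in one malloc'd block, so one free() releases them all. */
void pool_cleanup_copy(foo *pool)
{
    free(pool->objects);
    pool->objects = NULL;
    pool->item_count = 0;
}
```

Storing two doubles, for instance, leaves both copies in one block that a single free() releases.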

Dynamically allocate array of file pointers

Is it possible to 'dynamically' allocate file pointers in C?
What I mean is this :
FILE **fptr;
fptr = (FILE **)calloc(n, sizeof(FILE*));
where n is an integer value.
I need an array of pointer values, but I don't know how many before I get a user-input, so I can't hard-code it in.
Any help would be wonderful!
You're trying to implement what's sometimes called a flexible array (or flex array), that is, an array that changes size dynamically over the life of the program. Such an entity doesn't exist in C's native type system, so you have to implement it yourself. In the following, I'll assume that T is the type of element in the array, since the idea doesn't depend on any specific type of content. (In your case, T is FILE *.)
More or less, you want a struct that looks like this:
struct flexarray {
    T *array;
    int size;
};
and a family of functions to initialize and manipulate this structure. First, let's look at the basic accessors:
T fa_get(struct flexarray *fa, int i) { return fa->array[i]; }
void fa_set(struct flexarray *fa, int i, T p) { fa->array[i] = p; }
int fa_size(struct flexarray *fa) { return fa->size; }
Note that in the interests of brevity these functions don't do any error checking. In real life, you should add bounds-checking to fa_get and fa_set. These functions assume that the flexarray is already initialized, but don't show how to do that:
void fa_init(struct flexarray *fa) {
fa->array = NULL;
fa->size = 0;
}
Note that this starts out the flexarray as empty. It's common to make such an initializer create an array of a fixed minimum size, but starting at size zero makes sure you exercise your array growth code (shown below) and costs almost nothing in most practical circumstances.
And finally, how do you make a flexarray bigger? It's actually very simple:
void fa_grow(struct flexarray *fa) {
int newsize = (fa->size + 1) * 2;
T *newarray = malloc(newsize * sizeof(T));
if (!newarray) {
// handle error
return;
}
if (fa->size > 0) // passing NULL to memcpy is undefined even when copying 0 bytes
    memcpy(newarray, fa->array, fa->size * sizeof(T));
free(fa->array);
fa->array = newarray;
fa->size = newsize;
}
Note that the new elements in the flexarray are uninitialized, so you should arrange to store something to each new index i before fetching from it.
Growing flexarrays by a constant multiplier each time is, generally speaking, a good idea. If instead you increase the size by a constant increment, you spend quadratic time copying elements of the array around.
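Here is a runnable instance of the flexarray above with T set to FILE *, as in the question. fa_grow reports failure through its return value instead of the placeholder comment, and the fa_set_grow helper is a hypothetical addition that grows the array until an index fits. With this growth rule, the sizes go 0, 2, 6, 14, and so on:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef FILE *T;   /* element type for this example, as in the question */

struct flexarray {
    T *array;
    int size;
};

void fa_init(struct flexarray *fa) { fa->array = NULL; fa->size = 0; }

/* Grows to (size + 1) * 2 elements; returns 0 on success, -1 on failure. */
int fa_grow(struct flexarray *fa)
{
    int newsize = (fa->size + 1) * 2;
    T *newarray = malloc(newsize * sizeof(T));
    if (!newarray)
        return -1;                 /* old array is untouched */
    if (fa->size > 0)
        memcpy(newarray, fa->array, fa->size * sizeof(T));
    free(fa->array);
    fa->array = newarray;
    fa->size = newsize;
    return 0;
}

/* Hypothetical helper: grows until index i is valid, then stores p. */
int fa_set_grow(struct flexarray *fa, int i, T p)
{
    while (i >= fa->size)
        if (fa_grow(fa) != 0)
            return -1;
    fa->array[i] = p;
    return 0;
}
```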
I haven't shown the code to shrink an array, but it's very similar to the growth code.
Anyway, they're just pointers, so you can allocate memory for them as you showed,
but don't forget to fclose() each file pointer and then free() the memory.
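For the original question, here is a sketch of allocating the array at run time with calloc, then closing and freeing in the right order. tmpfile() stands in for whatever fopen() calls the real program would make, and both function names are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Closes every non-NULL file first, then frees the array of pointers. */
void close_files(FILE **fptr, size_t n)
{
    size_t i;
    if (fptr == NULL)
        return;
    for (i = 0; i < n; i++)
        if (fptr[i] != NULL)
            fclose(fptr[i]);
    free(fptr);
}

/* Opens n temporary files (n > 0 assumed); returns the array, or NULL on failure. */
FILE **open_temp_files(size_t n)
{
    size_t i;
    FILE **fptr = calloc(n, sizeof *fptr);   /* every slot starts out NULL */
    if (fptr == NULL)
        return NULL;
    for (i = 0; i < n; i++) {
        fptr[i] = tmpfile();                 /* stand-in for fopen(name, mode) */
        if (fptr[i] == NULL) {
            close_files(fptr, n);            /* unopened slots are still NULL */
            return NULL;
        }
    }
    return fptr;
}
```

Here n can come from user input at run time, which is exactly what the fixed-size array couldn't handle.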

C using malloc and duplicating array

I am supposed to meet the following criteria:
Implement function answer4 (pointer parameter and n):
Prepare an array of student_record using malloc() of n items.
Duplicate the student record from the parameter to the array n times.
Return the array.
And I came with the code below, but it's obviously not correct. What's the correct way to implement this?
student_record *answer4(student_record* p, unsigned int n)
{
int i;
student_record* q = malloc(sizeof(student_record)*n);
for(i = 0; i < n ; i++){
q[i] = p[i];
}
free(q);
return q;
};
p = malloc(sizeof(student_record)*n);
This is problematic: you're overwriting the p input argument, so you can't reference the data you were handed after that line.
This means that your inner loop reads uninitialized data.
This:
return a;
is problematic too - it would return a pointer to a local variable, and that's not good - that pointer becomes invalid as soon as the function returns.
What you need is something like:
student_record* ret = malloc(...);
for (int i=...) {
// copy p[i] to ret[i]
}
return ret;
1) You reassigned p, the array you were supposed to copy, by calling malloc().
2) You can't return the address of a local stack variable (a). Change a to a pointer, malloc it to the size of p, and copy p into it. Malloc'd memory is heap memory, so you can return its address.
a[] is a local automatic array. Once you return from the function, its lifetime ends, so the calling function can't use the array you returned.
What you probably wanted to do is malloc a new array (i.e., not p), assign the duplicates into it, and return a pointer to it without freeing the malloc'd memory.
Try to use better names, it might help in avoiding the obvious mix-up errors you have in your code.
For instance, start the function with:
student_record * answer4(const student_record *template, size_t n)
{
...
}
It also makes the code clearer. Note that I added const to make it clearer that the first argument is input-only, and made the type of the second one size_t which is good when dealing with "counts" and sizes of things.
The code in this question is evolving quite quickly but at the time of this answer it contains these two lines:
free(q);
return q;
This is guaranteed to be wrong - after the call to free its argument points to invalid memory and anything could happen subsequently upon using the value of q. i.e. you're returning an invalid pointer. Since you're returning q, don't free it yet! It becomes a "caller-owned" variable and it becomes the caller's responsibility to free it.
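Putting these points together, here is a minimal sketch that reads the assignment as "copy the single record *p into each of the n slots". The student_record fields are invented for illustration, since the question doesn't show the real struct:

```c
#include <stdlib.h>

/* Hypothetical record layout; the real student_record is not shown. */
typedef struct {
    int id;
    double gpa;
} student_record;

/* Returns a new array of n copies of *p, or NULL on failure.
   The caller owns the result and must free() it. */
student_record *answer4(const student_record *p, size_t n)
{
    size_t i;
    student_record *q = malloc(n * sizeof *q);
    if (q == NULL)
        return NULL;
    for (i = 0; i < n; i++)
        q[i] = *p;          /* struct assignment copies every member */
    return q;               /* no free() here: the caller still needs q */
}
```

The caller releases the returned array with free() once it's done with it.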
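#include <stdint.h>   /* uint8_t */
#include <stdlib.h>   /* malloc */
#include <string.h>   /* memcpy */
student_record* answer4(student_record* p, unsigned int n)
{
    uint8_t *data, *pos;
    size_t size = sizeof(student_record);
    data = malloc(size*n);
    if (data == NULL)
        return NULL;
    pos = data;
    for(unsigned int i = 0; i < n ; i++, pos=&pos[size])
        memcpy(pos,p,size);   /* copy the single record *p into each slot */
    return (student_record *)data;
}
You could do it like this. Note that this version copies the single record *p into each of the n slots.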
This compiles and, I think, does what you want:
student_record *answer4(const student_record *const p, const unsigned int n)
{
unsigned int i;
student_record *const a = malloc(sizeof(student_record)*n);
for(i = 0; i < n; ++i)
{
a[i] = p[i];
}
return a;
}
Several points:
The existing array is identified as p. You want to copy from it. You probably do not want to free it (to free it is probably the caller's job).
The new array is a. You want to copy to it. The function cannot free it, because the caller will need it. Therefore, the caller must take the responsibility to free it, once the caller has done with it.
The array has n elements, indexed 0 through n-1. The usual way to express the upper bound on the index thus is i < n.
The consts I have added are not required, but well-written code will probably include them.
Although there are previous GOOD answers to this question, I couldn't resist adding my own. Since I learned Pascal programming in college, I am used to doing this in C-related programming languages:
void* AnyFunction(int AnyParameter)
{
void* Result = NULL;
DoSomethingWith(Result);
return Result;
}
This helps me debug easily and avoid bugs like the one mentioned by @ysap, related to pointers.
Something important to remember is that the question asks you to return a SINGLE pointer. This is a common caveat, because a pointer can be used to address a single item or a consecutive array!!!
This question suggests using an array as A CONCEPT, with pointers, NOT USING ARRAY SYNTAX.
// returns a single pointer to an array:
student_record* answer4(student_record* student, unsigned int n)
{
// empty result variable for this function:
student_record* Result = NULL;
// the result will allocate a conceptual array, even if it is a single pointer:
Result = malloc(sizeof(student_record)*n);
// a copy of the destination result, will move for each item
student_record* dest = Result;
unsigned int i;
for(i = 0; i < n ; i++){
// copy contents, not address:
*dest = *student;
// move to next item of "Result"
dest++;
}
// the data referenced by "Result", was changed using "dest"
return Result;
} // student_record* answer4(...)
Note that there is no subscript operator here, because the items are addressed through pointers.
Please don't start a Pascal vs. C flame war; this is just a suggestion.

How to realloc an array inside a function with no lost data? (in C )

I have a dynamic array of structures, so I thought I could store the information about the array in the first structure.
So one attribute will represent the amount of memory allocated for the array and another one representing number of the structures actually stored in the array.
The trouble is, that when I put it inside a function that fills it with these structures and tries to allocate more memory if needed, the original array gets somehow distorted.
Can someone explain why is this and how to get past it?
Here is my code
#define INIT 3
typedef struct point{
int x;
int y;
int c;
int d;
}Point;
Point empty(){
Point p;
p.x=1;
p.y=10;
p.c=100;
p.d=1000; //if you put different values it will act differently - weird
return p;
}
void printArray(Point * r){
int i;
int total = r[0].y+1;
for(i=0;i<total;i++){
printf("%2d | P [%2d,%2d][%4d,%4d]\n",i,r[i].x,r[i].y,r[i].c,r[i].d);
}
}
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
}
void enter(Point* r,int c){
int i;
for(i=1;i<c;i++){
r[r[0].y+1]=empty();
r[0].y++;
if( (r[0].y+2) >= r[0].x ){ /*when the amount of Points is near
*the end of allocated memory.
reallocate the array*/
reallocFunction(r);
}
}
}
int main(int argc, char** argv) {
Point * r=(Point *) malloc ( sizeof ( Point ) * INIT );
r[0]=empty();
r[0].x=INIT; /*so here I store for how many "Points" is there memory
//in r[0].y theres how many Points there are.*/
enter(r,5);
printArray(r);
return (0);
}
Your code does not look clean to me for other reasons, but...
void reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
}
The problem here is that r in this function is the parameter, hence any modifications to it are lost when the function returns. You need some way to change the caller's version of r. I suggest:
Point * // Note new return type...
reallocFunction(Point * r){
r=(Point *) realloc(r,r[0].x*2*sizeof(Point));
r[0].x*=2;
r[0].y++;
return r; // Note: now we return r back to the caller..
}
Then later:
r = reallocFunction(r);
Now... Another thing to consider is that realloc can fail. A common pattern for realloc that accounts for this is:
Point *reallocFunction(Point * r){
void *new_buffer = realloc(r, r[0].x*2*sizeof(Point));
if (!new_buffer)
{
// realloc failed, pass the error up to the caller..
return NULL;
}
r = new_buffer;
r[0].x*=2;
r[0].y++;
return r;
}
This ensures that you don't leak r when the memory allocation fails, and the caller then has to decide what happens when your function returns NULL...
But, some other things I'd point out about this code (I don't mean to sound like I'm nitpicking about things and trying to tear them apart; this is meant as constructive design feedback):
The names of variables and members don't make it very clear what you're doing.
You've got a lot of magic constants. There's no explanation for what they mean or why they exist.
reallocFunction doesn't seem to really make sense. Perhaps the name and interface can be clearer. When do you need to realloc? Why do you double the X member? Why do you increment Y? Can the caller make these decisions instead? I would make that clearer.
Similarly it's not clear what enter() is supposed to be doing. Maybe the names could be clearer.
It's a good thing to do your allocations and manipulation of member variables in a consistent place, so it's easy to spot (and later, potentially change) how you're supposed to create, destroy and manipulate one of these objects. Here it seems in particular like main() has a lot of knowledge of your structure's internals. That seems bad.
Using the multiplication operator in the size argument to realloc the way you do is sometimes a red flag. It's a corner case, but the multiplication can overflow, and you can end up shrinking the buffer instead of growing it. Subsequent writes would then run past the end of the buffer and could crash the program; in production code it would be important to avoid this for security reasons.
You also do not seem to initialize r[0].y. As far as I understood, you should have a r[0].y=0 somewhere.
Anyway, using the first element of the array for something different is definitely a bad idea. It makes your code horribly complex to understand. Just create a new structure holding the array size, the capacity, and the pointer.
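Here is a sketch of that separate structure for the question's Point type. The PointArray name and its fields are assumptions, and the push function also guards against the multiplication overflow mentioned above before calling realloc:

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct point {
    int x, y, c, d;
} Point;

/* Hypothetical container: the bookkeeping lives here, not in element 0. */
typedef struct {
    Point *items;
    size_t count;     /* number of Points stored */
    size_t capacity;  /* number of Points allocated */
} PointArray;

/* Appends p, doubling capacity when full. Returns 0, or -1 (array unchanged). */
int point_array_push(PointArray *a, Point p)
{
    if (a->count == a->capacity) {
        size_t new_cap = a->capacity ? a->capacity * 2 : 4;
        Point *tmp;
        /* refuse growth if doubling wrapped or new_cap * sizeof(Point) would overflow */
        if (new_cap < a->capacity || new_cap > SIZE_MAX / sizeof(Point))
            return -1;
        tmp = realloc(a->items, new_cap * sizeof(Point));
        if (tmp == NULL)
            return -1;
        a->items = tmp;
        a->capacity = new_cap;
    }
    a->items[a->count++] = p;
    return 0;
}

void point_array_free(PointArray *a)
{
    free(a->items);
    a->items = NULL;
    a->count = a->capacity = 0;
}
```

Because the caller passes a pointer to the PointArray, realloc moving the buffer only changes a->items, and every caller sees the update.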

Passing a dynamic array in to functions in C

I'm trying to create a function which takes an array as an argument, adds values to it (increasing its size if necessary) and returns the count of items.
So far I have:
int main(int argc, char** argv) {
int mSize = 10;
ent a[mSize];
int n;
n = addValues(a,mSize);
for(i=0;i<n;i++) {
//Print values from a
}
}
int addValues(ent *a, int mSize) {
int size = mSize;
i = 0;
while(....) { //Loop to add items to array
if(i>=size-1) {
size = size*2;
a = realloc(a, (size)*sizeof(ent));
}
//Add to array
i++;
}
return i;
}
This works if mSize is large enough to hold all the potential elements of the array, but if it needs resizing, I get a Segmentation Fault.
I have also tried:
int main(int argc, char** argv) {
...
ent *a;
...
}
int addValues(ent *a, int mSize) {
...
a = calloc(1, sizeof(ent));
//usual loop
...
}
To no avail.
I assume this is because when I call realloc, the copy of 'a' is pointed elsewhere - how is it possible to modify this so that 'a' always points to the same location?
Am I going about this correctly? Are there better ways to deal with dynamic structures in C? Should I be implementing a linked list to deal with these?
The main problem here is that you're trying to use realloc with a stack-allocated array. You have:
ent a[mSize];
That's automatic allocation on the stack. If you wanted to use realloc() on this later, you would need to create the array on the heap using malloc(), like this:
ent *a = (ent*)malloc(mSize * sizeof(ent));
So that the malloc library (and thus realloc(), etc.) knows about your array. From the looks of this, you may be confusing C99 variable-length arrays with true dynamic arrays, so be sure you understand the difference there before trying to fix this.
Really, though, if you are writing dynamic arrays in C, you should try to use OOP-ish design to encapsulate information about your arrays and hide it from the user. You want to consolidate information (e.g. pointer and size) about your array into a struct and operations (e.g. allocation, adding elements, removing elements, freeing, etc.) into special functions that work with your struct. So you might have:
typedef struct dynarray {
elt *data;
int size;
} dynarray;
And you might define some functions to work with dynarrays:
// malloc a dynarray and its data and returns a pointer to the dynarray
dynarray *dynarray_create(void);
// add an element to dynarray and adjust its size if necessary
void dynarray_add_elt(dynarray *arr, elt value);
// return a particular element in the dynarray
elt dynarray_get_elt(dynarray *arr, int index);
// free the dynarray and its data.
void dynarray_free(dynarray *arr);
This way the user doesn't have to remember exactly how to allocate things or what size the array is currently. Hope that gets you started.
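One possible implementation of those prototypes, assuming elt is int and adding a capacity field so the array can grow geometrically (both are assumptions beyond the declarations shown):

```c
#include <stdlib.h>

typedef int elt;   /* assumption: element type for this sketch */

typedef struct dynarray {
    elt *data;
    int size;       /* elements in use */
    int capacity;   /* elements allocated (added for this sketch) */
} dynarray;

dynarray *dynarray_create(void)
{
    dynarray *arr = malloc(sizeof *arr);
    if (arr == NULL)
        return NULL;
    arr->capacity = 4;
    arr->size = 0;
    arr->data = malloc(arr->capacity * sizeof *arr->data);
    if (arr->data == NULL) {
        free(arr);
        return NULL;
    }
    return arr;
}

void dynarray_add_elt(dynarray *arr, elt value)
{
    if (arr->size == arr->capacity) {
        elt *tmp = realloc(arr->data, 2 * arr->capacity * sizeof *tmp);
        if (tmp == NULL)
            return;                  /* out of memory: element not added */
        arr->data = tmp;
        arr->capacity *= 2;
    }
    arr->data[arr->size++] = value;
}

elt dynarray_get_elt(dynarray *arr, int index)
{
    return arr->data[index];         /* no bounds check in this minimal sketch */
}

void dynarray_free(dynarray *arr)
{
    if (arr != NULL) {
        free(arr->data);
        free(arr);
    }
}
```

Because dynarray_add_elt returns void in the prototype, this sketch silently drops the element on allocation failure; returning an int status would be a reasonable change.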
Try reworking it so a pointer to a pointer to the array is passed in, i.e. ent **a. Then you will be able to update the caller on the new location of the array.
This is a nice reason to use OOP. Yes, you can do OOP in C, and it even looks nice if done correctly.
In this simple case you need neither inheritance nor polymorphism, just the encapsulation and methods concepts:
Define a structure with a length and a data pointer (and maybe an element size).
Write getter/setter functions that operate on pointers to that struct.
The 'grow' function modifies the data pointer within the struct, but any pointer to the struct stays valid.
If you changed the variable declaration in main to be
ent *a = NULL;
the code would work more like you envisioned, because it would no longer try to realloc (and thereby free) a stack-allocated array. Setting a to NULL works because realloc treats a null pointer as if the user had called malloc(size). Keep in mind that with this change, the prototype of addValues needs to change to
int addValues(ent **a, int mSize)
and that the code needs to handle the case of realloc failing. For example
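ent *tmp;
while(....) { //Loop to add items to array
    if (*a == NULL || i >= size) {   // no buffer yet, or buffer full
        if (*a != NULL)
            size = size * 2;
        tmp = realloc(*a, size*sizeof(ent)); // realloc(NULL, n) acts like malloc(n)
        if (tmp) {
            *a = tmp;
        } else {
            // allocation failed. either free *a or keep *a and
            // return an error
        }
    }
    //Add to array
    i++;
}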
Some realloc implementations may reserve extra room internally when a buffer grows, but a program may only use the number of bytes it actually requested, so the original code's
size = size * 2;
is still what keeps the number of reallocations (and copies) low.
You are passing the array pointer by value. What this means is:
int main(int argc, char** argv) {
...
ent *a; // This...
...
}
int addValues(ent *a, int mSize) {
...
a = calloc(1, sizeof(ent)); // ...is not the same as this
//usual loop
...
}
so changing the value of a in the addValues function does not change the value of a in main. To change the value of a in main you need to pass a reference to it to addValues. At the moment, the value of a is being copied and passed to addValues. To pass a reference to a use:
int addValues (ent **a, int mSize)
and call it like:
int main(int argc, char** argv) {
...
ent *a; // This...
...
addValues (&a, mSize);
}
In the addValues, access the elements of a like this:
(*a)[element]
and allocate or reallocate the array like this:
(*a) = realloc (*a, ...);
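Putting the pointer-to-pointer approach together, a hypothetical addValues might look like the sketch below. The question's loop condition isn't shown, so this version copies count values from a source array instead; ent is assumed to be int, and the capacity is passed by pointer as well so the caller learns the new size:

```c
#include <stdlib.h>

typedef int ent;   /* assumption: ent is a plain int for this sketch */

/* Hypothetical addValues: copies the `count` values in `src` into *a, starting
   at index 0. *a holds room for *capacity elements (it may be NULL with
   *capacity == 0). Returns the number of items stored, or -1 if realloc fails
   (in which case *a is still valid and still owned by the caller). */
int addValues(ent **a, int *capacity, const ent *src, int count)
{
    int i;
    for (i = 0; i < count; i++) {
        if (i >= *capacity) {
            int new_cap = *capacity ? *capacity * 2 : 4;
            ent *tmp = realloc(*a, new_cap * sizeof(ent));
            if (tmp == NULL)
                return -1;
            *a = tmp;            /* the caller's pointer is updated through a */
            *capacity = new_cap;
        }
        (*a)[i] = src[i];
    }
    return i;
}
```

Starting from ent *a = NULL and a capacity of 0 works because realloc(NULL, size) behaves like malloc(size).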
Xahtep explains how your caller can deal with the fact that realloc() might move the array to a new location. As long as you do this, you should be fine.
realloc() might get expensive if you start working with large arrays. That's when it's time to start thinking of using other data structures -- a linked list, a binary tree, etc.
As stated you should pass pointer to pointer to update the pointer value.
But I would suggest redesigning to avoid this technique; in most cases it can and should be avoided. Without knowing exactly what you're trying to achieve, it's hard to suggest an alternative design, but I'm 99% sure that it's doable another way. And as Javier said, think object-oriented and you will always get better code.
Are you really required to use C? This would be a great application of C++'s "std::vector", which is precisely a dynamically sized array (easily resizable with a single call you don't have to write and debug yourself).

Resources