How to avoid freeing objects that are stored in containers with the same reference count - c

I have been working on some features of a custom programming language written in C. Currently I'm working on a system that does reference counting for objects in the language, which in C are represented as structs with, among other things, a reference count.
There is also a feature which can free all currently allocated objects (say before the exit of the program to clean up all memory). This is exactly where the problem lies.
I have been thinking about how best to do it, but I'm running into some problems. Let me sketch out the situation a bit:
2 new integers are allocated. Both have a reference count of 1.
1 new list is allocated, also with a reference count of 1.
Now both integers go in the list, which gives them a reference count of 2.
After these actions both integers go out of scope for some reason, so their reference count drops to 1, as they are still in the list.
Now I'm done with these objects, so I run the function to delete all tracked objects. However, as you might have noticed, both the list and the objects in the list have the same reference count (1). This means there is no way to decide which object to free first.
If I freed the integers before the list, the list would try to decrement the reference count on the already-freed integers, which would segfault.
If the list were freed before the integers, it would decrement the reference count of the integers to 0, which automatically frees them too, and no further steps need to be taken to free the integers. They aren't tracked anymore.
Currently I have a system that works most of the time, but not for the example above: I free the objects in order of their reference count, highest count last. This obviously only works as long as the integers have a higher reference count than the list, which, as the example shows, is not always the case. (It only works if the integers haven't dropped out of scope, so that they still have a higher reference count than the list.)
Note: I have already found one way, which I really don't like: adding a flag to every object indicating that it is in a container and so can't be freed. I don't like this because it adds some memory overhead to every allocated object, and when there is a circular dependency no object would be freed. Of course a cycle detector could fix this, but preferably I'd like to do this with the reference counting only.
Let me give a concrete example of the described steps above:
//this initializes and sets a garbage collector object.
//Basically it's a datastructure which records every allocated object,
//and is able to free them all or in the future
//run some cycle detection on all objects.
//It has to be set before allocating objects
garbagecollector *gc = init_garbagecollector();
set_garbagecollector(gc);
//initialize a tracked object from the c integer value 10
myobject * a = myinteger_from_cint(10);
myobject * b = myinteger_from_cint(10);
myobject * somelist = mylist_init();
mylist_append(somelist,a);
mylist_append(somelist,b);
// Simulate the going out of scope of the integers.
// There are no functions yet so i can't actually do it but this
// is a situation which can happen and has happened a couple of times
DECREF(a);
DECREF(b);
//now the program is done. all objects have a refcount of 1
//delete the garbagecollector and with that all tracked objects
//there is no way to prevent the integers being freed before the list
delete_garbagecollector(gc);
What should of course happen is that, 100% of the time, the list is freed before the integers are.
What would be a smarter way of freeing all existing objects, in a way such that objects stored in containers aren't freed before the containers they're in?

It depends on your intention with:
There also is a feature which can free all currently allocated objects (say before the exit of the program to clean up all memory).
If the goal is to forcibly deallocate every single object regardless of its ref count, then I would have a separate chunk of code that walks the object graph and frees each object without touching its ref count. The ref count itself is going to end up freed too, so there's little point in updating it.
If the goal is to just tell the system "We don't need the objects anymore" then another option is to simply walk the roots and decrement their ref counts. If there are no other references to them, they'll hit zero. They will then decrement the ref counts of everything they refer to before being deallocated. That in turn percolates through the object graph. If the roots are the only thing holding onto references at the point that you call this, it will effectively free everything.
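The second option (release only the roots and let the counts cascade) can be sketched roughly as below. This is a minimal illustration, not the asker's actual API: `obj`, `obj_new`, `obj_decref`, `list_append`, `release_roots` and the `live_objects` counter are all hypothetical names, and the list is assumed to be the only root left at shutdown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object layout; every name here is illustrative. */
typedef enum { OBJ_INT, OBJ_LIST } obj_kind;

typedef struct obj {
    obj_kind kind;
    int refcount;
    int value;            /* used by OBJ_INT */
    struct obj **items;   /* used by OBJ_LIST */
    size_t len, cap;
} obj;

static int live_objects = 0;  /* stands in for the tracker's bookkeeping */

static obj *obj_new(obj_kind kind) {
    obj *o = calloc(1, sizeof *o);
    if (!o) abort();
    o->kind = kind;
    o->refcount = 1;          /* the creator holds the first reference */
    live_objects++;
    return o;
}

/* Drop one reference; at zero, release the children first, then the object. */
static void obj_decref(obj *o) {
    assert(o->refcount > 0);
    if (--o->refcount > 0) return;
    if (o->kind == OBJ_LIST) {
        for (size_t i = 0; i < o->len; i++) obj_decref(o->items[i]);
        free(o->items);
    }
    free(o);
    live_objects--;
}

/* The list takes its own reference to every item it stores. */
static void list_append(obj *list, obj *item) {
    assert(list->kind == OBJ_LIST);
    if (list->len == list->cap) {
        size_t cap = list->cap ? list->cap * 2 : 4;
        obj **p = realloc(list->items, cap * sizeof *p);
        if (!p) abort();
        list->items = p;
        list->cap = cap;
    }
    item->refcount++;
    list->items[list->len++] = item;
}

/* Shutdown path: touch only the roots, never the contained objects. */
static void release_roots(obj **roots, size_t n) {
    for (size_t i = 0; i < n; i++) obj_decref(roots[i]);
}
```

With this shape the order problem disappears, because the integers are never freed directly: they are only reached through the list that owns a reference to them. It does assume the tracker can tell roots from contained objects, which the original design may not record.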

You should not free anything until the reference count for somelist is zero.

Related

How does Go guarantee element pointers are valid for arrays and slices?

When you use an indexer on an array or slice, you get a variable in return, so you can take its address. I wonder how this is possible, because the array/slice could be more deeply nested than the target variable:
// ptr declaration here
{
// array declaration here
ptr = &array[0];
}
In the array case, the problem I see is that the data is on the stack. In the slice case, allocating it on the heap does not automatically solve the problem, because the GC could remove the entire slice unless taking the address of an element links back to the slice itself (thus preventing the memory from being freed).
Example: what happens when there is no guarantee on the validity of the pointers: let's say my array is a collection of colors. I pick one element, take its address, and the entire array is deleted (because it went out of scope). I then check the value of the element and it is 3.14. Or "hello world". Or maybe green. Since there is no guarantee, it could be anything that is located at the given address.
The Go compiler and the Go garbage collector guarantee that memory is not freed until it is no longer used.
To learn the basics of garbage collection, the Go team recommends The Garbage Collection Handbook: The Art of Automatic Memory Management.
See The Go Blog: Getting to Go: The Journey of Go's Garbage Collector for some history.

C: deleting elements mid array and have previous pointers working

I have several arrays like this (please ignore specific names):
static resource_t coap_cmp_res[MAX_CMPS];
e.g. [cmp1,cmp2,cmp3,cmp4,cmp5,0,0,0]
and a code that uses these elements, for example, coap_cmp_res[4] (cmp5) is associated with a REST resource, call it Res5.
At a certain point in time, I delete an element in that array at position x like this:
rest_deactivate_resource(&coap_cmp_res[x]);
e.g. for x = 2
[cmp1,cmp2,0,cmp4,cmp5,0,0,0]
What I then would like to do is have a single continuous array again like this
e.g. [cmp1,cmp2,cmp4,cmp5,0,0,0,0]
What I do currently is:
for (UInt8 i = x; i < MAX_CMPS - 1; i++) {
    coap_cmp_res[i] = coap_cmp_res[i + 1]; // stop one short so i + 1 stays in bounds
}
which gives [cmp1,cmp2,cmp4,cmp5,cmp5,0,0,0]
then I manually set the last non-zero element to 0.
e.g. [cmp1,cmp2,cmp4,cmp5,0,0,0,0]
So, this looks good, but the problem is that the Res5 is still associated with coap_cmp_res[4] and thus now the value 0, instead of cmp5, which is not what I desire.
I could deactivate and reactivate every resource after x in the array to have the associations working again, but was wondering if there was a more efficient way to go about this.
Hopefully this makes sense.
As the proverb says: "add a level of indirection". Use an array of resource_t* whose entries point into coap_cmp_res and whose positions are stable. Then associate Res5 with one of those pointers, and use the indirection to reach a valid entry.
static resource_t coap_cmp_res_data[MAX_CMPS];
static resource_t* coap_cmp_res_ptrs[MAX_CMPS]; // points into coap_cmp_res_data
When you remove an element, you update the entries in coap_cmp_res_ptrs, without moving them, and shrink coap_cmp_res_data. Any resource will still refer to the same position in coap_cmp_res_ptrs, and the indirection will take it to the current location of the resource.
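A minimal sketch of that indirection, under some assumptions: `resource_t` is reduced to a stand-in struct because the real type isn't shown, and `cmp_add`, `cmp_remove`, `cmp_get` and `coap_cmp_count` are hypothetical helpers invented for illustration. A resource such as Res5 would hold on to the slot index (or the address of the slot), not to a position in the data array.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CMPS 8

/* Stand-in for the real resource_t, which the question does not show. */
typedef struct { int id; } resource_t;

static resource_t  coap_cmp_res_data[MAX_CMPS];  /* kept contiguous */
static resource_t *coap_cmp_res_ptrs[MAX_CMPS];  /* slots never move */
static size_t      coap_cmp_count = 0;

/* Store a new element; returns its stable slot index, or -1 if full. */
static int cmp_add(int id) {
    if (coap_cmp_count == MAX_CMPS) return -1;
    for (int slot = 0; slot < MAX_CMPS; slot++) {
        if (coap_cmp_res_ptrs[slot] == NULL) {
            resource_t *r = &coap_cmp_res_data[coap_cmp_count++];
            r->id = id;
            coap_cmp_res_ptrs[slot] = r;
            return slot;
        }
    }
    return -1;
}

/* Remove the element behind `slot` and close the gap in the data array. */
static void cmp_remove(int slot) {
    assert(slot >= 0 && slot < MAX_CMPS);
    resource_t *gone = coap_cmp_res_ptrs[slot];
    if (gone == NULL) return;

    /* Shift the tail of the data array one position to the left. */
    for (resource_t *p = gone; p + 1 < coap_cmp_res_data + coap_cmp_count; p++)
        *p = p[1];
    coap_cmp_count--;
    coap_cmp_res_ptrs[slot] = NULL;

    /* Slots stay where they are; only their targets follow the shift. */
    for (int s = 0; s < MAX_CMPS; s++)
        if (coap_cmp_res_ptrs[s] != NULL && coap_cmp_res_ptrs[s] > gone)
            coap_cmp_res_ptrs[s]--;
}

/* Resolve a slot to the current location of its element (NULL if removed). */
static resource_t *cmp_get(int slot) {
    assert(slot >= 0 && slot < MAX_CMPS);
    return coap_cmp_res_ptrs[slot];
}
```

Removal is still linear in the number of elements, but nothing that refers to a slot needs to be deactivated and reactivated afterwards.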
An alternative approach, which may prove better in your case (you'd have to profile), is to use node based storage. I.e a linked list.

What are the internal differences of a T[] and a List<T> in terms of memory?

I was reading an article about array vs list, and the author says that an array is worse than a list, because (among other things) an array is a list of variables, but to me a list is also a list of variables. I mean, I can still do list[3] = new Item().
Actually, I have always somehow seen a List<T> as a wrapper for an array that allows me to use it easily without caring about handling its structure.
What are the internal differences between a T[] and a List<T> in terms of heap/stack memory usage?
Since an array is a static structure, after the initialization, it allocates the memory that you've demanded.
int arr[5];
For example here there are 5 int objects created in memory. But when you use lists, according to its implementation, it gives you first an array with predefined capacity. And while you are adding your elements, if you exceed the capacity then it scales up. In some implementations it just doubles its size, or in some implementations it enlarges itself when the granted capacity is half full.
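To make the "scales up when the capacity is exceeded" idea concrete, here is a rough growable-array sketch in C. It is not how any particular List<T> is implemented; `int_list` and its functions are hypothetical names, and the starting capacity of 4 and the doubling factor are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* A list is a heap-allocated array plus a count and a capacity. */
typedef struct {
    int   *items;
    size_t count;     /* elements in use */
    size_t capacity;  /* elements allocated */
} int_list;

/* Append a value, doubling the backing array when it is full.
   Returns 0 on success, -1 if the allocation failed. */
static int int_list_add(int_list *l, int value) {
    if (l->count == l->capacity) {
        size_t new_cap = l->capacity ? l->capacity * 2 : 4;
        int *p = realloc(l->items, new_cap * sizeof *p);
        if (p == NULL) return -1;
        l->items = p;
        l->capacity = new_cap;
    }
    l->items[l->count++] = value;
    return 0;
}

/* Indexed access is a plain array access after a bounds check. */
static int int_list_get(const int_list *l, size_t i) {
    assert(i < l->count);
    return l->items[i];
}

static void int_list_free(int_list *l) {
    free(l->items);
    l->items = NULL;
    l->count = l->capacity = 0;
}
```

Adding a fifth element to an empty `int_list` grows the backing array from 4 to 8 slots, so the list ends up holding more memory than the 5 elements strictly need. That spare capacity is the main memory difference from a fixed-size array.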
The author's point about a "list of variables" wasn't about memory. It's that an array contains your internal variables, and returning it allows them to be reassigned by the caller. It comes down to this:
Only pass out an array if it is wrapped up by a read-only object.
If you pass out an internal List<T>, you have the same problem, but here's the key:
We have an extensibility model for lists because lists are classes. We
have no ability to make an “immutable array”. Arrays are what they are
and they’re never going to change.
And, at the time the article was written, the IReadOnlyList<T> interface didn't exist yet (it arrived with .NET 4.5), though he probably would have mentioned it if it had. I believe he was advocating implementing an IList<T> that would simply throw an exception if you tried to use the setter. Of course, if the user doesn't need the ability to access elements by index, you don't need a list interface at all -- you can just wrap it in a ReadOnlyCollection<T> and return it as an IEnumerable<T>.

How does reference counting work?

How do reference counted structures work? For example let's look at SDL_Surface:
typedef struct SDL_Surface
{
...
int refcount;
} SDL_Surface;
s = SDL_CreateRGBSurface(...); // <-- what happens here?
SDL_FreeSurface(s); // <-- and here?
How do I implement reference counting in my own code?
SDL_CreateRGBSurface will allocate a new instance of SDL_Surface (or a suitable derived structure), and increment the reference count (setting it to 1).
SDL_FreeSurface will decrement the reference count, and check if it's zero. If it is, that means that no other objects are using the surface, and it will be deallocated.
SDL also guarantees that the refcount is incremented whenever the object gets used somewhere else (e.g. in the renderer). So, if the reference count is nonzero when SDL_FreeSurface is called, then some other object must be using it. That other object will eventually also call SDL_FreeSurface and release the surface for good.
Reference counting allows you to cheaply track objects without the overhead of a cycle-collecting garbage collector. However, one drawback is that it won't handle cycles (e.g. where object A holds a reference to B, which in turn holds a reference to A); in those cases, the cycles will keep the objects involved alive even when all other external references are gone.
To implement refcounting, you simply need to add a refcount field to any objects you want to refcount, and ensure (in your public API, and internally) that every allocation and deallocation of the object goes through the appropriate refcount-maintaining interface (which you must define). Finally, when an object or function wants a reference to your refcounted objects, they must first get the reference by incrementing the refcount (directly or through some interface). When they are done they must decrement the refcount.
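A bare-bones version of that recipe might look like the following. It is only a sketch and not SDL's implementation: `surface`, `surface_create`, `surface_retain`, `surface_release` and the `surfaces_alive` counter are hypothetical names made up for the example.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative refcounted type; not SDL's actual structure. */
typedef struct {
    int width, height;
    int refcount;
} surface;

static int surfaces_alive = 0;  /* only here to make the lifetime observable */

/* Creation hands the caller the first reference. */
static surface *surface_create(int w, int h) {
    surface *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->width = w;
    s->height = h;
    s->refcount = 1;
    surfaces_alive++;
    return s;
}

/* Anyone who stores the pointer takes a reference first. */
static surface *surface_retain(surface *s) {
    assert(s->refcount > 0);
    s->refcount++;
    return s;
}

/* Dropping the last reference deallocates the object. */
static void surface_release(surface *s) {
    if (s == NULL) return;
    assert(s->refcount > 0);
    if (--s->refcount == 0) {
        surfaces_alive--;
        free(s);
    }
}
```

The creator and every other holder each call `surface_release` exactly once; whichever call happens last is the one that actually frees the memory.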

How to return an array, without allocating it, if size is unknown to caller?

Consider following function:
int get_something (int* array, int size);
Its purpose is to fill the passed array[size] with data from an external resource (queries to the resource are expensive). The question is what to do if the resource has more elements than the provided array can handle. What is the best approach?
Edit: Current solution added:
Our approach, at the moment is following:
When user calls get_something() first time, with null argument we perform a full Query, allocate data in a cache (which is just a key-value storage) and return a number of items.
When user calls get_something() next time, with properly initialized buffer, we return him data from cache and clear a cache entry.
If user does not call get_something(), timeout occurs and cache for that item gets freed.
If user calls get_something() too late, and data has been cleared, we generate error state, so user knows that he has to repeat the request.
One option is to not modify the array at all and instead return the needed size as the return result. The caller must then call your function again with an array of at least this size.
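As a rough sketch of this "ask for the size first" convention: the static `resource_items` table below is a made-up stand-in for the expensive external resource, and the rule that a NULL array means "just tell me the size" is an assumption added for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the expensive external resource (hypothetical data). */
static const int resource_items[] = { 10, 20, 30, 40, 50 };
static const int resource_count =
    (int)(sizeof resource_items / sizeof resource_items[0]);

/* Returns the number of items the resource holds. The array is filled
   only when it is large enough; otherwise it is left untouched and the
   caller is expected to retry with at least the returned size. */
static int get_something(int *array, int size) {
    assert(size >= 0);
    if (array == NULL || size < resource_count)
        return resource_count;          /* too small: report the needed size */
    for (int i = 0; i < resource_count; i++)
        array[i] = resource_items[i];
    return resource_count;
}
```

A caller would typically call `get_something(NULL, 0)` once, allocate that many elements, and call again. The cost is a second call, which is why the question's caching scheme matters when the query itself is expensive.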
Ok, your basic requirement is to Query a resource, and cache the returned data in memory, to avoid multiple accesses.
That means you will have to allocate memory within your program to store all of the data.
Problem #1 is to populate that cache. I will assume that you have that figured out, and there is some function get_resource();
problem #2 is how to design an api to allow client/user code to interact with that data.
In your example you are using an array allocated by the client as the cache, hoping to solve both problems with 1 buffer, but this doesn't solve the problem in all cases (hence your posting). So you really need to separate the 2 problems.
Model number #1 is to provide iterator / cursor functionality
iterator = get_something(); // Triggers caching of data from Resource
data = get_next_single_something( iterator );
status = release_something( iterator );
// The logic to release the data could be done automagically in get_next,
// after returning the last single_something, if that is always the use case.
Model #2 is to return the Whole object in a malloced buffer, and let the client manage the whole thing
data_type *pData=NULL;
unsigned size = get_something( &pData ); // Triggers caching of data from Resource
process( size, pData );
free( pData );
pData=NULL;
Model #3. If you are married to the client array, you can use Model #1 to return multiple values at once, but if there are more values, then get_something() will have to build a cache, and the client will still have to iterate.
Use realloc .
My choice would be to use the same model as fread() and turn the interface into a stream of sorts.
i.e.
either fill the buffer up or put all the items in it and return the number of items actually read
maintain some sort of state so that subsequent calls only get unread items
return 0 once all the items have been read
return a negative number if an error occurs.
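The four rules above could be sketched like this. The `something_stream` type and `something_read` function are hypothetical, and the sketch assumes the query result has already been cached somewhere the stream can point at.

```c
#include <assert.h>
#include <stddef.h>

/* Read state kept between calls (hypothetical type). */
typedef struct {
    const int *items;  /* cached query result */
    size_t count;      /* total number of cached items */
    size_t pos;        /* index of the next unread item */
} something_stream;

/* Copies up to `size` unread items into `buf`.
   Returns the number copied, 0 once everything has been read,
   or -1 on invalid arguments. */
static int something_read(something_stream *s, int *buf, int size) {
    if (s == NULL || buf == NULL || size <= 0) return -1;
    assert(s->pos <= s->count);

    size_t left = s->count - s->pos;
    size_t n = left < (size_t)size ? left : (size_t)size;
    for (size_t i = 0; i < n; i++)
        buf[i] = s->items[s->pos + i];
    s->pos += n;   /* subsequent calls only see unread items */
    return (int)n;
}
```

The caller loops until the function returns 0, so the buffer size never has to match the size of the resource.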
Allocate the array dynamically, i.e. using malloc(), and then, in the function, either use realloc() or free the previous list and allocate another, fill it and return the new size. For the second approach you can use the return value to return the new size, but to update the caller's address of the array you will need to change the function to accept int** instead of int*.
Can you do a check on how many elements the resource has? If so I'd do that then copy the array to an array as large as the resource.
or perhaps copying the array to an array double its size when you're reaching near the end?
http://www.devx.com/tips/Tip/13291
That depends on how your program should handle that situation, I guess.
One approach could be to fill the array to its maximum, and return the total number of elements which are available. That way the caller could check if he needs to call the function again (see Mark Byers' answer).
Logic behind that:
- Creates array with 100 items
- Calls your function and gets 150 returned
- Increases the array size or creates a second one and calls your function again with that array
- Repeats that until the returned item count is equal to or less than the array size
