How does reference counting work?

How do reference-counted structures work? For example, let's look at SDL_Surface:
typedef struct SDL_Surface
{
...
int refcount;
} SDL_Surface;
SDL_Surface *s = SDL_CreateRGBSurface(...); // <-- what happens here?
SDL_FreeSurface(s); // <-- and here?
How do I implement reference counting in my own code?

SDL_CreateRGBSurface will allocate a new instance of SDL_Surface (or a suitable derived structure), and increment the reference count (setting it to 1).
SDL_FreeSurface will decrement the reference count, and check if it's zero. If it is, that means that no other objects are using the surface, and it will be deallocated.
SDL also guarantees that the refcount is incremented whenever the object gets used somewhere else (e.g. in the renderer). So, if the reference count is nonzero when SDL_FreeSurface is called, then some other object must be using it. That other object will eventually also call SDL_FreeSurface and release the surface for good.
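A rough model of the create/free behaviour described above, in plain C. This is not SDL's source: `Surface`, `surface_create`, `surface_ref` and `surface_free` are hypothetical names, the pixel buffer merely stands in for whatever a real surface owns, and single-threaded use is assumed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for SDL_Surface: only the fields needed here. */
typedef struct Surface {
    int w, h;
    unsigned char *pixels;   /* resource owned by the surface */
    int refcount;
} Surface;

/* Allocate a surface; the caller owns the first reference (refcount = 1). */
Surface *surface_create(int w, int h) {
    Surface *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    s->pixels = calloc((size_t)w * (size_t)h, 4);   /* 4 bytes per pixel */
    if (s->pixels == NULL) { free(s); return NULL; }
    s->w = w;
    s->h = h;
    s->refcount = 1;
    return s;
}

/* Another user (e.g. a renderer) takes its own reference. */
Surface *surface_ref(Surface *s) {
    s->refcount++;
    return s;
}

/* Drop one reference and deallocate only when the last one is gone.
   Unlike SDL_FreeSurface (which returns void), this returns 1 if the
   surface was actually freed and 0 otherwise, so the effect is visible. */
int surface_free(Surface *s) {
    if (s == NULL) return 0;
    assert(s->refcount > 0);
    if (--s->refcount > 0) return 0;   /* still in use elsewhere */
    free(s->pixels);
    free(s);
    return 1;
}
```

In this model `surface_create` hands the caller a count of 1, each additional user calls `surface_ref`, and every user calls `surface_free` exactly once; only the last of those calls actually frees memory.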
Reference counting allows you to cheaply track objects without the overhead of a cycle-collecting garbage collector. However, one drawback is that it won't handle cycles (e.g. where object A holds a reference to B, which in turn holds a reference to A); in those cases, the cycle keeps the objects involved alive even when all other external references are gone.
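A small sketch of the cycle problem, using a hypothetical `Node` type whose `next` field is a counted (strong) reference, plus a `live_nodes` counter that exists only to make leaks visible. If A links to B and B links to A, releasing both external references leaves each count at 1, so neither node is ever freed; a plain chain without a cycle is freed normally.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node: `next` is a strong reference counted in next->refcount. */
typedef struct Node {
    int refcount;
    struct Node *next;
} Node;

static int live_nodes = 0;   /* allocated and not yet freed (demo only) */

Node *node_new(void) {
    Node *n = malloc(sizeof *n);
    if (n == NULL) return NULL;
    n->refcount = 1;         /* the creator's reference */
    n->next = NULL;
    live_nodes++;
    return n;
}

/* Drop one reference; at zero, release what this node holds, then free it. */
void node_release(Node *n) {
    if (n == NULL) return;
    assert(n->refcount > 0);
    if (--n->refcount > 0) return;
    node_release(n->next);
    free(n);
    live_nodes--;
}

/* Make `from` hold a strong reference to `to`. */
void node_link(Node *from, Node *to) {
    to->refcount++;            /* take the new reference first */
    node_release(from->next);  /* then drop any old one */
    from->next = to;
}
```

No amount of careful releasing by the outside code helps here: once the only remaining references are the ones inside the cycle, the counts can never reach zero without some extra mechanism (weak references, manual cycle breaking, or a cycle detector).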
To implement refcounting, you simply need to add a refcount field to every object you want to refcount, and ensure (in your public API, and internally) that every allocation and deallocation of the object goes through the appropriate refcount-maintaining interface (which you must define). Finally, whenever an object or function wants a reference to one of your refcounted objects, it must first obtain the reference by incrementing the refcount (directly or through some interface), and when it is done it must decrement the refcount.
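One common way to apply this to several types is to embed a shared header holding the count plus a type-specific destroy callback. The sketch below assumes that design; `RefHeader`, `ref_init`, `ref_retain`, `ref_release`, `Buffer`, `buffer_new` and the `buffers_destroyed` counter are all hypothetical, and the header must be the first member of each struct so the pointer casts are valid.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reusable header: embed it as the FIRST member of any
   refcounted struct, so a pointer to the struct is also a pointer to it. */
typedef struct RefHeader {
    int refcount;
    void (*destroy)(struct RefHeader *self);   /* frees the whole object */
} RefHeader;

void ref_init(RefHeader *h, void (*destroy)(RefHeader *)) {
    h->refcount = 1;          /* the creator owns the first reference */
    h->destroy = destroy;
}

RefHeader *ref_retain(RefHeader *h) {
    h->refcount++;
    return h;
}

void ref_release(RefHeader *h) {
    if (h == NULL) return;
    assert(h->refcount > 0);
    if (--h->refcount == 0)
        h->destroy(h);        /* last reference gone: type-specific cleanup */
}

/* Example object type built on the header. */
typedef struct Buffer {
    RefHeader ref;            /* must be first */
    size_t len;
    char *data;
} Buffer;

static int buffers_destroyed = 0;   /* demo only */

static void buffer_destroy(RefHeader *h) {
    Buffer *b = (Buffer *)h;  /* valid because `ref` is the first member */
    free(b->data);
    free(b);
    buffers_destroyed++;
}

Buffer *buffer_new(size_t len) {
    Buffer *b = malloc(sizeof *b);
    if (b == NULL) return NULL;
    b->data = malloc(len);
    if (b->data == NULL) { free(b); return NULL; }
    b->len = len;
    ref_init(&b->ref, buffer_destroy);
    return b;
}
```

Funnelling every type through `ref_release` keeps the "reached zero" logic in one place; each type only has to say how to tear itself down.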

Related

How does one implement weak references with Boehm GC?

I have a personal project that I am implementing with the Boehm GC. I need to implement a sort of event type, which should hold references to other events. But I also need to ensure that the events pointed to are still collectable, so I need weak references for this.
Let's say we have events A, B and C. I configure these events to signal event X whenever any of them is signaled. This means A, B and C must hold a reference to event X. What I want is that if event X is unreachable the events A, B and C don't need to signal it anymore. Thus a weak reference is what I thought of.
Is there any other way to do this? I don't want to change the GC but if necessary (the allocation interface remains clean) I could.
The project is written in C. If need be, I will provide more info. Notably, if there is any way to implement such events directly with this semantics, there's no need for actual weak references (events MAY have a reference cycle though while they are not signaled).

The Boehm GC does not have a concept of weak references per se. However, it does not scan memory allocated by the system malloc for references to managed objects, so pointers stored in such memory do not prevent the pointed-to object from being collected. Of course, that approach means that the objects containing the pointers will not be managed by the collector.
Alternatively, it should be possible to abuse GC_MALLOC_ATOMIC() or GC_malloc_explicitly_typed() to obtain a managed object that can contain pointers to other managed objects without preventing those other objects from being collected. That basically involves lying to the GC about whether some members are pointers, so that they are not scanned.
Either way, you also need some mechanism for being notified when weakly referenced objects are collected, so that you don't try to access them afterward. The GC has an interface for registering finalizer callbacks that are invoked before an object is collected, and that looks like your best available option for the purpose.
Overall, I think what you're asking for is doable, but with a lot of DIY involved. At a high level,
- Use GC_MALLOC_ATOMIC() to allocate a wrapper object around a pointer to the weakly referenced object. Allocating it this way lets the GC manage the wrapper itself, without scanning the pointer inside it during reachability analysis.
- Use GC_register_finalizer to register a finalizer function that sets the wrapper's pointer to NULL when the GC decides that the pointed-to object is inaccessible.
- Users of the wrapper must check whether the pointer inside it is NULL before attempting to dereference it.

How to avoid freeing objects that are stored in containers with the same reference count

I have been working on some features of a custom programming language written in C. Currently I'm working on a system that does reference counting for objects in the language, which in C are represented as structs containing, among other things, a reference count.
There is also a feature that frees all currently allocated objects (say, before the program exits, to clean up all memory). This is exactly where the problem lies.
I have been thinking about how best to do it, but I'm running into some problems. Let me sketch out the situation:
- 2 new integers are allocated; both have a reference count of 1.
- 1 new list is allocated, also with a reference count of 1.
- Both integers are appended to the list, which gives them a reference count of 2.
- After these actions both integers go out of scope for some reason, so their reference counts drop to 1, as they are still in the list.
Now I'm done with these objects, so I run the function that deletes all tracked objects. However, as you might have noticed, the list and the objects in the list all have the same reference count (1). This means there is no way to decide which object to free first.
If I freed the integers before the list, the list would try to decrement the reference counts of the already-freed integers, which segfaults.
If the list is freed before the integers, it decrements the reference counts of the integers to 0, which automatically frees them too, and no further steps are needed to free the integers. They aren't tracked anymore.
Currently I have a system that works most of the time, but not for the example above: I free the objects in order of reference count, highest count last. This only works as long as the integers have a higher reference count than the list, which, as the example shows, is not always the case. (It only works if the integers haven't gone out of scope, so that they still have a higher reference count than the list.)
Note: I have already found one way that I really don't like: adding a flag to every object indicating that it is in a container and so can't be freed. I don't like it because it adds some memory overhead to every allocated object, and when there is a circular dependency no object would ever be freed. A cycle detector could fix this, of course, but preferably I'd like to do this with reference counting alone.
Let me give a concrete example of the described steps above:
//this initializes and sets a garbage collector object.
//Basically it's a datastructure which records every allocated object,
//and is able to free them all or in the future
//run some cycle detection on all objects.
//It has to be set before allocating objects
garbagecollector *gc = init_garbagecollector();
set_garbagecollector(gc);
//initialize a tracked object from the C integer value 10
myobject * a = myinteger_from_cint(10);
myobject * b = myinteger_from_cint(10);
myobject * somelist = mylist_init();
mylist_append(somelist,a);
mylist_append(somelist,b);
// Simulate the going out of scope of the integers.
// There are no functions yet, so I can't actually do it, but this
// is a situation which can happen and has happened a couple of times
DECREF(a);
DECREF(b);
//now the program is done. all objects have a refcount of 1
//delete the garbagecollector and with that all tracked objects
//there is no way to prevent the integers being freed before the list
delete_garbagecollector(gc);
What should happen, of course, is that the list is always freed before the integers.
What would be a smarter way of freeing all existing objects, in a way such that objects stored in containers aren't freed before the containers they're in?

It depends on your intention with:
There also is a feature which can free all currently allocated objects (say before the exit of the program to clean up all memory).
If the goal is to forcibly deallocate every single object regardless of its ref count, then I would have a separate chunk of code that walks the object graph and frees each object without touching its ref count. The ref count itself is going to end up freed too, so there's little point in updating it.
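A sketch of this forced teardown. Because the question's garbage collector already records every allocated object, this version walks that allocation registry instead of following pointers through the object graph, and frees each object's own storage without ever touching a reference count, so the order no longer matters. The object layout, `registry`, `int_new`, `list_new`, `list_append`, `gc_free_all` and the `live` counter are all hypothetical.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { OBJ_INT, OBJ_LIST } ObjKind;

typedef struct Obj {
    int refcount;
    ObjKind kind;
    struct Obj *gc_next;   /* registry link: NOT a counted reference */
    long value;            /* payload for OBJ_INT */
    struct Obj **items;    /* payload for OBJ_LIST */
    size_t len, cap;
} Obj;

static Obj *registry = NULL;   /* every tracked object, newest first */
static int live = 0;           /* allocated and not yet freed (demo only) */

static Obj *obj_alloc(ObjKind kind) {
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    o->kind = kind;
    o->gc_next = registry;     /* record the object in the registry */
    registry = o;
    live++;
    return o;
}

Obj *int_new(long v) {
    Obj *o = obj_alloc(OBJ_INT);
    if (o != NULL) o->value = v;
    return o;
}

Obj *list_new(void) { return obj_alloc(OBJ_LIST); }

/* Append `item`; the list takes its own reference. Returns 0 on success. */
int list_append(Obj *list, Obj *item) {
    assert(list->kind == OBJ_LIST);
    if (list->len == list->cap) {
        size_t ncap = list->cap ? list->cap * 2 : 4;
        Obj **p = realloc(list->items, ncap * sizeof *p);
        if (p == NULL) return -1;
        list->items = p;
        list->cap = ncap;
    }
    item->refcount++;
    list->items[list->len++] = item;
    return 0;
}

/* Forced teardown: free every tracked object's own memory without
   decrementing any refcount, so the order of the walk does not matter. */
void gc_free_all(void) {
    Obj *o = registry;
    while (o != NULL) {
        Obj *next = o->gc_next;
        free(o->items);   /* the array only; its elements are freed by this same walk */
        free(o);
        live--;
        o = next;
    }
    registry = NULL;
}
```

Nothing dangles during the walk because no object's cleanup ever looks at another object.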
If the goal is to just tell the system "We don't need the objects anymore" then another option is to simply walk the roots and decrement their ref counts. If there are no other references to them, they'll hit zero. They will then decrement the ref counts of everything they refer to before being deallocated. That in turn percolates through the object graph. If the roots are the only thing holding onto references at the point that you call this, it will effectively free everything.
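A sketch of this second option: a `decref` that, on reaching zero, releases everything the object refers to before freeing it, and a shutdown routine that only drops the roots' references. The names and layout (`Obj`, `obj_new`, `decref`, `append`, `release_roots`, `live`) are hypothetical. With the question's example, releasing the list root frees the list's children and then the list, and the integers are never touched after they are freed.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical object: a plain value, or a container of counted references. */
typedef struct Obj {
    int refcount;
    struct Obj **items;   /* counted references held by this object */
    size_t len, cap;
} Obj;

static int live = 0;      /* allocated and not yet freed (demo only) */

Obj *obj_new(void) {
    Obj *o = calloc(1, sizeof *o);
    if (o == NULL) return NULL;
    o->refcount = 1;
    live++;
    return o;
}

/* Drop one reference; at zero, release children first, then free. */
void decref(Obj *o) {
    if (o == NULL) return;
    assert(o->refcount > 0);
    if (--o->refcount > 0) return;
    for (size_t i = 0; i < o->len; i++)
        decref(o->items[i]);
    free(o->items);
    free(o);
    live--;
}

/* Store `item` in `container`, which takes its own reference. */
int append(Obj *container, Obj *item) {
    if (container->len == container->cap) {
        size_t ncap = container->cap ? container->cap * 2 : 4;
        Obj **p = realloc(container->items, ncap * sizeof *p);
        if (p == NULL) return -1;
        container->items = p;
        container->cap = ncap;
    }
    item->refcount++;
    container->items[container->len++] = item;
    return 0;
}

/* Shutdown: drop only the references held by the roots. Anything
   reachable solely through them reaches zero and is freed by the cascade. */
void release_roots(Obj **roots, size_t n) {
    for (size_t i = 0; i < n; i++) {
        decref(roots[i]);
        roots[i] = NULL;
    }
}
```

Objects caught in a reference cycle would still survive this, as with reference counting in general.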

You should not free anything until the reference count for somelist is zero.

What is reference counter and how does it work?

I've been writing code, and I'm at the point where another program should be calling my library. I need to make a reference counter for the output of my library. The basic idea, as I have understood it, is that I need a reference counter inside the struct that I want to pass around. So my questions are the following:
What should I keep in mind when making a reference counter?
What are the absolute don'ts when making a reference counter?
Are there any really detailed examples to start with?
Thank you for your answers in advance!

Reference counting allows clients of your library to keep references to objects that your library created on the heap, and allows you to keep track of how many references are still active. When the reference count goes to zero, you can safely free the memory used by the object. It is a way to implement basic "garbage collection".
In C++, you can do this more easily, by using "smart pointers" that manage the reference count through the constructor and destructor, but it sounds like you are looking to do it in C.
You need to be very clear on the protocol that you expect users of your libraries to follow when accessing your objects so that they properly communicate when a new reference is created or when a reference is no longer needed. Getting this wrong will either prematurely free memory that is still being referenced or cause memory to never be freed (memory leak).
Basically, you include a reference count in your struct that gets incremented each time your library returns the struct.
You also need to provide a function that releases the reference:
#include <stdlib.h>
typedef struct Object {
int ref;
/* .... */
} Object;
Object* getObject (...) {
Object *p = ....; // find or malloc the object
p->ref++;
return p;
}
void releaseReference (Object* p) {
p->ref--;
if (p->ref == 0) free(p);
}
void grabReference (Object* p) {
p->ref++;
}
Use grabReference() when a client of your library passes a reference to another client. (In the example above, the initial caller of getObject() doesn't need to call grabReference(), because getObject() has already incremented the count on its behalf.)
If your code is multi-threaded, you need to make sure the increments and decrements are done atomically (or under a lock); otherwise two threads can race on the count and free the object too early, or never free it.
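For the multi-threaded case, one option is C11 `<stdatomic.h>`. The sketch below is a thread-safe variant of the answer's functions, assuming a C11 compiler; `object_new` and the `payload` field are hypothetical, and this `releaseReference` returns whether it freed the object, which the original does not. The release/acquire ordering follows the usual pattern for atomic reference counts: the thread that drops the last reference must observe every other thread's writes before it frees the object.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical thread-safe variant of the Object struct above. */
typedef struct Object {
    atomic_int ref;
    int payload;
} Object;

Object *object_new(int payload) {
    Object *p = malloc(sizeof *p);
    if (p == NULL) return NULL;
    atomic_init(&p->ref, 1);   /* the creator's reference */
    p->payload = payload;
    return p;
}

void grabReference(Object *p) {
    assert(p != NULL);
    /* Relaxed is enough: the caller already holds a reference,
       so the object cannot be freed while we increment. */
    atomic_fetch_add_explicit(&p->ref, 1, memory_order_relaxed);
}

/* Returns 1 if this call freed the object, 0 otherwise. */
int releaseReference(Object *p) {
    /* fetch_sub returns the OLD value, so exactly one thread sees 1. */
    if (atomic_fetch_sub_explicit(&p->ref, 1, memory_order_release) == 1) {
        /* Pair with the other threads' release decrements before freeing. */
        atomic_thread_fence(memory_order_acquire);
        free(p);
        return 1;
    }
    return 0;
}
```

Note that the decision to free is based on the value returned by the atomic decrement itself, never on a separate read of `ref` afterwards, since another thread could change it in between.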

Using shared_ptr for refcounting

I have a class whose objects are used extensively through shared_ptrs. However, I want to track the usage of these objects, and when the refcount reaches a particular value I want to delete the object. How can I do this? I was thinking of overriding shared_ptr's destructor so that I can decrement the refcount whenever a shared_ptr reference goes away. However, it looks like that is not possible. What are the alternatives?

You really wouldn't want to do that because if the refcount is greater than zero it means there are still pointers pointing to the object out there, probably intending to access it.
If you really wanted to do something like that, you'd have to make your own shared_ptr class, but I'd also add functionality for checking if the pointer is still valid since it might disappear on people.

What exactly needs to be PROTECTed when writing C functions for use in R

I thought this was pretty straightforward: basically, any SEXP objects I create in C code must be protected. But it starts getting a little murkier (to me) when using linked lists and CAR / CDR, etc. I started off with this comment in Writing R Extensions:
Protecting an R object automatically protects all the R objects pointed to in the corresponding SEXPREC, for example all elements of a protected list are automatically protected.
And this from R Internals:
A SEXPREC is a C structure containing the 32-bit header as described above, three pointers (to the attributes, previous and next node) and the node data ...
LISTSXP: Pointers to the CAR, CDR (usually a LISTSXP or NULL) and TAG (a SYMSXP or NULL).
So I interpret this to mean that, if I do something like:
SEXP s, t, u;
PROTECT(s = allocList(2));
SETCAR(s, ScalarLogical(1));
SETCADR(s, ScalarLogical(0));
t = CAR(s);
u = CADR(s);
Then t and u are protected by virtue of being pointers to objects that are within the protected list s (corollary question: is there a way to get the PROTECTED status of an object? Couldn't see anything in Rinternals.h that fit the bill). Yet I see stuff like (from src/main/unique.c):
// Starting on line 1274 (R 3.0.2), note `args` protected by virtue of being
// a function argument
SEXP attribute_hidden do_matchcall(SEXP call, SEXP op, SEXP args, SEXP env)
{
// omitting a bunch of lines, and then, on line 1347:
PROTECT(b = CAR(args));
// ...
}
This suggests that the objects within args are not protected, but that seems very odd, since then any of the args objects could have been garbage-collected at any point. Since CAR just returns a pointer to a presumably already protected object, why do we need to protect it here?

Think about it this way: PROTECT doesn't actually do anything to the object. Rather, it adds a temporary GC root so that the object is considered alive by the collector. Any objects it contains are also alive, not because of some protection applied from C, but because they are pointed to by another object that is itself already considered alive, the same as any other normal live object. So setting the car of a protected list not only keeps the new object alive, it also potentially releases whatever was previously in the car for GC, removing it from that particular live tree (protecting the list didn't recursively affect the elements).
So in general you aren't going to have an easy way of telling whether an object is "protected" or not in this wider sense, because it's actually just following the same rules as GC does elsewhere and there's nothing special about the object. You could potentially trace through the entire PROTECT list and see if you find it, but that would be... inefficient, to say the least (there's also nothing to say that the ownership tree leading to the object in question from the one on the PROTECT list is the one that will keep it alive for the longest).
The line in do_matchcall is actually there for a completely unrelated reason: protecting CAR(args) only happens in one branch of a conditional - in the other branch, it's a newly-created object that gets protected. Redundantly protecting the value from this branch as well means that there's guaranteed to be the same number of objects on the PROTECT stack regardless of which branch was taken, which simplifies the corresponding UNPROTECT at the end of the function to an operation on a constant number of slots (no need to replicate the check down there to vary it).