How does one implement weak references with Boehm GC? - c

I have a personal project which I implement using the Boehm GC. I need to implement a sort of event type, which should hold references to other events. But I also need to ensure that the events pointed to are still collectable, thus I need weak references for this.
Let's say we have events A, B and C. I configure these events to signal event X whenever any of them is signaled. This means A, B and C must hold a reference to event X. What I want is that if event X is unreachable the events A, B and C don't need to signal it anymore. Thus a weak reference is what I thought of.
Is there any other way to do this? I don't want to change the GC but if necessary (the allocation interface remains clean) I could.
The project is written in C. If need be, I will provide more info. Notably, if there is any way to implement such events directly with this semantics, there's no need for actual weak references (events MAY have a reference cycle though while they are not signaled).

The Boehm GC does not have a concept of weak references per se. However, it does not scan memory allocated by the system malloc for references to managed objects, so pointers stored in such memory do not prevent the pointed-to object from being collected. Of course, that approach means that the objects containing the pointers will not be managed by the collector.
Alternatively, it should be possible to abuse GC_MALLOC_ATOMIC() or GC_malloc_explicitly_typed() to obtain a managed object that can contain pointers to other managed objects without preventing those other objects from being collected. That involves basically lying to GC about whether some members are pointers, so as to prevent them from being scanned.
Either way, you also require some mechanism for receiving notice when weakly-referenced objects are collected, so as to avoid attempting to access them afterward. GC has an interface for registering finalizer callbacks to be invoked before an object is collected, and that looks like your best available option for the purpose.
Overall, I think what you're asking for is doable, but with a lot of DIY involved. At a high level,
use GC_MALLOC_ATOMIC() to allocate a wrapper object around a pointer to the weakly referenced object. Allocating it this way allows the wrapper to itself be managed by GC, without the pointer within being scanned during GC's reachability analyses.
use GC_register_finalizer to register a finalizer function that sets the wrapper's pointer to NULL when GC decides that the pointed-to object is inaccessible.
users of the wrapper are obligated to check whether the pointer within is NULL before attempting to dereference it.

Related

Shared pointer in rust arrays

I have two arrays:
struct Data {
all_objects: Vec<Rc<dyn Drawable>>;
selected_objects: Vec<Rc<dyn Drawable>>;
}
selected_objects is guarenteed to be a subset of all_objects. I want to be able to somehow be able to add or remove mutable references to selected objects.
I can add the objects easily enough to selected_objects:
Rc::get_mut(selected_object).unwrap().select(true);
self.selected_objects.push(selected_object.clone());
However, if I later try:
for obj in self.selected_objects.iter_mut() {
Rc::get_mut(obj).unwrap().select(false);
}
This gives a runtime error, which matches the documentation for get_mut: "Returns None otherwise, because it is not safe to mutate a shared value."
However, I really want to be able to access and call arbitrary methods on both arrays, so I can efficiently perform operations on the selection, while also being able to still perform operations for all objects.
It seems Rc does not support this, it seems RefMut is missing a Clone() that alows me to put it into multiple arrays, plus not actually supporting dyn types. Box is also missing a Clone(). So my question is, how do you store writable pointers in multiple arrays? Is there another type of smart pointer for this purpose? Do I need to nest them? Is there some other data structure more suitable? Is there a way to give up the writable reference?
Ok, it took me a bit of trial and error, but I have a ugly solution:
struct Data {
all_objects: Vec<Rc<RefCell<dyn Drawable>>>;
selected_objects: Vec<Rc<RefCell<dyn Drawable>>>;
}
The Rc allows you to store multiple references to an object. RefCell makes these references mutable. Now the only thing I have to do is call .borrow() every time I use a object.
While this seems to work and be reasonably versitle, I'm still open for cleaner solutions.

Using shared_ptr for refcounting

I have a class whose objects are extensively used using shared_pointers. However, I want to track the usage of these objects and when the refcount goes to a particular value I want to delete the object. How can we do this ? I was thinking of overriding the shared_ptr's destructor so that I can decrement the refcount when every shared_ptr reference goes away. However, looks like that is not possible. What are the alternatives ?
You really wouldn't want to do that because if the refcount is greater than zero it means there are still pointers pointing to the object out there, probably intending to access it.
If you really wanted to do something like that, you'd have to make your own shared_ptr class, but I'd also add functionality for checking if the pointer is still valid since it might disappear on people.

NSRecursivelock v NSNotification performance

So I was using an array to hold objects, then iterating over them to call a particular method. As the objects in this array could be enumerated/mutated on different threads I was using an NSRecursiveLock.
I then realised that as this method is always called on every object in this array, I could just use an NSNotification to trigger it, and do away with the array and lock.
I am just double checking that this is a good idea? NSNotification will be faster than a lock, right?
Thanks
First, it seems that the locking should be done on the objects, not on the array. You are not mutating the array, but the objects. So you need one mutex per object. It will give you a finer granularity and allow concurrent updates to different objects to proceed in parallel (it wouldn't with a global lock).
Second, a recursive lock is complete overkill. If you wish to have mutual exclusion on the objects, each object should have a standard mutex. If what you are doing inside the critical section is cpu bound and really short, you might consider using a spinlock (it won't even trap in the OS. Only use for short CPU bound critical sections though). Recursive mutex are meant to be used when a thread k can (because of its logic) acquire another lock on an object it has already locked himself (hence the recursive name).
Third, NSNotifications (https://developer.apple.com/library/mac/documentation/Cocoa/Reference/Foundation/Classes/NSNotification_Class/) will allocate memory, drop the notification in a notification Center, do locking to actually implement that (the adding to the center) and finally dispatch the notifications in the center and deallocate the notification. So it is heavier than a plain simple locking. It "hides" the synchronization APIs inside the center, but it does not eliminate them, and you pay the price of memory allocations/deallocations.
If you wish to modify the array (add/remove), then you should also synchronize on these operations, but that would be unfortunate as access to independent entries of the array would now collide.
Hope that helps.

What exactly needs to be PROTECTed when writing C functions for use in R

I thought this was pretty straightforward, basically, any SEXP type objects I create in C code must be protected, but it starts getting a little murkier (to me) when using linked lists and CAR / CDR, etc. I started off with this comment in Writing R Extensions:
Protecting an R object automatically protects all the R objects pointed to in the corresponding SEXPREC, for example all elements of a protected list are automatically protected.
And this from R Internals:
A SEXPREC is a C structure containing the 32-bit header as described above, three pointers (to the attributes, previous and next node) and the node data ...
LISTSXP: Pointers to the CAR, CDR (usually a LISTSXP or NULL) and TAG (a SYMSXP or NULL).
So I interpret this to mean that, if I do something like:
SEXP s, t, u;
PROTECT(s = allocList(2));
SETCAR(s, ScalarLogical(1));
SETCADR(s, ScalarLogical(0));
t = CAR(s);
u = CADR(s);
Then t and u are protected by virtue of being pointers to objects that are within the protected list s (corollary question: is there a way to get the PROTECTED status of an object? Couldn't see anything in Rinternals.h that fit the bill). Yet I see stuff like (from src/main/unique.c):
// Starting on line 1274 (R 3.0.2), note `args` protected by virtue of being
// a function argument
SEXP attribute_hidden do_matchcall(SEXP call, SEXP op, SEXP args, SEXP env)
{
// ommitting a bunch of lines, and then, on line 1347:
PROTECT(b = CAR(args));
// ...
}
This suggests all the objects within args are not protected, but that seems very odd since then any of the args objects could have gotten GCed at any point. Since CAR just returns a pointer to a presumably already protected object, why do we need to protect it here?
Think about it this way: PROTECT doesn't actually do something to the object. Rather, it adds a temporary GC root so that the object is considered alive by the collector. Any objects it contains are also alive, not because of some protection applied from C, but because they are pointed-to by another object that is itself already considered alive - the same as any other normal live object. So setting the car of a protected list not only keeps that object alive, it also potentially releases whatever was previously in the car for GC, removing it from that particular live tree (protecting the list didn't recursively affect the elements).
So in general you aren't going to have an easy way of telling whether an object is "protected" or not in this wider sense, because it's actually just following the same rules as GC does elsewhere and there's nothing special about the object. You could potentially trace through the entire PROTECT list and see if you find it, but that would be... inefficient, to say the least (there's also nothing to say that the ownership tree leading to the object in question from the one on the PROTECT list is the one that will keep it alive for the longest).
The line in do_matchcall is actually there for a completely unrelated reason: protecting CAR(args) only happens in one branch of a conditional - in the other branch, it's a newly-created object that gets protected. Redundantly protecting the value from this branch as well means that there's guaranteed to be the same number of objects on the PROTECT stack regardless of which branch was taken, which simplifies the corresponding UNPROTECT at the end of the function to an operation on a constant number of slots (no need to replicate the check down there to vary it).

How does reference counting work?

How do reference counted structures work? For example let's look at SDL_Surface:
typedef struct SDL_Surface
{
...
int refcount;
} SDL_Surface;
s = SDL_CreateRGBSurface(...); // <-- what happens here?
SDL_FreeSurface(s); // <-- and here?
How do I implement reference counting in my own code?
SDL_CreateRGBSurface will allocate a new instance of SDL_Surface (or a suitable derived structure), and increment the reference count (setting it to 1).
SDL_FreeSurface will decrement the reference count, and check if it's zero. If it is, that means that no other objects are using the surface, and it will be deallocated.
SDL also guarantees that the refcount is incremented whenever the object gets used somewhere else (e.g. in the renderer). So, if the reference count is nonzero when SDL_FreeSurface is called, then some other object must be using it. That other object will eventually also call SDL_FreeSurface and release the surface for good.
Reference counting allows you to cheaply track objects without the overhead of a cycle-collecting garbage collector. However, one drawback is that it won't handle cycles (e.g. where object A holds a reference to B, which in turn holds a reference to B); in those cases, the cycles will keep the objects involved alive even when all other external references are gone.
To implement refcounting, you simply need to add a refcount field to any objects you want to refcount, and ensure (in your public API, and internally) that every allocation and deallocation of the object goes through the appropriate refcount-maintaining interface (which you must define). Finally, when an object or function wants a reference to your refcounted objects, they must first get the reference by incrementing the refcount (directly or through some interface). When they are done they must decrement the refcount.

Resources