Shallow and deep destructors? - C

Imagine a list "a", and a copy constructor for lists that performs deep copying. If "b" is a list deep copied from "a", then both can be destroyed using the same simple destructor. This destructor should perform deep destruction.
typedef struct list { void * first; struct list * next; } list;
struct list * list_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
void list_destructor(struct list * input);
Now imagine that you rename the copy constructor for lists to a deep copy constructor, and add a shallow copy constructor for lists.
/** Performs shallow copy. */
struct list * list_shallow_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Performs deep copy. */
struct list * list_deep_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Be warned: performs deep destruction. */
void list_destructor(struct list * input);
The destructor, which performs deep destruction, can be paired with calls to the deep copy constructor.
Once you use the shallow copy constructor for lists, you need to know which of the two lists owns the elements. The owning list can be destroyed with the destructor, but the list that doesn't own the elements must be destroyed first, using a shallow destructor that I would need to create.
/** Performs shallow copy. */
struct list * list_shallow_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Performs deep copy. */
struct list * list_deep_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Performs shallow destruction. */
void list_shallow_destructor(struct list * input);
/** Performs deep destruction. */
void list_deep_destructor(struct list * input);
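For concreteness, here is a sketch of what the bodies of those two destructors might look like, assuming the struct list from above and elements allocated with malloc (the question does not show the implementations):
#include <stdlib.h>

typedef struct list { void * first; struct list * next; } list;

/* Shallow destruction: free only the nodes; the elements survive
 * because the other (owning) list still points to them. */
void list_shallow_destructor(struct list * input)
{
    while (input) {
        struct list * next = input->next;
        free(input);
        input = next;
    }
}

/* Deep destruction: free each element's payload as well as the node. */
void list_deep_destructor(struct list * input)
{
    while (input) {
        struct list * next = input->next;
        free(input->first);
        free(input);
        input = next;
    }
}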
But the problem is, I don't recognize "shallow destructor" as a term in the literature, so I thought I might be doing something wrong. Am I doing something wrong? E.g. should I already be using smart pointers instead of deep and shallow destructors?

The concept of deep or shallow exists only in the mind of the programmer, and in C++ it is quite arbitrary. By default, raw pointer members are not deeply destroyed when an object is destroyed, which you might call shallow, but you can write extra code in your object destructor to destroy them deeply. On the other hand, any members that have destructors get their destructors called, which you might call deep, and there is no way to avoid that. Exactly the same mechanism applies to the default copy and assignment, so it is equally impossible to say at a glance whether an object is wholly deeply or shallowly copied or destroyed.
So the distinction is not really a property of object destructors, but of their members.
Of course, now the question is about C again, but it still mentions smart pointers. In C you have to decide what philosophy you want to implement, as there is no built-in concept of destruction. One option is to follow a C++-like philosophy of having a destruction function for each type of member, and having each destruction function call those of its own members in turn, so destruction propagates deeply.
Having said that, there are a number of strategies you might consider that could produce a leaner model:
If /all/ the members of a particular container are owned, or /all/ not owned, then a simple flag in the container for whether to destroy /all/ children is an adequate model (a sketch follows this list).
If /all/ the objects are shared with another container, or this might be the last/only such container for a particular set of content, you could keep a circular list of the sharing containers. When the destructor realises it is the last container, it could destroy /all/ the content. Alternatively, you could implement this model with a shared_ptr to the one container instance, so that when the last pointer is released the container is destroyed.
If individual items in the container may be shared in arbitrary ways, then make it a container of shared_ptr to each item of content. This is the most robust model, but may have costs in terms of memory usage. Ultimately, somewhere there needs to be a reference count (circular lists of referees also work, but they are much harder to mutex across threads). In C++'s shared_ptr this is implemented using a stub, but in your own C objects it is probably a counter member in the child object.
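A minimal sketch of the first strategy, with illustrative names (none of this is from the question):
#include <stdbool.h>
#include <stdlib.h>

typedef struct node {
    void *item;
    struct node *next;
} node;

typedef struct {
    node *head;
    bool owns_items;    /* one flag decides the fate of all children */
} container;

void container_destroy(container *c)
{
    node *n = c->head;
    while (n) {
        node *next = n->next;
        if (c->owns_items)
            free(n->item);  /* deep: the container owns every child */
        free(n);            /* the node itself always belongs to the container */
        n = next;
    }
    free(c);
}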

If you want lists with shared elements to be correctly destructed, you cannot simply use "shallow destructors". Destructing all the lists with shallow destructors leaves the elements in memory, leaking them. It also doesn't look good to give one of the lists a deep destructor while the others get shallow ones: if you destroy the list with the deep destructor first, the others are left with dangling pointers that you can accidentally access. So it looks like "shallow destructor" is not a well-recognized term simply because it is not of great use. You just have destructors: functions that destroy the stuff your objects conceptually own.
Now, for the particular task of sharing elements between lists and destroying everything at the right time, shared pointers seem a reasonable solution. A shared pointer is conceptually a pointer to a struct consisting of two elements: the actual object (a list element) and a reference counter. Copying the shared pointer increments the counter; destroying the shared pointer decrements the counter and destructs the struct holding the object and counter if the counter falls to 0. In this scheme, each list owns its copies of the shared pointers, but the list elements themselves are owned by the shared pointers rather than by any list. Thanks to the shared pointer destruction semantics, there is no trouble with destroying all the shared pointer copies a list owns (the destructor doesn't deallocate memory while references remain), so there is no need to distinguish "shallow" and "deep" destructors: the shared pointers take care of deleting themselves at the right time automatically.

As you already suspected, your design is weird. Think about it: if you are going to "shallow copy" a list, why not just take a pointer to it? "Shallow copy" has no use outside of a classroom, where it is only useful to explain what a "deep copy" is.
You either want multiple users to have independent lists that can be used and destroyed independently of each other, or you want one user to "own" the list while the others just point to it. Your idea of a "shallow copy" has no advantage over a simple pointer, but is much more complex to handle.
What is actually useful is having multiple "lists" with shared data, where each user has its own "shared copy" of the list that can be used and destroyed independently, but that points to the same data, which is only really deallocated when the last user has destroyed it. This is a very common pattern, usually handled by a technique called reference counting, and it is implemented by many libraries and languages, like Python, glib, and even C++ with the smart pointer std::shared_ptr.
If you are using C, you may want to add reference-counting support to your struct list, and it is not very difficult: just add a field unsigned reference_count; and set it to 1 on allocation. Decrement it on destruction, but only really deallocate if reference_count == 0, in which case there are no more users and you must do a "deep deallocation". You would still have only one destructor function, but two copy constructors:
/** Performs shared copy.
*
* Actually, just increments reference_count and returns the same pointer.
*/
struct list * list_shared_copy_constructor(struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Performs deep copy.
*
* reference_count for the new copy is set to 1.
*/
struct list * list_deep_copy_constructor(const struct list * input)
REQUIRE_RETURNED_VALUE_CAPTURE;
/** Performs destruction; deallocates only when the last reference is gone. */
void list_destructor(struct list * input);
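A minimal sketch of how those could be implemented under this scheme, assuming struct list has gained the unsigned reference_count field described above (only the head node's counter is used; the bodies are illustrative):
#include <stdlib.h>

typedef struct list {
    void *first;
    struct list *next;
    unsigned reference_count;   /* meaningful in the head node only */
} list;

struct list * list_shared_copy_constructor(struct list * input)
{
    input->reference_count++;   /* one more user of the same list */
    return input;
}

void list_destructor(struct list * input)
{
    if (--input->reference_count > 0)
        return;                 /* other users remain: deallocate nothing */
    while (input) {             /* last user: deep deallocation */
        struct list *next = input->next;
        free(input->first);
        free(input);
        input = next;
    }
}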
If you are actually using C++, as you hinted in your question, then simply use std::shared_ptr.

But the problem is, I don't recognize "shallow destructor" as a term in the literature, so I thought I might be doing something wrong. Am I doing something wrong?
The point of the destructor (or in C just a function like myobject_destroy(myobject)) is to clean up the resources the instance holds (memory, OS handles, ...). Whether you need a "shallow destructor" or a "deep destructor" depends on how you decided to implement your object, as long as it does its job of cleaning things up.
If you are using modern C++, stack allocation and smart pointers are your friends, because they manage memory by themselves.

Related

Changing a pointer as a result of destroying an "object" in C

As part of a course I am attending at the moment, we are working in C with self-developed low-level libraries, and we are now working on our final project, which is a game.
At a certain point, it seemed relevant to have a struct (serving as a sort of object) that held some important information about the current game status, namely a pointer to a player "object" (can't really call the simulated objects we are using actual objects, can we?).
It would go something like this:
typedef struct {
    // Holds relevant information about the game's current state
    state_st currstate;
    // Buffer of events to process ('array of events')
    // Needs to be pointers because of deallocating memory
    event_st ** event_buffer;
    // Indicates the size of the event buffer array above
    unsigned int n_events_to_process;
    // ... Other members ...
    // Pointer to a player (pointer to allow allocation and deallocation)
    Player * player;
    // Flag that indicates if a player has been created
    bool player_created;
} Game_Info;
The problem is the following:
If we are to stick to the design philosophy that is used in most of this course, we are to "abstract" these "objects" using functions like Game_Info * create_game_info() and destroy_game_info(Game_Info * gi_ptr) to act as constructors and destructors for these "objects" (also, "member functions" would be something like update_game_state(Game_Info * gi_ptr), acting like C++ by passing the normally implicit this as the first argument).
Therefore, as a way of detecting whether the player object inside a Game_Info "instance" has already been deleted, I am comparing the player pointer to NULL, since in all of the "destructors" I set the passed pointer to NULL after deallocating the memory, to show that the object was successfully deallocated.
This obviously causes a problem (which I did not detect at first; hence the player_created bool flag that fixed it while I was still getting a grasp on what was happening): because the pointer is passed by copy and not by reference, it is not set to NULL after the call to the "object" "destructor", and thus comparing it to NULL is not a reliable way to know whether the object was deallocated.
I am writing this, then, to ask for input on what would be the best way to overcome this problem:
1. A flag to indicate whether an "object" is "instanced" or not, using the flag instead of ptr == NULL in comparisons - the solution I am currently using.
2. Passing a pointer to the pointer (calling the functions with &player instead of just player) - this would enable setting it to NULL.
3. Setting the pointer to NULL one "level" above, after calling the "destructor".
4. Any other solution, since I am not very experienced in C and am probably overlooking an easier way to solve this problem.
Thank you for reading and for any advice you might be able to provide!
I am writing this, then, to ask for input on what would be the best way to overcome this problem: …
What would be the best way is primarily opinion-based, but of the ways you listed the worst is the first, where one has to keep two variables (pointer and flag) synchronized.
Any other solution…
Another solution would be using a macro, e.g.:
#define destroy_player(p) do { /* whatever cleanup needed */; free(p), p = NULL; } while (0)
…
destroy_player(gi_ptr->player);
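And for completeness, a sketch of the second option from the question (passing &player so the callee can reset the caller's pointer; the function name and cleanup body are illustrative):
void destroy_player_pp(Player **pp)
{
    if (pp && *pp) {
        /* whatever cleanup the Player needs */
        free(*pp);
        *pp = NULL;     /* the caller's own pointer is reset */
    }
}
…
destroy_player_pp(&gi_ptr->player);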

What are the internal differences of a T[] and a List<T> in terms of memory?

I was reading an article about array vs list, and the author says that an array is worse than a list, because (among other things) an array is a list of variables, but to me a list is also a list of variables. I mean, I can still do list[3] = new Item().
Actually, I have always somehow seen a List<T> as a wrapper for an array that allows me to use it easily without caring about handling its structure.
What are the internal differences between a T[] and a List<T> in terms of heap/stack memory usage?
Since an array is a static structure, it allocates, upon initialization, exactly the memory you've demanded.
int arr[5];
For example, here 5 int objects are created in memory. But when you use lists, depending on the implementation, you are first given an array with a predefined capacity. As you add elements and exceed that capacity, the list scales up: some implementations simply double the size, while others grow by a different factor or reallocate at a different threshold.
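A minimal sketch in C of the growth strategy described above (a dynamic array that doubles when full; all names are illustrative):
#include <stdlib.h>

typedef struct {
    int *data;
    size_t count;       /* elements in use */
    size_t capacity;    /* elements allocated */
} int_list;

int int_list_add(int_list *l, int value)
{
    if (l->count == l->capacity) {          /* full: grow before adding */
        size_t new_cap = l->capacity ? l->capacity * 2 : 4;
        int *p = realloc(l->data, new_cap * sizeof *p);
        if (!p)
            return -1;                      /* allocation failed; list unchanged */
        l->data = p;
        l->capacity = new_cap;
    }
    l->data[l->count++] = value;
    return 0;
}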
The author's point about a "list of variables" wasn't about memory. It's that an array contains your internal variables, and returning it allows them to be reassigned by the caller. It comes down to this:
Only pass out an array if it is wrapped up by a read-only object.
If you pass out an internal List<T>, you have the same problem, but here's the key:
We have an extensibility model for lists because lists are classes. We have no ability to make an “immutable array”. Arrays are what they are and they’re never going to change.
And at the time the article was written, the IReadOnlyList<T> interface didn't exist yet (it arrived in .NET 4.5), though he probably would have mentioned it if it had. I believe he was advocating implementing an IList<T> that would simply throw an exception if you tried to use the setter. Of course, if the user doesn't need to access elements by index, you don't need a list interface at all -- you can just wrap the collection in a ReadOnlyCollection<T> and return it as an IEnumerable<T>.

Linked List Implementation Options in C

In implementing a singly linked list in C, I think there are three ways (the header is a pointer itself; it points to the first node of the linked list):
1. Declare the header globally and use a function void insert(int) to insert. This should work as the header is global.
2. Declare the header inside main and use a function node *insert(node *) to insert. This should work because of the return involved.
3. Declare the header inside main and use a function void insert(node **) to insert.
Sometimes the second way works even without the return involved. Why?
Which is the better way?
If the functions involved are recursive as in tree which method is appropriate?
You should encapsulate your data structure in a single object (the head node or a struct that contains it), and then you can have your functions work on that object. This means that you can have more than one linked list in your program (that won't work with a global head node) and you can also pass it around to different functions that want to use it (there's no point having a data structure without being able to use it).
If you have your single object (head node) stored in your program then the insert and delete functions don't need to return anything, as you already have a pointer to the object that represents the linked list.
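A minimal sketch of that single-object approach (illustrative names, not from the question):
#include <stdlib.h>

typedef struct node {
    int value;
    struct node *next;
} node;

typedef struct {
    node *head;     /* the whole list is represented by this one object */
} list;

/* Insert at the head. The caller's list object is updated in place,
 * so nothing needs to be returned. */
int list_insert(list *l, int value)
{
    node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->value = value;
    n->next = l->head;
    l->head = n;
    return 0;
}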
If the functions involved are recursive as in tree which method is appropriate?
The functions should not be recursive "as in tree". The depth of a (balanced) tree is O(log n), which means recursion is reasonable in many situations; the length of a linked list is O(n), which means recursion can easily overflow the stack.

How to avoid multiple deallocation

A Scene struct has a pointer to (a linked list of) SceneObjects.
Each SceneObject refers to a Mesh.
Some SceneObjects may however refer to the same Mesh (by sharing the same pointer - or handle, see later - to the Mesh). Meshes are pretty big and doing it this way has obvious advantages for rendering speed.
typedef struct SceneObject {
    Mesh *mesh;
    ...
    struct SceneObject *next;
} SceneObject;

typedef struct Scene {
    SceneObject *objects;
    ...
} Scene;
My question:
How do I free a Scene, while avoiding to free the same Mesh pointer multiple times?
I thought I could solve this by using a handle to the Mesh (Mesh** mesh_handle) instead of a pointer, so I could set the referenced Mesh pointer to 0 and let successive frees on it just free a null pointer, but I can't make it work. I just can't get my head around how to avoid multiple deallocations.
Am I forced to keep references for such a scenario? Or am I forced to put all the Mesh objects into a separate Mesh table and free it separately? Is there a way to tackle this without doing these things? By tagging the objects as instances of each other I can naturally adjust the free algorithm so it deals with the problem, but I was wondering if there is a more 'pure' solution for this problem.
One standard solution is reference counting: every object that can possibly be referred to by many other objects gets a counter that remembers how many of them are pointing at it. This is done with something like
typedef struct T_Object
{
    int refcount;
    ....
} Object;

Object *newObject(....)
{
    Object *obj = my_malloc(sizeof(Object));
    obj->refcount = 1;
    ....
    return obj;
}

Object *ref(Object *p)
{
    if (p) p->refcount++;
    return p;
}

void deref(Object *p)
{
    if (p && p->refcount-- == 1)
        destroyObject(p);
}
Whoever first allocates the object will be the first owner (hence the counter is initialized to 1). When you need to store the pointer somewhere else, you should store ref(p) instead, to be sure the counter is incremented. When someone is no longer going to point to the object, they should call deref(p). Once the last reference is gone, the counter becomes zero and the deref call actually destroys the object.
It takes some discipline to get it right (you should always think before calling ref and deref), but it's possible to write complex software with zero leaks using this approach.
A simpler solution that is sometimes applicable is to have all your shared objects also stored in a separate list: you freely assign and change the complex data structures pointing to these objects, but you never free the objects during normal use. Only when you need to throw everything away do you deallocate them, using that separate list.
Note that this approach is only possible if you're not allocating many objects during "normal use", because in that case delaying the destruction might not be viable.
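A sketch of that separate-list strategy applied to the Mesh scenario (assuming the Mesh type from the question; the table type and names are illustrative):
#include <stdlib.h>

typedef struct Mesh Mesh;   /* the question's Mesh type, details elided */

typedef struct {
    Mesh **meshes;          /* every Mesh ever created for the scene */
    size_t n_meshes;
} MeshTable;

/* SceneObjects point into the table freely and own nothing; tearing
 * down the scene frees each Mesh exactly once, via the table. */
void mesh_table_destroy(MeshTable *t)
{
    for (size_t i = 0; i < t->n_meshes; i++)
        free(t->meshes[i]);
    free(t->meshes);
}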

Indirection from func() to __func()

Quoting a code snippet:
/**
 * list_add - add a new entry
 * @new: new entry to be added
 * @head: list head to add it after
 *
 * Insert a new entry after the specified head.
 * This is good for implementing stacks.
 */
static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);
}
I have seen similar code in several different programs, especially those manipulating data structures. What is the usual intention in adding this extra level of indirection - why can't the code inside __list_add be put inside list_add ?
Copying code makes maintenance harder. In this example, the extra level of indirection hides the parameter next, providing a function with just two parameters rather than three.
If the code inside __list_add() were copied, it would need to be copied to multiple places. If the list mechanism is then changed somewhat, all of those places need to be updated too, or bugs will start to pop up (e.g. a FIFO and a LIFO implementation of a list show different behavior).
There is always a tradeoff; another level of indirection also adds complexity and possibly overhead, as opposed to duplicating lines of code or having lots of parameters in the API.
It's about code reuse, and avoiding duplication.
__list_add() contains code that is useful in more situations than just this one, and can be shared between several different functions.
Sharing code like this has several advantages:
If there's a bug in __list_add() and you fix it, all the functions that use it get the fix.
If __list_add() gets an enhancement (e.g. you make it faster), all the functions that use it get faster.
There's only one place to look when you want to see how items are added to lists.
It can.
However, there are probably other public entry points that can share the code in __list_add(); e.g., there may be a push() or an insert_head() or something like that.
NOTE: if this is C++, you might want to rethink calling a variable new, as it is a reserved word.
__list_add will be intended as a "private" internal function which might have multiple uses internally. The public facing list_add is there as a convenient wrapper around it.
This wrapper is inline. If you put the body of __list_add there, that too would be inlined; the apparent goal is to inline just the passing of the extra head->next argument and nothing else.
That function comes from the Linux kernel.
It belongs to the generic list implementation:
Take a look: __list_add()
__list_add() is used in many places: e.g. list_add(), which adds an element at the list head, and list_add_tail(), which adds an element at the list tail. It can also be used to insert an element at a given position.
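For illustration, here is roughly how that sharing looks (a simplified sketch of the kernel pattern, not a verbatim copy):
struct list_head {
    struct list_head *next, *prev;
};

/* Internal worker: insert 'new' between two known, adjacent entries. */
static inline void __list_add(struct list_head *new,
                              struct list_head *prev,
                              struct list_head *next)
{
    next->prev = new;
    new->next = next;
    new->prev = prev;
    prev->next = new;
}

/* Public wrappers: each just picks the right pair of neighbours. */
static inline void list_add(struct list_head *new, struct list_head *head)
{
    __list_add(new, head, head->next);      /* after head: stack-like */
}

static inline void list_add_tail(struct list_head *new, struct list_head *head)
{
    __list_add(new, head->prev, head);      /* before head: queue-like */
}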
It is also common to define a wrapper function for recursive functions, so that the initial parameters are set correctly.
See, for example, the binary search example in the Wikipedia article Recursion (computer science).
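A small sketch of that wrapper pattern (hypothetical names): the public function supplies the initial bounds, and the worker recurses:
/* Worker: recursive binary search over data[lo..hi].
 * Returns the index of key, or -1 if it is absent. */
static int __bsearch_int(const int *data, int lo, int hi, int key)
{
    if (lo > hi)
        return -1;
    int mid = lo + (hi - lo) / 2;
    if (data[mid] == key)
        return mid;
    if (data[mid] < key)
        return __bsearch_int(data, mid + 1, hi, key);
    return __bsearch_int(data, lo, mid - 1, key);
}

/* Wrapper: callers never have to supply the bounds themselves. */
int bsearch_int(const int *data, int n, int key)
{
    return __bsearch_int(data, 0, n - 1, key);
}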
It could also be done to keep binary compatibility: the indirection allows you to keep the ABI invariant.
From http://tldp.org/HOWTO/Program-Library-HOWTO/shared-libraries.html
When a new version of a library is binary-incompatible with the old one, the soname needs to change. In C, there are four basic reasons that a library would cease to be binary compatible:
1. The behavior of a function changes so that it no longer meets its original specification,
2. Exported data items change (exception: adding optional items to the ends of structures is okay, as long as those structures are only allocated within the library),
3. An exported function is removed,
4. The interface of an exported function changes.
