Design of API's for data structures and algorithms


I recently implemented a binary search tree, linked lists, etc. as a learning exercise. I implemented several APIs such as insert, delete, etc.
For example the Insert node API looks like
void insertNode(Node** root, Node* node)
The application would allocate memory for the node to be inserted (i.e. node), assign the value/key, and pass it to this function.
1) My question is whether this is the right approach in general. Or should the application only pass the value/key to the insertNode function, and let that function allocate the memory?
i.e void insertNode(Node** root, int key)
{
malloc for node here
}
2) What is good design practice: should the application handle allocating and freeing memory, or should the library of APIs do it?
Thanks

General principles:
Whoever allocates memory should also have responsibility to delete it.
Make the client's job as easy as possible
Think about thread safety
So, don't have client allocate the memory being managed by the tree. However we then need to think carefully about what a find or search should return.
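You might look at example code to see what policies other people have adopted for collection APIs.
In passing, you don't need the double pointer for your root as long as the root node already exists. A double pointer (or returning the new root) is only needed when the insertion may have to create or replace the root itself, as with an empty tree.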
If someone has root pointing to an object { left -> ..., right -> ...} then if you pass root as Node* your code can write
root->left = newValue;
you're modifying what root points to not root itself. By passing the double pointer you're allowing the structure to be completely relocated because someone can write:
*root = completelyNewTree
which I doubt you intend.
Now for inserting I'd suggest passing in the new value, but have the collection allocate space and copy it. The collection now has complete control over the data it holds, and anything it returns should be copied. We don't want clients messing with the tree directly (thread safety).
This implies that for result of a find either the caller must pre-allocate a buffer or we must clearly document that the collection is returning an allocated copy that the caller must delete.
And yes, working in any Garbage Collecting language is much easier!
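The ownership policy described above can be sketched as follows. This is a minimal illustration rather than a definitive design: the names Tree, tree_create, tree_insert, tree_find and tree_destroy are hypothetical, the key is assumed to be a plain int, and duplicate keys are assumed to be rejected. In a real library the struct definitions would live in the .c file so clients only ever see the Tree handle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical node and tree types; clients never allocate nodes. */
typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

typedef struct Tree {
    Node *root;
} Tree;

/* The library allocates the handle... */
Tree *tree_create(void) {
    return calloc(1, sizeof(Tree));
}

/* ...and the nodes. Returns false on allocation failure or duplicate key. */
bool tree_insert(Tree *t, int key) {
    assert(t != NULL);
    Node **link = &t->root;   /* the double pointer stays internal */
    while (*link != NULL) {
        if (key == (*link)->key) return false;
        link = key < (*link)->key ? &(*link)->left : &(*link)->right;
    }
    Node *n = malloc(sizeof *n);
    if (n == NULL) return false;
    n->key = key;
    n->left = n->right = NULL;
    *link = n;
    return true;
}

/* Lookup copies the value into a caller-supplied buffer, so no internal
   pointer ever escapes the library. */
bool tree_find(const Tree *t, int key, int *out) {
    assert(t != NULL);
    for (const Node *n = t->root; n != NULL;
         n = key < n->key ? n->left : n->right) {
        if (n->key == key) {
            if (out != NULL) *out = n->key;
            return true;
        }
    }
    return false;
}

static void free_nodes(Node *n) {
    if (n == NULL) return;
    free_nodes(n->left);
    free_nodes(n->right);
    free(n);
}

/* Whoever allocated the memory (the library) also releases it. */
void tree_destroy(Tree *t) {
    if (t == NULL) return;
    free_nodes(t->root);
    free(t);
}
```

The client only calls tree_create, tree_insert, tree_find and tree_destroy, and never calls malloc or free on tree memory itself.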

Related

Deleting from a BST in C

I am very new to C programming and am using it for one of my classes. The project I am working on involves a BST and deleting nodes from it.
In terms of memory and allocation, if all the nodes in the tree are created using malloc() function, is it enough to call free on a particular node to delete it? Or do I have to set the pointer from the parent to NULL as well?

Calling free() on the node is not enough. free() releases the memory but does not change any pointer that referred to it, so the parent's child pointer keeps its old value and becomes a dangling pointer. C never resets a pointer to NULL for you, so you have to set the parent's pointer to NULL yourself (or to the replacement subtree, if the deleted node had children).
You can write a quick test on the parent to see this for yourself... something like this (not tested, just a general idea):
int checkNull(Node* x){
return x->child == NULL;
}
If checkNull() returns 1 for the parent, you're good. After a bare free() it will normally return 0, which shows that you do have to reset the pointer manually. Hope this helps.
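For reference, free() only releases the block; it never modifies the pointers that referred to it. Below is a minimal sketch of removing a leaf and clearing the parent's link explicitly. The Node layout and the helper names new_node and remove_leaf are assumptions for illustration, and only the leaf case is handled; nodes with children need the usual BST relinking.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Node {   /* hypothetical node layout */
    int key;
    struct Node *left, *right;
} Node;

/* Allocates a node with no children; returns NULL on failure. */
Node *new_node(int key) {
    Node *n = malloc(sizeof *n);
    if (n != NULL) {
        n->key = key;
        n->left = n->right = NULL;
    }
    return n;
}

/* Frees the leaf that *link points to and clears the parent's link.
   link is the address of the parent's left or right field. */
void remove_leaf(Node **link) {
    assert(link != NULL && *link != NULL);
    assert((*link)->left == NULL && (*link)->right == NULL);
    free(*link);
    *link = NULL;   /* without this line the parent keeps a dangling pointer */
}
```

A call such as remove_leaf(&parent->left) both frees the child and leaves parent->left equal to NULL.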

Inserting an item to a linked list - is iteration necessary (C)?

I'm currently studying linked lists for interview prep, would appreciate if anybody could shed some light on this. The following function in C supposedly inserts a new element into the list after a certain Element elem (passed as an argument to the function):
bool insertAfter(Element * elem, int data){
Element * newElem, * curPos = head;
newElem->data = data;
while(curPos){
if(curPos == elem){
newElem->next = curPos->next;
curPos->next = newElem;
return true;
}
curPos = curPos->next;
}
return false;
}
Although the above is specified in the textbook I am studying from, I tried coming up with a solution that does not use any iteration whatsoever:
bool insertAfter(Element * elem, int data){
Element * newElem;
newElem->next = elem->next;
newElem->data = data;
elem->next = newElem;
return true;
}
However, as it appears too simplistic, I sense that it may not work, but am not sure why. I need some insights on the technicalities on why this may or may not work, thanks.

Both versions suffer from the error of using newElem as though it were a valid pointer. It is not: it is never initialized to point to a valid object.
You can correct that by allocating memory for an object before using:
Element * newElem = malloc(sizeof(*newElem));
The difference between the two versions is that if elem is not accessible from head for some reason or it is NULL, the first version will do nothing to the existing list. The second version does not deal with either of those scenarios. It assumes that elem is in the list and that it is not NULL.
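Putting those two points together, a corrected non-iterating version might look like the sketch below. The Element layout is an assumption (the question does not show it), and returning false for a NULL elem or a failed allocation is one possible policy, not something the textbook specifies.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Element {   /* assumed layout */
    int data;
    struct Element *next;
} Element;

/* Inserts a new element holding data directly after elem in O(1).
   Returns false if elem is NULL or the allocation fails. */
bool insertAfter(Element *elem, int data) {
    if (elem == NULL) return false;
    Element *newElem = malloc(sizeof *newElem);
    if (newElem == NULL) return false;
    newElem->data = data;
    newElem->next = elem->next;   /* link to the old successor first */
    elem->next = newElem;
    return true;
}
```

This version still trusts the caller that elem belongs to the intended list; only the iterating version can verify that.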

Your solution is pretty much correct. The iteration in the example is almost completely useless.
What the example function is doing is: given an element to insert the new data after, it looks through the list starting with head to find that same element, then inserts the data - doing nothing with head or elem after finding it. Since all the loop does is "find" an element to which you already had a pointer, it essentially did nothing at all and is useless.
The only possible use of this is to constrain the insertion function to only work on this one list beginning with head, globally, throughout your program. This is such a strange design decision that one would likely assume it's a mistake unless given a reason to believe otherwise (dynamic data structures constrained to a single instance are an unusual pattern; more importantly, the whole point of linked lists is O(1) insertion, which the example function breaks by adding this useless loop). head is not needed for any other reason than to enforce this constraint, and if this is desired, it would make more sense to pass it in as a parameter as well so that the function is able to be used on more than one list per-program. (Or, not to perform the check at all: another use of linked lists is that you can pass around and insert after nodes without worrying about the head element.)
As other people have pointed out, you fail to actually allocate newElem, but so does the textbook. Overall, it's a rubbish example; not only did the author make a mistake with allocation, but they don't appear to understand the basic advantages of using linked lists. You should definitely treat this textbook with suspicion.

Your logic will work, because you just need a pointer to a node where you want to insert a node. If you already have such a pointer, no need to iterate and search for the node.
However, the search (iteration) would be relevant if you do not have in hand the pointer to the node after which you want to insert. Example: suppose the nodes have unique keys and you only know the key, not the node pointer. Then you need to find the node containing that key and do the insertion there (the function should then take the key as its argument).
However in your code (both cases), you have not allocated the memory for the new node. You need to do malloc for the new node and then go on with the insertion.

This will not work because you do not know your newElem. In linked list you have knowledge about a head and each element gives you information where to find the next one:
head -> e1 -> e2 -> ...
So you need to iterate until you find the element you care about. You can also do this traversal with recursion instead of a loop.

Create a new copy of a data structure based on pointers

I started with a programming assignment. I had to design a DFA based on graphs. Here is the data structure I used for it:
typedef struct n{
struct n *next[255]; //pointer to the next state. Value is NULL if no transition for the input character( denoted by their ascii value)
bool start_state;
bool end_state;
}node;
Now I have a DFA graph-based structure ready with me. I need to utilize this DFA in several places; The DFA will get modified in each of these several places. But I want unmodified DFAs to be passed to these various functions. One way is to create a copy of this DFA. What is the most elegant way of doing this? So all of them are initialized either with a NULL value or some pointer to another state.
NOTE:
I want the copy to be created in the called function, i.e. I pass the DFA, and the called function creates its own copy and operates on it. This way, my original DFA remains untouched.
MORE NOTES:
From each node of the DFA, I can have a directed edge connecting it with another node. If the transition takes place when the input character is c, then state->next[c] will hold a pointer to the next node. It is possible that several elements of the next array are NULL. Modifying the DFA means both adding new nodes and altering existing ones.

If you need a private copy on each call, and since this is a linked data structure, I see no way to avoid copying the whole graph (except perhaps to do a copy-on-write to some sub branches if the performance is that critical, but the complexity is significant and so is the chance of bugs).
Had this been C++, you could have done this in a copy constructor, but in C you just need to clone explicitly in every function. One way is to clone the entire structure (like Mark suggested). It's pretty complicated, since you need to track cycles and back edges in the graph (which show up as pointers to previously visited nodes; you don't want to reallocate those, but reuse the copies you've already allocated).
Another way, if you're willing to change your data structure, is to work with arrays - keep all the nodes in a single array of type node. The array should be big enough to accommodate all nodes if you know the limit, or just reallocate it to increase upon demand, and each "pointer" is replaced by a simple index.
Building this array is different: instead of mallocing a new node, use the next available index (keep it on the side). If you're going to add and remove nodes on the fly, you could keep a queue or stack of "free" indices (populate it at the beginning with 1..N, and pop or push whenever you need a new location or are about to free an old one).
The upside is that copying would be much faster, since all the links are relative to the instance of the array, you just copy a chunk of contiguous memory (memcpy would now work fine)
Another upside is that the performance of using this data structure should be superior to the linked one, since the memory accesses are spatially close and easily prefetchable.
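A minimal sketch of the index-based layout described above, under some stated assumptions: the names State, Dfa, dfa_add_state and dfa_copy are hypothetical, NO_STATE (-1) stands in for NULL, and the alphabet size is taken as 256 rather than the 255 in the question so that every unsigned char value has a slot.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define NO_STATE (-1)   /* replaces NULL: "no transition" */
#define ALPHABET 256    /* assumption: one slot per unsigned char value */

typedef struct {
    int next[ALPHABET]; /* index of the next state, or NO_STATE */
    bool start_state;
    bool end_state;
} State;

typedef struct {
    State *states;      /* one contiguous array holds every state */
    size_t count;
} Dfa;

/* Appends a state with no transitions.
   Returns its index, or NO_STATE if the array could not be grown. */
int dfa_add_state(Dfa *d, bool start, bool end) {
    State *grown = realloc(d->states, (d->count + 1) * sizeof *grown);
    if (grown == NULL) return NO_STATE;
    d->states = grown;
    State *s = &d->states[d->count];
    for (int c = 0; c < ALPHABET; c++) s->next[c] = NO_STATE;
    s->start_state = start;
    s->end_state = end;
    return (int)d->count++;
}

/* Deep copy: indices are relative to the array, so they remain valid in
   the copy and a single memcpy is enough, cycles included. */
bool dfa_copy(Dfa *dst, const Dfa *src) {
    dst->states = malloc(src->count * sizeof *dst->states);
    if (dst->states == NULL && src->count > 0) return false;
    if (src->count > 0)
        memcpy(dst->states, src->states, src->count * sizeof *dst->states);
    dst->count = src->count;
    return true;
}
```

Growing the array one element at a time keeps the sketch short; a real implementation would more likely grow the capacity geometrically.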

You'll need to write a recursive function that visits all the nodes, with a global dictionary that keeps track of the mapping from the source graph nodes to the copied graph nodes. This dictionary will be basically a table that maps old pointers to new pointers.
Here's the idea. I haven't compiled it or debugged it...
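#include <stdlib.h>   /* malloc */

/* Maps each node of the source graph to its copy. The table is
   zero-initialized, so the first entry with existing == NULL marks its end.
   MAX_NODES must be larger than the number of nodes in the graph. */
struct {
    node* existing;
    node* copy;
} dictionary[MAX_NODES] = {0};
node* do_copy(node* existing)
{
    node* copy;
    int i;
    /* Already copied (cycle or shared node)? Reuse that copy. */
    for(i=0;dictionary[i].existing;i++) {
        if (dictionary[i].existing == existing) return dictionary[i].copy;
    }
    copy = malloc(sizeof(node));
    /* Register the pair before recursing so that back edges find it. */
    dictionary[i].existing = existing;
    dictionary[i].copy = copy;
    /* Visit every slot: a NULL transition must not end the loop early. */
    for(int j=0;j<255;j++) {
        copy->next[j] = existing->next[j] ? do_copy(existing->next[j]) : NULL;
    }
    copy->end_state = existing->end_state;
    copy->start_state = existing->start_state;
    return copy;
}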

Too many frees when freeing graph

I have created a graph datastructure using linked lists. With this code:
typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;
void freeGraph(vertexPointer); /* announce function */
struct edgeNode{
vertexPointer connectsTo;
edgePointer next;
};
struct vertexNode{
int vertex;
edgePointer next;
};
Then I create a graph in which I have 4 nodes, lets say A, B, C and D, where:
A connects to D via B and A connects to D via C. With linked lists I imagine it looks like this:
Finally, I try to free the graph with freeGraph(graph).
void freeEdge(edgePointer e){
if (e != NULL) {
freeEdge(e->next);
freeGraph(e->connectsTo);
free(e);
e = NULL;
}
}
void freeGraph(vertexPointer v){
if (v != NULL) {
freeEdge(v->next);
free(v);
v = NULL;
}
}
That's where valgrind starts complaining with "Invalid read of size 4", "Address 0x41fb0d4 is 4 bytes inside a block of size 8 free'd" and "Invalid free()". Also it says it did 8 mallocs and 9 frees.
I think the problem is that the memory for node D is already freed and then I'm trying to free it again. But I don't see any way to do this right without having to alter the datastructure.
What is the best way to prevent these errors and free the graph correctly, if possible without having to alter the datastructure? Also if there are any other problems with this code, please tell. Thanks!
greets,
semicolon

The lack of knowing all the references makes this a bit difficult. A bit of a hack, but faced with the same issue I would likely use a pointer set (a list of unique values, in this case pointers).
1) Walk the entire graph, pushing node pointers into the set only if they are not already present (this is the definition of a 'set').
2) Walk the set, freeing each pointer (since they're unique, there is no risk of a double free).
3) Set graph to NULL.
I'm sure there is an elegant recursive solution to this, but faced with the task as stated, this seems doable and not overly complicated.
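The steps above can be sketched as follows, using the vertex and edge types from the question. The VertexSet type and the functions set_add, collect, freeGraphOnce, newVertex and addEdge are hypothetical names introduced for illustration. The sketch assumes each edge node belongs to exactly one vertex's edge list, and it uses a linear-search set, which is fine for small graphs only.

```c
#include <stdlib.h>

typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;
struct edgeNode { vertexPointer connectsTo; edgePointer next; };
struct vertexNode { int vertex; edgePointer next; };

/* Hypothetical growable set of unique vertex pointers. */
typedef struct {
    vertexPointer *items;
    size_t count, capacity;
} VertexSet;

/* Returns 1 if v was added, 0 if already present, -1 on allocation failure. */
static int set_add(VertexSet *s, vertexPointer v) {
    for (size_t i = 0; i < s->count; i++)
        if (s->items[i] == v) return 0;
    if (s->count == s->capacity) {
        size_t cap = s->capacity ? s->capacity * 2 : 8;
        vertexPointer *grown = realloc(s->items, cap * sizeof *grown);
        if (grown == NULL) return -1;
        s->items = grown;
        s->capacity = cap;
    }
    s->items[s->count++] = v;
    return 1;
}

/* Step 1: collect every reachable vertex exactly once.
   (On allocation failure the unvisited part of the graph is leaked.) */
static void collect(VertexSet *s, vertexPointer v) {
    if (v == NULL || set_add(s, v) != 1) return;
    for (edgePointer e = v->next; e != NULL; e = e->next)
        collect(s, e->connectsTo);
}

/* Step 2: free each vertex and its own edge list.
   Returns the number of vertices freed. */
size_t freeGraphOnce(vertexPointer v) {
    VertexSet s = {NULL, 0, 0};
    collect(&s, v);
    for (size_t i = 0; i < s.count; i++) {
        edgePointer e = s.items[i]->next;
        while (e != NULL) {
            edgePointer nextEdge = e->next;
            free(e);
            e = nextEdge;
        }
        free(s.items[i]);
    }
    size_t freed = s.count;
    free(s.items);
    return freed;
}

/* Helpers for building a graph to try this on. */
vertexPointer newVertex(int id) {
    vertexPointer v = malloc(sizeof *v);
    if (v != NULL) { v->vertex = id; v->next = NULL; }
    return v;
}

int addEdge(vertexPointer from, vertexPointer to) {
    edgePointer e = malloc(sizeof *e);
    if (e == NULL) return 0;
    e->connectsTo = to;
    e->next = from->next;
    from->next = e;
    return 1;
}
```

After the call, the caller should set its own graph pointer to NULL, since the function cannot do that through a by-value parameter.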

Instead of allocating the nodes and edges on a global heap, maybe you can allocate them in a memory pool. To free the graph, free the whole pool.
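As a rough illustration of the pool idea, here is a minimal fixed-size bump allocator. The names Pool, pool_init, pool_alloc and pool_free_all are hypothetical, the pool cannot grow, and individual objects cannot be released; the sketch assumes a C11 compiler for max_align_t and _Alignof.

```c
#include <stddef.h>
#include <stdlib.h>

typedef struct {
    unsigned char *base;   /* start of the single backing block */
    size_t used, size;     /* bytes handed out so far, total bytes */
} Pool;

/* Reserves size bytes up front. Returns 1 on success, 0 on failure. */
int pool_init(Pool *p, size_t size) {
    p->base = malloc(size);
    p->used = 0;
    p->size = p->base ? size : 0;
    return p->base != NULL;
}

/* Hands out n bytes from the pool, aligned for any object type.
   Returns NULL when the pool is exhausted. */
void *pool_alloc(Pool *p, size_t n) {
    const size_t align = _Alignof(max_align_t);
    size_t start = (p->used + align - 1) / align * align;
    if (start > p->size || n > p->size - start) return NULL;
    p->used = start + n;
    return p->base + start;
}

/* Releases every object at once: one free(), so no double frees are
   possible no matter how the vertices and edges reference each other. */
void pool_free_all(Pool *p) {
    free(p->base);
    p->base = NULL;
    p->used = p->size = 0;
}
```

With this approach the graph code would call pool_alloc wherever it currently calls malloc, and freeing the graph reduces to a single pool_free_all.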

I would approach this problem by designing a way to first cleanly remove each node from the graph before freeing it. To do this cleanly you will have to figure out what other nodes are referencing the node you are about to delete and remove those edges. Once the edges are removed, if you happen to come around to another node that had previously referenced the deleted node, the edge will already be gone and you won't be able to try to delete it again.
The easiest way would be to modify your data structure to hold a reference to "incoming" edges. That way you could do something like:
v->incoming[i]->next = NULL; // do this for each edge in incoming
freeEdge(v->next);
free(v);
v = NULL;
If you didn't want to update the data structure you are left with a hard problem of searching your graph for nodes that have edges to the node you want to delete.

It's because you've got two recursions going on here, and they're stepping on each other. freeGraph is called once to free D (say, from B) and then when the initial call to freeGraph comes back from freeEdge, you try freeing v -- which was already taken care of deeper down. That's a poor explanation without an illustration, but there ya go.
You can get rid of one of the recursions so they're not "crossing over", or you can check before each free whether that node has already been taken care of by the other branch of the recursion.

Yes, the problem is that D can be reached over two paths and freed twice.
You can do it in 2 phases:
Phase 1: Insert the nodes you reached into a "set" datastructure.
Phase 2: free the nodes in the "set" datastructure.
A possible implementation of that set datastructure, which requires extending your datastructure:
Mark all nodes in the datastructure with a boolean flag, so you don't insert them twice.
Use another "next"-pointer for a second linked list of all the nodes. A simple one.
Another implementation, without extending your data structure: something like C++'s std::set<>.
Another problem: Are you sure that all nodes can be reached, when you start from A?
To avoid this problem, insert all nodes into the "set" datastructure at creation time (you won't need the marking, then).

How to avoid multiple deallocation

A Scene struct has a pointer to (a linked list of) SceneObjects.
Each SceneObject refers to a Mesh.
Some SceneObjects may however refer to the same Mesh (by sharing the same pointer - or handle, see later - to the Mesh). Meshes are pretty big and doing it this way has obvious advantages for rendering speed.
typedef struct SceneObject {
Mesh *mesh;
...
struct SceneObject *next;
} SceneObject;
typedef struct Scene {
SceneObject *objects;
...
} Scene;
My question:
How do I free a Scene, while avoiding to free the same Mesh pointer multiple times?
I thought I could solve this by using handle to Mesh (Mesh** mesh_handle) instead of a pointer so I could set the referenced Mesh pointer to 0, and let successive frees on it just free 0, but I can't make it work. I just can't get my head around how to avoid multiple deallocations.
Am I forced to keep references for such a scenario? Or am I forced to put all the Mesh objects into a separate Mesh table and free it separately? Is there a way to tackle this without doing these things? By tagging the objects as instances of each other I can naturally adjust the free algorithm so it deals with the problem, but I was wondering if there is a more 'pure' solution for this problem.

One standard solution is to use reference counters: every object that can be referred to by several other objects carries a counter that records how many of them are pointing to it. This is done with something like
typedef struct T_Object
{
int refcount;
....
} Object;
Object *newObject(....)
{
Object *obj = my_malloc(sizeof(Object));
obj->refcount = 1;
....
return obj;
}
Object *ref(Object *p)
{
if (p) p->refcount++;
return p;
}
void deref(Object *p)
{
if (p && p->refcount-- == 1)
destroyObject(p);
}
Whoever first allocates the object is its first owner (hence the counter is initialized to 1). Whenever you need to store the pointer somewhere else, store ref(p) instead, so that the counter is incremented. When a holder is no longer going to point to the object, it should call deref(p). Once the last reference to the object is gone, the counter becomes zero and that deref call actually destroys the object.
It takes some discipline to get it working (you should always think when calling ref and deref) but it's possible to write complex software that has zero leaks using this approach.
A simpler solution that is sometimes applicable is having all your shared objects also stored in a separate list... you freely assign and change complex data structures pointing to these objects but you never free them during the normal use. Only when you need to throw everything away you deallocate those objects by using that separate list.
Note that this approach is possible only if you're not allocating many objects during the "normal use" because in that case delaying the destruction could be not viable.
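Applied to the Scene from the question, the reference-counting approach might look like the sketch below. The refcount field inside Mesh, the function names mesh_new, mesh_ref, mesh_deref, scene_add and scene_free, and the live_meshes counter (present only to make the behaviour observable) are all assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Mesh {
    int refcount;   /* number of holders sharing this mesh */
    /* ... vertex data would go here ... */
} Mesh;

typedef struct SceneObject {
    Mesh *mesh;
    struct SceneObject *next;
} SceneObject;

typedef struct Scene {
    SceneObject *objects;
} Scene;

static int live_meshes = 0;   /* demonstration only: meshes not yet destroyed */

Mesh *mesh_new(void) {
    Mesh *m = malloc(sizeof *m);
    if (m != NULL) { m->refcount = 1; live_meshes++; }
    return m;
}

Mesh *mesh_ref(Mesh *m) {
    if (m) m->refcount++;
    return m;
}

void mesh_deref(Mesh *m) {
    if (m == NULL) return;
    assert(m->refcount > 0);
    if (m->refcount-- == 1) {   /* last reference gone: destroy */
        free(m);
        live_meshes--;
    }
}

/* Prepends an object; the object takes over the reference passed in.
   Returns 1 on success, 0 on allocation failure. */
int scene_add(Scene *s, Mesh *m) {
    SceneObject *o = malloc(sizeof *o);
    if (o == NULL) return 0;
    o->mesh = m;
    o->next = s->objects;
    s->objects = o;
    return 1;
}

/* Each object drops its own reference; a shared mesh is freed exactly
   once, when the last object that refers to it is released. */
void scene_free(Scene *s) {
    SceneObject *o = s->objects;
    while (o != NULL) {
        SceneObject *next = o->next;
        mesh_deref(o->mesh);
        free(o);
        o = next;
    }
    s->objects = NULL;
}
```

To share a mesh between two objects, the first is added with scene_add(&s, m) and the second with scene_add(&s, mesh_ref(m)).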
