Create a new copy of a data structure based on pointers - C

I started with a programming assignment. I had to design a DFA based on graphs. Here is the data structure I used for it:
typedef struct n {
    struct n *next[255]; // pointer to the next state; NULL if there is no transition for the input character (indexed by its ASCII value)
    bool start_state;
    bool end_state;
} node;
Now I have a graph-based DFA structure ready. I need to use this DFA in several places, and it will get modified in each of them, but I want each of these functions to receive an unmodified DFA. One way is to create a copy of the DFA. What is the most elegant way of doing this? Every element of next is initialized either to NULL or to a pointer to another state.
NOTE:
I want the copy to be created in the called function, i.e. I pass the DFA, and the called function creates its own copy and operates on that. This way, my original DFA remains untouched.
MORE NOTES:
From each node of the DFA there can be a directed edge to another node. If a transition takes place on input character c, then state->next[c] holds a pointer to the next node. It is possible that several elements of the next array are NULL. Modifying the DFA means both adding new nodes and altering existing ones.

If you need a private copy on each call, then since this is a linked data structure I see no way to avoid copying the whole graph (except perhaps copy-on-write for some sub-branches if performance is that critical, but the complexity is significant and so is the chance of bugs).
Had this been C++, you could have done this in a copy constructor, but in C you have to do the cloning yourself in every such function. One way is to clone the entire structure (like Mark suggested) - it's somewhat involved since you need to track cycles/back edges in the graph (which show up as pointers to previously visited nodes, for which you want to reuse what you've already allocated instead of reallocating).
Another way, if you're willing to change your data structure, is to work with arrays - keep all the nodes in a single array of type node. The array should be big enough to accommodate all nodes if you know the limit, or just reallocate it to increase upon demand, and each "pointer" is replaced by a simple index.
Building this array is different - instead of mallocing a new node, use the next available index (kept on the side), or, if you're going to add/remove nodes on the fly, keep a queue/stack of "free" indices (populate it at the beginning with 1..N, and pop/push whenever you need a new slot or are about to free an old one).
The upside is that copying becomes much faster: since all the links are relative to the array instance, you just copy one chunk of contiguous memory (a plain memcpy now works fine).
Another upside is that the performance of using this data structure should be superior to the linked one, since the memory accesses are spatially close and easily prefetchable.
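A minimal sketch of what that could look like for the DFA above (the names dfa, state, MAX_STATES and NO_TRANSITION are mine, not from the question):
#include <stdbool.h>
#include <string.h>

#define MAX_STATES    128   /* assumed upper bound; grow with realloc if needed */
#define NO_TRANSITION -1    /* plays the role of NULL in the index-based scheme */

typedef struct {
    int  next[255];         /* index of the next state, or NO_TRANSITION */
    bool start_state;
    bool end_state;
} state;

typedef struct {
    state states[MAX_STATES];
    int   count;            /* number of states in use; also the next free index */
} dfa;

/* Copying the whole DFA is now a single contiguous copy, no traversal needed. */
void copy_dfa(dfa *dst, const dfa *src)
{
    memcpy(dst, src, sizeof *src);
}
A called function that wants a private copy can declare a local dfa, copy_dfa into it, and modify that freely.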

You'll need to write a recursive function that visits all the nodes, with a global dictionary that keeps track of the mapping from the source graph nodes to the copied graph nodes. This dictionary will be basically a table that maps old pointers to new pointers.
Here's the idea. I haven't compiled it or debugged it...
struct {
    node *existing;
    node *copy;
} dictionary[MAX_NODES] = {0};

node *do_copy(node *existing)
{
    node *copy;
    int i;

    if (existing == NULL) return NULL;

    /* If we've already copied this node, return the existing copy (handles cycles). */
    for (i = 0; dictionary[i].existing; i++) {
        if (dictionary[i].existing == existing) return dictionary[i].copy;
    }

    /* Register the copy before recursing so back edges find it. */
    copy = malloc(sizeof(node));
    dictionary[i].existing = existing;
    dictionary[i].copy = copy;

    for (int j = 0; j < 255; j++) {
        copy->next[j] = existing->next[j] ? do_copy(existing->next[j]) : NULL;
    }
    copy->end_state = existing->end_state;
    copy->start_state = existing->start_state;
    return copy;
}

Related

Inserting an item to a linked list - is iteration necessary (C)?

I'm currently studying linked lists for interview prep, would appreciate if anybody could shed some light on this. The following function in C supposedly inserts a new element into the list after a certain Element elem (passed as an argument to the function):
bool insertAfter(Element * elem, int data){
    Element * newElem, * curPos = head;
    newElem->data = data;
    while(curPos){
        if(curPos == elem){
            newElem->next = curPos->next;
            curPos->next = newElem;
            return true;
        }
        curPos = curPos->next;
    }
    return false;
}
Although the above is specified in the textbook I am studying from, I tried coming up with a solution that does not use any iteration whatsoever:
bool insertAfter(Element * elem, int data){
    Element * newElem;
    newElem->next = elem->next;
    newElem->data = data;
    elem->next = newElem;
    return true;
}
However, since it appears too simplistic, I sense that it may not work, but I'm not sure why. I need some insight into the technicalities of why this may or may not work. Thanks.
Both versions suffer from the error of using newElem as though it were a valid pointer. It is not. It is not initialized to point to a valid object.
You can correct that by allocating memory for an object before using:
Element * newElem = malloc(sizeof(*newElem));
The difference between the two versions is that if elem is not accessible from head for some reason or it is NULL, the first version will do nothing to the existing list. The second version does not deal with either of those scenarios. It assumes that elem is in the list and that it is not NULL.
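Putting those two points together, a corrected version of the second (non-iterating) function might look like this; it assumes the usual Element definition and <stdlib.h>, and the NULL check on elem is my addition:
bool insertAfter(Element *elem, int data)
{
    if (elem == NULL)
        return false;

    Element *newElem = malloc(sizeof *newElem);
    if (newElem == NULL)
        return false;               /* allocation failed */

    newElem->data = data;
    newElem->next = elem->next;
    elem->next = newElem;
    return true;
}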
Your solution is pretty much correct. The iteration in the example is almost completely useless.
What the example function is doing is: given an element to insert the new data after, it looks through the list starting with head to find that same element, then inserts the data - doing nothing with head or elem after finding it. Since all the loop does is "find" an element to which you already had a pointer, it essentially did nothing at all and is useless.
The only possible use of this is to constrain the insertion function to work only on this one list beginning with head, globally, throughout your program. This is such a strange design decision that one would likely assume it's a mistake unless given a reason to believe otherwise (dynamic data structures constrained to a single instance are an unusual pattern; more importantly, the whole point of linked lists is O(1) insertion, which the example function breaks by adding this useless loop). head is not needed for any reason other than to enforce this constraint, and if that is desired, it would make more sense to pass it in as a parameter as well, so that the function can be used on more than one list per program. (Or not to perform the check at all: another advantage of linked lists is that you can pass around and insert after nodes without worrying about the head element.)
As other people have pointed out, you fail to actually allocate newElem, but so does the textbook. Overall, it's a rubbish example; not only did the author make a mistake with allocation, but they don't appear to understand the basic advantages of using linked lists. You should definitely treat this textbook with suspicion.
Your logic will work, because you just need a pointer to the node after which you want to insert. If you already have such a pointer, there is no need to iterate and search for it.
However, the search (iteration) would be relevant if you do not have the pointer to the node after which you want to insert. Example: suppose the nodes have unique keys and you only know the key, not the node pointer; to insert after the node containing that key, you first need to find the correct node and then do the insertion (the function should then take the key as an argument).
However, in your code (both versions) you have not allocated memory for the new node. You need to malloc the new node and then go on with the insertion.
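A sketch of that search-then-insert variant (insertAfterKey is a hypothetical name; it assumes a global head and that data doubles as the unique key):
bool insertAfterKey(int key, int data)
{
    Element *cur = head;                  /* assumed global list head */
    while (cur != NULL && cur->data != key)
        cur = cur->next;
    if (cur == NULL)
        return false;                     /* key not found */

    Element *newElem = malloc(sizeof *newElem);
    if (newElem == NULL)
        return false;
    newElem->data = data;
    newElem->next = cur->next;
    cur->next = newElem;
    return true;
}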
This will not work because newElem does not point to anything yet. In a linked list you know the head, and each element tells you where to find the next one:
head -> e1 -> e2 -> ...
So you need to iterate until you find the element you care about. (You can also do the iteration with recursion.)

Design of API's for data structures and algorithms

I recently implemented a binary search tree, linked lists, etc. as a learning exercise. I implemented several APIs like insert, delete, etc.
For example the Insert node API looks like
void insertNode(Node** root, Node* node)
The application would allocate memory for the node to be inserted (i.e. node), assign the value/key, and pass it to this function.
1) My question is whether this is the right approach in general? Or should the application only pass the value/key to the insertNode function, and this function allocates the memory?
i.e. void insertNode(Node** root, int key)
{
    /* malloc for node here */
}
2) What is good design practice - should the application handle allocating and freeing memory, or should this library of APIs?
Thanks
General principles:
Whoever allocates memory should also have responsibility to delete it.
Make the client's job as easy as possible
Think about thread safety
So, don't have the client allocate the memory managed by the tree. However, we then need to think carefully about what a find or search should return.
You might look at example code to see what policies other folks have taken for collection APIs.
In passing, you don't need the double pointer for your root.
If someone has root pointing to an object { left -> ..., right -> ... }, then if you pass root as Node* your code can write
root->left = newValue;
you're modifying what root points to, not root itself. By passing the double pointer you're allowing the structure to be completely relocated, because someone can write:
*root = completelyNewTree
which I doubt you intend.
Now, for inserting, I'd suggest passing in the new value but having the collection allocate space and copy it. The collection then has complete control over the data it holds, and anything it returns should be copied. We don't want clients messing with the tree directly (thread safety).
This implies that for result of a find either the caller must pre-allocate a buffer or we must clearly document that the collection is returning an allocated copy that the caller must delete.
And yes, working in any Garbage Collecting language is much easier!
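For the BST case, a sketch of what "the collection allocates and owns the node" could look like (the Tree wrapper and the names are illustrative, not from the question):
#include <stdbool.h>
#include <stdlib.h>

typedef struct Node {
    int key;
    struct Node *left, *right;
} Node;

typedef struct Tree {      /* the collection owns its nodes */
    Node *root;
} Tree;

/* The caller passes only the key; the tree allocates and keeps the node. */
bool insertNode(Tree *t, int key)
{
    Node **pos = &t->root;
    while (*pos != NULL)
        pos = (key < (*pos)->key) ? &(*pos)->left : &(*pos)->right;

    Node *n = malloc(sizeof *n);
    if (n == NULL)
        return false;
    n->key = key;
    n->left = n->right = NULL;
    *pos = n;
    return true;
}
Freeing is then entirely the tree's job, matching the "whoever allocates also deletes" principle.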

Why do we need pointers in C implementation of a linked list?

Why is it important to use pointers in an implementation of linked lists in C?
For example:
typedef struct item
{
    type data;
    struct item *next;
} Item;

typedef struct list
{
    Item *head;
} List;
What would happen if I used the same implementation, just without the pointers?
Well, you'll end up with something like this:
typedef struct item
{
    type data;
    struct item next;
} Item;
and now the C compiler will try to figure out how large Item is. But since next is embedded right inside Item, it'll end up with an equation like this:
size-of-Item = size-of-type + size-of-Item
which has no finite solution. Hence we have a problem. Because of this, C requires pointers, so you have
size-of-Item = size-of-type + size-of-pointer
which is closed. More interestingly, even when you do this in languages like Java, Python, or Haskell, you're really implicitly storing a pointer (they say reference) to break the cycle. They just hide the fact from you.
You do not actually need to use pointers to implement a data structure that exposes the interface of a linked list, but you do need them in order to provide the performance characteristics of a linked list.
What would the alternatives to pointers be? Well, item needs to have some way to refer to the next item. It cannot aggregate the next item for various reasons, one of them being that this would make the struct "recursive" and therefore not realizable:
// does not compile -- an item has an item has an item has an item has...
typedef struct item
{
    type data;
    struct item next;
} Item;
What you could do is have an array of items and implement the linked list on top of that:
typedef struct item
{
    type data;
    int next; // index inside storage of the next item
} Item;

Item storage[100];
This looks like it would work; you can have a function that adds items to the list and another that removes them:
void add_after(Item *new, Item *existing);
void remove(Item *existing);
These functions would either remove existing from the array (taking care to update the next "pointer" of the previous item), creating an empty slot, or insert new into an empty slot in storage after existing (updating the next fields to point there).
The problem is that this makes the add_after and remove operations impossible to realize in constant time, because you now need to search through the array and reallocate it whenever you run out of space.
Since constant-time add/remove is the reason people use a linked list in the first place, this makes the non-pointer approach useful only as an intellectual exercise. For a real list, using pointers means you can perform these operations in constant time.
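To make that cost concrete, here is roughly what add_after would have to do in the array version; the in_use flag is an assumed addition to Item so that free slots can be recognized:
/* Returns the index of the new item, or -1 if the array is full. */
int add_after(Item *storage, int capacity, int existing_idx, type data)
{
    for (int i = 0; i < capacity; i++) {
        if (!storage[i].in_use) {    /* linear search: this is what breaks O(1) */
            storage[i].in_use = true;
            storage[i].data = data;
            storage[i].next = storage[existing_idx].next;
            storage[existing_idx].next = i;
            return i;
        }
    }
    return -1;                       /* full: would need a realloc */
}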
Pointers are used for dynamic allocation of memory. You can implement a list as a simple array, but that means allocating a contiguous memory area, which for a large list is not efficient. Also, arrays are more difficult to extend (you have to reallocate the array if you reach its maximum size).
Therefore, dynamic allocation is preferred for large lists: each element can be allocated anywhere in memory, contiguous or not, and the list can be extended easily.
Also, you cannot store another struct item inside a struct item, because the structure is not fully defined yet and the compiler cannot establish its size:
This cannot be done:
typedef struct item
{
    type data;
    struct item next;
} Item;
Therefore pointers are used, because the size of a pointer (a fixed size on that platform) can be determined when the structure is compiled.
Without a pointer, each list item contains another list item (next) which contains another list item. Which in turn contains another list item. And so on.
You end up with an infinitely large data structure that would use an infinite amount of memory, and that of course can't work.
You don't need a pointer to create a list in C. Whether a pointer-based list is preferable depends on your application. People used linked lists before languages had a concept of pointers at all; pointers just make the idea simpler to comprehend for some people. You don't need a structure with the data either.
A simple array will do fine. You need:
one integer array to hold indices (this plays the role of the pointers). Each entry is either -1 (meaning nil or NULL in pointer terminology) or a value 0..N-1 giving the array index of the next linked element. The index array's own index is the item's index in the data array.
one array for any data to be 'connected' within the 'list' above.
That's all you need.
An array instead of a pointer list may have
advantages, if the amount of data doesn't change (doesn't grow). In this case, the advantages can be huge. If you know the maximum number of items you'll encounter and can allocate space for them beforehand, an array implementation will be much faster than any pointer-linked list, especially if you only insert and never remove.
disadvantages otherwise. In this case, the disadvantages can be huge.
Your example would, therefore, look like:
type data[N];  // or: type *data = malloc(N * sizeof *data);
int links[N];  // or: int *links = malloc(N * sizeof *links);
That should be all you need to start with a simple test case.
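A minimal, runnable version of such a test case (using int for type and a small hand-built chain, purely as illustration):
#include <stdio.h>

#define N 4

int main(void)
{
    int data[N]  = { 10, 20, 30, 40 };
    int links[N] = { 2, -1, 1, -1 };   /* chain: 10 -> 30 -> 20; slot 3 is unused */
    int head = 0;                      /* index of the first element */

    for (int i = head; i != -1; i = links[i])
        printf("%d\n", data[i]);       /* prints 10, 30, 20 */
    return 0;
}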
Importance of pointers in C
Advantages:
Execution speed will be high.
Pointers allow us to access variables outside of a function.
They reduce the size of our code.
Without pointers:
Code size will increase.
We cannot maintain the data efficiently.
In a list, the pointer is used to point to the next item. If you do not declare the pointers, you cannot access more than one item of the list.
If you allocate memory dynamically at run time, pointers are really needed.
There's more than one kind of "list". The implementation you show is a "Linked List". Without pointers there would be no "links", so we'd need to look at other kinds of lists.
So, first up, what happens if you just remove the * from the struct definition? It won't compile. You can include a struct inside another struct, but if there's recursion then the structure would have infinite size, and that isn't going to happen!
How about an array as a list:
struct item list[100];
Yep, that'll work, but now your list is a fixed size.
Ok, so let's dynamically allocate the list:
struct item *list = malloc(sizeof(struct item));
Then, every time you add to the list you'd have to do this:
list = realloc(list, newcount * sizeof(struct item));
list[newitem] = blah;
OK, that works, but now we're reallocating memory frequently, and that leads to lots of memory copies, and inefficiency.
On the other hand, if the list is updated very infrequently, this is more space-efficient, and that might be a good thing.
Another disadvantage of using an array is that operations such as push, pop, sort, reverse, etc. become much more expensive; they all mean multiple memory copies.
There are other kinds of "list", but they all involve pointers, so I think we can disregard them here.

Too many frees when freeing graph

I have created a graph data structure using linked lists, with this code:
typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;

void freeGraph(vertexPointer); /* forward declaration */

struct edgeNode{
    vertexPointer connectsTo;
    edgePointer next;
};

struct vertexNode{
    int vertex;
    edgePointer next;
};
Then I create a graph with 4 nodes, let's say A, B, C and D, where A connects to D via B, and A connects to D via C.
Finally, I try to free the graph with freeGraph(graph).
void freeEdge(edgePointer e){
    if (e != NULL) {
        freeEdge(e->next);
        freeGraph(e->connectsTo);
        free(e);
        e = NULL;
    }
}

void freeGraph(vertexPointer v){
    if (v != NULL) {
        freeEdge(v->next);
        free(v);
        v = NULL;
    }
}
That's where valgrind starts complaining with "Invalid read of size 4", "Address 0x41fb0d4 is 4 bytes inside a block of size 8 free'd" and "Invalid free()". Also it says it did 8 mallocs and 9 frees.
I think the problem is that the memory for node D is already freed and then I'm trying to free it again. But I don't see any way to do this right without having to alter the datastructure.
What is the best way to prevent these errors and free the graph correctly, if possible without having to alter the datastructure? Also if there are any other problems with this code, please tell. Thanks!
Not knowing all the references makes this a bit difficult. It's a bit of a hack, but faced with the same issue I would likely use a pointer set (a list of unique values, in this case pointers):
1. Walk the entire graph, pushing node pointers into the set only if they are not already present (that is the definition of a 'set').
2. Walk the set, freeing each pointer (since they're unique, there is no issue of a double free).
3. Set graph to NULL.
I'm sure there is an elegant recursive solution to this, but faced with the task as stated, this seems doable and not overly complicated.
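A sketch of that approach for the structures in the question, replacing the original freeEdge/freeGraph pair (MAX_NODES, seen, collect and walk are made-up helpers, and the fixed-size set is only for illustration):
#define MAX_NODES 1024                      /* assumed upper bound on vertices */

static vertexPointer seen[MAX_NODES];
static int seen_count = 0;

/* Add v to the set; returns false if it was already there. */
static bool collect(vertexPointer v)
{
    for (int i = 0; i < seen_count; i++)
        if (seen[i] == v) return false;
    seen[seen_count++] = v;
    return true;
}

/* Walk the graph, collecting every reachable vertex exactly once. */
static void walk(vertexPointer v)
{
    if (v == NULL || !collect(v)) return;
    for (edgePointer e = v->next; e != NULL; e = e->next)
        walk(e->connectsTo);
}

void freeGraph(vertexPointer v)
{
    walk(v);
    for (int i = 0; i < seen_count; i++) {
        edgePointer e = seen[i]->next;      /* each edge node sits in exactly one list */
        while (e != NULL) {
            edgePointer next = e->next;
            free(e);
            e = next;
        }
        free(seen[i]);
    }
    seen_count = 0;
}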
Instead of allocating the nodes and edges on a global heap, maybe you can allocate them in a memory pool. To free the graph, free the whole pool.
I would approach this problem by designing a way to first cleanly remove each node from the graph before freeing it. To do this cleanly you will have to figure out what other nodes are referencing the node you are about to delete and remove those edges. Once the edges are removed, if you happen to come around to another node that had previously referenced the deleted node, the edge will already be gone and you won't be able to try to delete it again.
The easiest way would be to modify your data structure to hold a reference to "incoming" edges. That way you could do something like:
v->incoming[i]->next = NULL; /* do this for each edge in incoming */
freeEdge(v->next);
free(v);
v = NULL;
If you don't want to update the data structure, you are left with the harder problem of searching your graph for nodes that have edges to the node you want to delete.
It's because you've got two recursions going on here, and they're stepping on each other: freeGraph is called once to free D (say, via B's edge), and then the recursion through C's edge calls freeGraph on D again, which was already taken care of deeper down. That's a poor explanation without an illustration, but there ya go.
You can get rid of one of the recursions so they're not "crossing over", or you can check before each free whether that node has already been taken care of by another branch of the recursion.
Yes, the problem is that D can be reached over two paths and freed twice.
You can do it in 2 phases:
Phase 1: Insert the nodes you reached into a "set" datastructure.
Phase 2: free the nodes in the "set" datastructure.
A possible implementation of that set datastructure, which requires extending your datastructure:
Mark all nodes in the datastructure with a boolean flag, so you don't insert them twice.
Use another "next"-pointer for a second linked list of all the nodes. A simple one.
Another implementation, without extending your data structure: something like C++ std::set<>.
Another problem: Are you sure that all nodes can be reached, when you start from A?
To avoid this problem, insert all nodes into the "set" datastructure at creation time (you won't need the marking, then).

How to avoid multiple deallocation

A Scene struct has a pointer to (a linked list of) SceneObjects.
Each SceneObject refers to a Mesh.
Some SceneObjects may however refer to the same Mesh (by sharing the same pointer - or handle, see later - to the Mesh). Meshes are pretty big and doing it this way has obvious advantages for rendering speed.
typedef struct SceneObject {
    Mesh *mesh;
    ...
    struct SceneObject *next;
} SceneObject;

typedef struct Scene {
    SceneObject *objects;
    ...
} Scene;
My question:
How do I free a Scene, while avoiding to free the same Mesh pointer multiple times?
I thought I could solve this by using a handle to the Mesh (Mesh** mesh_handle) instead of a pointer, so I could set the referenced Mesh pointer to 0 and let successive frees just free 0, but I can't make it work. I just can't get my head around how to avoid multiple deallocations.
Am I forced to keep references for such a scenario? Or am I forced to put all the Mesh objects into a separate Mesh table and free it separately? Is there a way to tackle this without doing these things? By tagging the objects as instances of each other I can naturally adjust the free algorithm so it deals with the problem, but I was wondering if there is a more 'pure' solution for this problem.
One standard solution is to have reference counters; that is, every object that can possibly be referred to by many other objects has a counter that remembers how many of them are pointing to it. This is done with something like:
typedef struct T_Object
{
    int refcount;
    ....
} Object;

Object *newObject(....)
{
    Object *obj = my_malloc(sizeof(Object));
    obj->refcount = 1;
    ....
    return obj;
}

Object *ref(Object *p)
{
    if (p) p->refcount++;
    return p;
}

void deref(Object *p)
{
    if (p && p->refcount-- == 1)
        destroyObject(p);
}
Whoever first allocates the object is its first owner (hence the counter is initialized to 1). Every time you need to store the pointer somewhere else, you should store ref(p) instead, to be sure the counter is incremented. When someone is no longer going to point to it, call deref(p). Once the last reference to the object is gone, the counter becomes zero and the deref call actually destroys the object.
It takes some discipline to get it working (you always have to think about when to call ref and deref), but it's possible to write complex software with zero leaks using this approach.
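Applied to the Scene/Mesh case above, treating Mesh as one of these refcounted Objects (addObject and freeScene are illustrative names, not an established API):
SceneObject *addObject(Scene *s, Object *mesh)
{
    SceneObject *o = malloc(sizeof *o);
    if (o == NULL) return NULL;
    o->mesh = ref(mesh);          /* this SceneObject now co-owns the mesh */
    o->next = s->objects;
    s->objects = o;
    return o;
}

void freeScene(Scene *s)
{
    SceneObject *o = s->objects;
    while (o != NULL) {
        SceneObject *next = o->next;
        deref(o->mesh);           /* frees the mesh only when the last reference is gone */
        free(o);
        o = next;
    }
    s->objects = NULL;
}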
A simpler solution that is sometimes applicable is to also store all your shared objects in a separate list: you freely assign and change the complex data structures pointing to these objects, but you never free the objects during normal use. Only when you need to throw everything away do you deallocate them, using that separate list.
Note that this approach is only possible if you're not allocating many objects during "normal use", because in that case delaying the destruction might not be viable.
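For the question's scenario, that separate list could simply be a mesh table owned by the scene or an asset manager; MAX_MESHES and the struct are assumptions for the sketch:
#define MAX_MESHES 256

typedef struct {
    Mesh *meshes[MAX_MESHES];   /* every Mesh ever allocated is registered here */
    int   count;
} MeshTable;

/* SceneObjects only borrow pointers into the table and never free them;
   freeing a Scene frees the SceneObjects but leaves the meshes alone. */
void freeMeshTable(MeshTable *t)
{
    for (int i = 0; i < t->count; i++)
        free(t->meshes[i]);     /* each mesh is freed exactly once */
    t->count = 0;
}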
