Removing elements from a linked list - C

I'm refactoring some C code and have been stuck for a while on a problem with a linked list data structure. Please have a look at the following simplified snippet:
Link apply(Link first, pred_ptr cond)
{
    Link t=first->next,p=first;
    do{
        if(cond(t))
        {
            p->next=t->next;
            free(t);
            t=p;
        }
        p=t;
        t=t->next;
    }while(t!=first);
    //Check the first
    if(cond(first))
    {
        t=first->next;
        free(first);
        first=t;
        p->next=t;
    }
    return first;
}
The function apply removes all the elements from the linked list for which the function cond returns a non-zero value. Link is something like this:
struct node
{
    struct node* next;
    //Stuff
};
typedef struct node* Link;
Well, my only question is about how apply removes the very first element of the list, first. It looks like the extra code outside the loop is required in order to evaluate the first element; I was not able to put this check inside the loop without extra if statements. Perhaps you know how to remove the extra code outside the loop, if that is possible?
Thanks,
Have a nice day.

The first element is a special case so it is unsurprising that you will have code for that one that is slightly different from every other case. So your alternatives are (1) the way it is done in your example, (2) putting the special case code in the main loop with an if comparison executed every iteration, or (3) a less understandable version using pointers to pointers.
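Option (3) might look like the following sketch. It assumes an ordinary NULL-terminated list rather than the circular one in the question, a hypothetical int payload named value, and a pred_ptr typedef of int (*)(Link), since the question doesn't show one. The helpers push and is_even are hypothetical and exist only for the usage example.

```c
#include <stdlib.h>

/* Same shape as the question's node; 'value' is a hypothetical payload. */
struct node {
    struct node *next;
    int value;
};
typedef struct node *Link;
typedef int (*pred_ptr)(Link);   /* assumed predicate signature */

/* Remove every node for which cond() is non-zero from a NULL-terminated
 * list. pp always points at the link (the head variable or some node's
 * next field) that refers to the node under test, so removing the first
 * node needs no special case. */
Link apply_pp(Link first, pred_ptr cond)
{
    Link *pp = &first;
    while (*pp != NULL) {
        Link t = *pp;
        if (cond(t)) {
            *pp = t->next;   /* bypass t, then release it */
            free(t);
        } else {
            pp = &t->next;   /* keep t, move on to its outgoing link */
        }
    }
    return first;
}

/* Hypothetical helpers for the usage example. */
Link push(Link head, int value)
{
    Link n = malloc(sizeof *n);
    if (n == NULL) return head;
    n->value = value;
    n->next = head;
    return n;
}

int is_even(Link n) { return n->value % 2 == 0; }
```

Because pp is updated only when a node is kept, a removed node's successor is tested next, whether or not the removed node was the first one.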
The first question you should ask yourself is: Why are you refactoring? For clarity or because there is a demonstrated need for a faster implementation? Or ...?

Your code has the logic almost right, but it has problems in a few places:
Reading the code, I guess that the linked list is really a circular linked list, because otherwise some node's ->next would be NULL, and the code doesn't check for NULL anywhere: a line like t=t->next; just before the end of the do-while loop would cause undefined behavior once NULL is reached. That assumes cond works with NULL and doesn't enter the if statement; if it did, the UB would occur there instead, at p->next=t->next;. If the list is a plain linked list and not a circular one, the code needs a lot of refactoring.
In either kind of list, plain or circular, the first element can be analyzed separately. For a plain linked list this is the better way: the loop that handles the remaining nodes stays readable and free of special cases, because the first node is handled outside the loop and the check for it doesn't burden the handling of every other node. For a circular linked list, the first element can be handled elegantly inside the loop, but it can also be handled outside it.
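For the circular case, one way to test first with the same code as every other node is to start the scan at the node before it (the tail). This sketch pays for that with an extra pass to find the tail and count the nodes; the value field, make_circular, and is_even are hypothetical, for illustration only.

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;                     /* hypothetical payload */
};
typedef struct node *Link;
typedef int (*pred_ptr)(Link);     /* assumed predicate signature */

/* Remove matching nodes from a circular list; return the new first node,
 * or NULL if every node was removed. */
Link apply_circular(Link first, pred_ptr cond)
{
    if (first == NULL) return NULL;

    /* Extra pass: find the tail (the node before first) and count nodes. */
    Link p = first;
    size_t n = 1;
    while (p->next != first) {
        p = p->next;
        n++;
    }

    /* p is the predecessor of the node under test; first is tested first,
     * by the same code as every other node. */
    size_t kept = 0;
    for (size_t i = 0; i < n; i++) {
        Link t = p->next;
        if (cond(t)) {
            p->next = t->next;     /* unlink and free */
            free(t);
        } else {
            p = t;                 /* keep it; it becomes the predecessor */
            kept++;
        }
    }
    /* p is now the last surviving node, so p->next is the first survivor. */
    return kept ? p->next : NULL;
}

/* Hypothetical helper: build a circular list from an array. */
Link make_circular(const int *vals, size_t n)
{
    Link first = NULL, last = NULL;
    for (size_t i = 0; i < n; i++) {
        Link node = malloc(sizeof *node);
        if (node == NULL) abort();
        node->value = vals[i];
        if (first == NULL) first = node; else last->next = node;
        last = node;
    }
    if (last != NULL) last->next = first;   /* close the ring */
    return first;
}

int is_even(Link n) { return n->value % 2 == 0; }
```

The kept counter matters: when every node is removed, p ends up pointing at freed memory, so the function must not read p->next in that case.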
By the way, since you are already refactoring: if these are the variable names used in the actual code, they could be improved (p -> previous, t -> test? I think, etc.).

Related

Inserting an item to a linked list - is iteration necessary (C)?

I'm currently studying linked lists for interview prep and would appreciate it if anybody could shed some light on this. The following function in C supposedly inserts a new element into the list after a certain Element elem (passed as an argument to the function):
bool insertAfter(Element * elem, int data){
    Element * newElem, * curPos = head;
    newElem->data = data;
    while(curPos){
        if(curPos == elem){
            newElem->next = curPos->next;
            curPos->next = newElem;
            return true;
        }
        curPos = curPos->next;
    }
    return false;
}
Although the above is specified in the textbook I am studying from, I tried coming up with a solution that does not use any iteration whatsoever:
bool insertAfter(Element * elem, int data){
    Element * newElem;
    newElem->next = elem->next;
    newElem->data = data;
    elem->next = newElem;
    return true;
}
However, since it seems too simplistic, I suspect it may not work, but I'm not sure why. I'd appreciate some insight into the technical reasons why this may or may not work. Thanks.
Both versions suffer from the error of using newElem as though it were a valid pointer. It is not. It is not initialized to point to a valid object.
You can correct that by allocating memory for an object before using it:
Element * newElem = malloc(sizeof(*newElem));
The difference between the two versions is that if elem is not accessible from head for some reason or it is NULL, the first version will do nothing to the existing list. The second version does not deal with either of those scenarios. It assumes that elem is in the list and that it is not NULL.
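Putting these points together, the loop-free version might look like the sketch below. It assumes Element has just a next pointer and an int data field (the question doesn't show the struct), and it adds the NULL and allocation checks that the second version was missing; it still trusts the caller that elem belongs to the intended list.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Assumed layout; the question does not show Element's definition. */
typedef struct Element {
    struct Element *next;
    int data;
} Element;

/* O(1) insertion after a node the caller already holds.
 * Returns false if elem is NULL or allocation fails. */
bool insertAfter(Element *elem, int data)
{
    if (elem == NULL)
        return false;
    Element *newElem = malloc(sizeof *newElem);   /* a real object now */
    if (newElem == NULL)
        return false;
    newElem->data = data;
    newElem->next = elem->next;   /* link to the old successor first */
    elem->next = newElem;         /* then splice the new node in */
    return true;
}
```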
Your solution is pretty much correct. The iteration in the example is almost completely useless.
What the example function is doing is: given an element to insert the new data after, it looks through the list starting with head to find that same element, then inserts the data - doing nothing with head or elem after finding it. Since all the loop does is "find" an element to which you already had a pointer, it essentially did nothing at all and is useless.
The only possible use of this is to constrain the insertion function to only work on this one list beginning with head, globally, throughout your program. This is such a strange design decision that one would likely assume it's a mistake unless given a reason to believe otherwise (dynamic data structures constrained to a single instance are an unusual pattern; more importantly, the whole point of linked lists is O(1) insertion, which the example function breaks by adding this useless loop). head is not needed for any other reason than to enforce this constraint, and if this is desired, it would make more sense to pass it in as a parameter as well so that the function is able to be used on more than one list per-program. (Or, not to perform the check at all: another use of linked lists is that you can pass around and insert after nodes without worrying about the head element.)
As other people have pointed out, you fail to actually allocate newElem, but so does the textbook. Overall, it's a rubbish example; not only did the author make a mistake with allocation, but they don't appear to understand the basic advantages of using linked lists. You should definitely treat this textbook with suspicion.
Your logic will work, because you just need a pointer to a node where you want to insert a node. If you already have such a pointer, no need to iterate and search for the node.
However, the search (iteration) would be relevant if you do not already have a pointer to the node after which you want to insert the new node. For example, suppose the nodes have unique keys and you want to insert after the node containing a specific key, but you do not have a pointer to that node: then you need to find the correct node pointer first and then do the insertion (the function should then take the key as an argument).
However, in your code (both cases), you have not allocated memory for the new node. You need to malloc the new node and then go on with the insertion.
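A sketch of the key-based variant described above, assuming a hypothetical int key field on each Element and a hypothetical function name insertAfterKey. Here the O(n) search is justified because the caller has only a key, not a node pointer.

```c
#include <stdbool.h>
#include <stdlib.h>

typedef struct Element {
    struct Element *next;
    int key;      /* hypothetical unique key */
    int data;
} Element;

/* Find the node whose key matches, then insert a new node after it.
 * Returns false if the key is absent or allocation fails. */
bool insertAfterKey(Element *head, int key, int newKey, int data)
{
    Element *cur = head;
    while (cur != NULL && cur->key != key)
        cur = cur->next;
    if (cur == NULL)
        return false;                 /* key not present */

    Element *newElem = malloc(sizeof *newElem);
    if (newElem == NULL)
        return false;
    newElem->key = newKey;
    newElem->data = data;
    newElem->next = cur->next;
    cur->next = newElem;
    return true;
}
```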
This will not work because you do not know your newElem. In a linked list, you know the head, and each element tells you where to find the next one:
head -> e1 -> e2 -> ...
So you need to iterate until you find the element you care about. You can also iterate using recursion.

Inconsistent double pointer (linked list)

I have a game I am working on that uses a linked list for the entities in the game. I have found what I think is some sort of bug. Note that I'm coding in C, but after this trouble with C pointers I'm thinking about trying C++ techniques.
In my debug testing two projectiles were colliding which blows both of them up. Basically the situation is this:
Starting in Entity's move function:
1) Projectile entity moves
2) Loop through all entities checking collision at this new location
3) If collision, in this case between projectiles, remove both
I pass a double pointer to the entity to the function that handles collision. That entity may be removed, but I still need it to advance to the next entity in the list (in the while loop). If that didn't make sense, here it is in code:
ENTITY *node;
while (node)
{
    ...
    entity_do_collision (&node); // <-- node may be removed in this function
    //Debug
    if (node == global_node)
    {
    }
    else
        node = global_node;
    node = node->next; // <-- pass a double pointer above so this works here
}
I've run through the code many times and don't see any illegal operations. What gets me is that sometimes the double pointer works and sometimes it doesn't. For debugging, I set a global entity pointer in the entity remove function (that always works) and compare against it back in my entity move function, to check whether the removed node matches what the remove function set.
This description is a little abstract, so let me know if I need to explain more.
There are a zillion solutions which may or may not work for your exact problem.
Here are some ideas to start with:
1) Do not delete objects from the container on collision, but mark them "dead". Clean up the "dead" bodies in a separate pass after collision detection has finished.
2) Mark them dead on collision, but don't delete them at all. Just reuse the marked nodes for future entities.
3) Improvement of (2): sort your container so that "dead" entities go to the tail, and set the container's size as if it contained only the "living" ones.
4) Improvement of (1), (2), (3): implement some kind of "garbage collection", so "dead" entities are cleaned up, say, once a second, once a frame, or when a memory threshold is reached.
etc.
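The first idea (mark during collision, sweep afterwards) might look roughly like this. ENTITY here is a hypothetical stand-in with only the fields the sketch needs (next, id, and a dead flag); mark_dead, sweep_dead, and spawn are illustrative names, not part of the question's code.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical entity type with only the fields this sketch needs. */
typedef struct ENTITY {
    struct ENTITY *next;
    int id;
    bool dead;
} ENTITY;

/* During collision handling, only flag entities; never free them,
 * so every pointer held by the collision loop stays valid. */
void mark_dead(ENTITY *e) { e->dead = true; }

/* After collision detection has finished, unlink and free every
 * flagged entity in one pointer-to-pointer pass. */
void sweep_dead(ENTITY **head)
{
    ENTITY **pp = head;
    while (*pp != NULL) {
        ENTITY *e = *pp;
        if (e->dead) {
            *pp = e->next;
            free(e);
        } else {
            pp = &e->next;
        }
    }
}

/* Hypothetical helper for the usage example: prepend a live entity. */
ENTITY *spawn(ENTITY *head, int id)
{
    ENTITY *e = malloc(sizeof *e);
    if (e == NULL) return head;
    e->id = id;
    e->dead = false;
    e->next = head;
    return e;
}
```

Since nothing is freed while the collision loop runs, both projectiles in a collision can be marked without invalidating the node the loop is standing on.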
Sidenote: You should never use linked lists in the 21st century (an era of cache hierarchies, prefetching, out-of-order execution and multithreading) unless you really, really have no other choice and you understand what you are doing. Use arrays by default, and switch to something else only if you find it reasonable.
More info:
What is “cache-friendly” code?
Stop Using Linked-Lists
Bjarne Stroustrup: Why you should avoid Linked Lists (video)
Original code:
ENTITY *node;
while (node)
{
    ...
    entity_do_collision (&node); // <-- node may be removed in this function
    node = node->next;
    /* the function can have changed node's value
    ** but on the next iteration ( on the **value** of node->next)
    ** the original node->next will not be affected!
    */
}
Sample code using a pointer to pointer:
ENTITY **pp;
for (pp = &global_node; *pp; pp = &(*pp)->next)
{
    ...
    entity_do_collision (pp); // <-- *pp (= node) may be removed in this function
    ...
}
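One caveat with the pointer-to-pointer sample: if the callee removes the entity by setting *pp = (*pp)->next, the loop's unconditional pp = &(*pp)->next then skips the entity that moved into that link, and dereferences NULL if the removed entity was the last one. Below is a sketch that advances only when the current entity survives, assuming a hypothetical remove_if_hit in place of entity_do_collision that reports whether it removed *pp.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical minimal entity. */
typedef struct ENTITY {
    struct ENTITY *next;
    int id;
} ENTITY;

/* Hypothetical stand-in for entity_do_collision: removes *pp if the
 * entity should be destroyed and reports whether it did so. */
bool remove_if_hit(ENTITY **pp, int hit_id)
{
    ENTITY *e = *pp;
    if (e->id != hit_id)
        return false;
    *pp = e->next;   /* the following entity now occupies this link */
    free(e);
    return true;
}

void process_all(ENTITY **head, int hit_id)
{
    ENTITY **pp = head;
    while (*pp != NULL) {
        if (!remove_if_hit(pp, hit_id))
            pp = &(*pp)->next;   /* advance only if *pp survived */
        /* otherwise *pp already refers to the next entity: test it */
    }
}
```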

Linked List insert in C

I have recently started working with linked lists. To push an element into a linked list in the insert(...) function, I noticed that we always check if(head == NULL), even though that case occurs only once.
I want to know if there is any way to avoid this check, which is unnecessary almost every time. Please suggest something that would be relevant to most linked list operations. One solution I figured out is writing a new function "add_first_element(....)" that explicitly adds the first element, so that the other elements are added in a generic way.
I am looking for a better solution.
A common way is to use a sentinel node. That is, a node that contains no useful data, but merely serves as the placeholder for the one before the first node. This way you don't need to check for null.
For a doubly linked list, you will need two sentinel nodes to avoid null checking.
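A minimal sketch of the sentinel idea for a singly linked list, assuming append-at-tail semantics (the question doesn't say where insert adds elements) and hypothetical names list_init and list_append. Because tail starts out pointing at the sentinel, the first append runs exactly the same code as every later one.

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* The dummy node never holds data; real elements start at sentinel.next. */
struct list {
    struct node sentinel;
    struct node *tail;     /* last node, or &sentinel when empty */
};

void list_init(struct list *l)
{
    l->sentinel.next = NULL;
    l->tail = &l->sentinel;
}

/* Append without any head == NULL check: there is always a tail. */
int list_append(struct list *l, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = value;
    n->next = NULL;
    l->tail->next = n;     /* for the first element, this is the sentinel */
    l->tail = n;
    return 1;
}
```

One consequence of this design: a struct list must not be copied by value after list_init, because tail may point at the original's embedded sentinel.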

Too many frees when freeing graph

I have created a graph data structure using linked lists, with this code:
typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;
void freeGraph(vertexPointer); /* announce function */
struct edgeNode{
    vertexPointer connectsTo;
    edgePointer next;
};
struct vertexNode{
    int vertex;
    edgePointer next;
};
Then I create a graph in which I have 4 nodes, let's say A, B, C and D, where A connects to D via B and A connects to D via C. With linked lists, I imagine it looks like this (diagram not included):
Finally, I try to free the graph with freeGraph(graph).
void freeEdge(edgePointer e){
    if (e != NULL) {
        freeEdge(e->next);
        freeGraph(e->connectsTo);
        free(e);
        e = NULL;
    }
}
void freeGraph(vertexPointer v){
    if (v != NULL) {
        freeEdge(v->next);
        free(v);
        v = NULL;
    }
}
That's where valgrind starts complaining with "Invalid read of size 4", "Address 0x41fb0d4 is 4 bytes inside a block of size 8 free'd" and "Invalid free()". Also it says it did 8 mallocs and 9 frees.
I think the problem is that the memory for node D has already been freed and then I'm trying to free it again. But I don't see any way to do this right without altering the data structure.
What is the best way to prevent these errors and free the graph correctly, if possible without having to alter the datastructure? Also if there are any other problems with this code, please tell. Thanks!
greets,
semicolon
The lack of knowing all the references makes this a bit difficult. A bit of a hack, but faced with the same issue I would likely use a pointer set (a list of unique values, in this case pointers).
Walk the entire graph, pushing node pointers into the set only if they are not already present (this is the definition of a 'set').
Walk the set, freeing each pointer (since they're unique, there is no risk of a double-free).
Set graph to NULL.
I'm sure there is an elegant recursive solution to this, but faced with the task as stated, this seems doable and not overly complicated.
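The walk-then-free approach might be sketched like this, using the question's types. The pointer set is a deliberately simple growable array with a linear search (quadratic overall, fine for small graphs); PtrSet, set_add, collect, and freeGraphOnce are hypothetical names. Edges are collected along with vertices, so everything reachable from the starting vertex is freed exactly once; vertices that are not reachable from it are not freed.

```c
#include <stdlib.h>

/* The question's types. */
typedef struct vertexNode *vertexPointer;
typedef struct edgeNode *edgePointer;
struct edgeNode { vertexPointer connectsTo; edgePointer next; };
struct vertexNode { int vertex; edgePointer next; };

/* Hypothetical "pointer set": growable array, linear membership test. */
typedef struct { void **items; size_t len, cap; } PtrSet;

/* Returns 1 if p was newly added, 0 if it was already present. */
static int set_add(PtrSet *s, void *p)
{
    for (size_t i = 0; i < s->len; i++)
        if (s->items[i] == p) return 0;
    if (s->len == s->cap) {
        size_t cap = s->cap ? 2 * s->cap : 16;
        void **items = realloc(s->items, cap * sizeof *items);
        if (items == NULL) abort();
        s->items = items;
        s->cap = cap;
    }
    s->items[s->len++] = p;
    return 1;
}

/* Phase 1: collect every vertex and edge reachable from v, once each. */
static void collect(vertexPointer v, PtrSet *s)
{
    if (v == NULL || !set_add(s, v)) return;   /* NULL or already visited */
    for (edgePointer e = v->next; e != NULL; e = e->next) {
        set_add(s, e);
        collect(e->connectsTo, s);
    }
}

/* Phase 2: free each collected pointer exactly once. */
void freeGraphOnce(vertexPointer v)
{
    PtrSet s = {NULL, 0, 0};
    collect(v, &s);
    for (size_t i = 0; i < s.len; i++)
        free(s.items[i]);
    free(s.items);
}
```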
Instead of allocating the nodes and edges on a global heap, maybe you can allocate them in a memory pool. To free the graph, free the whole pool.
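A bare-bones version of such a pool, as a sketch: one malloc'd block handed out in aligned slices (a "bump" allocator) with no per-object free. The names Pool, pool_init, pool_alloc, and pool_destroy are hypothetical, and the code assumes C11 for _Alignof and max_align_t. To use it for the graph, each vertexNode and edgeNode would come from pool_alloc instead of malloc, and freeing the whole graph becomes a single pool_destroy call, so a vertex reachable by two paths can no longer be freed twice.

```c
#include <stddef.h>
#include <stdlib.h>

/* Fixed-capacity bump allocator; individual objects are never freed. */
typedef struct {
    unsigned char *base;
    size_t used, cap;
} Pool;

int pool_init(Pool *p, size_t cap)
{
    p->base = malloc(cap);
    p->used = 0;
    p->cap = cap;
    return p->base != NULL;
}

void *pool_alloc(Pool *p, size_t size)
{
    /* Round the offset up so every object is suitably aligned. */
    size_t align = _Alignof(max_align_t);
    size_t off = (p->used + align - 1) / align * align;
    if (off > p->cap || size > p->cap - off)
        return NULL;               /* pool exhausted */
    p->used = off + size;
    return p->base + off;
}

/* Releases every object allocated from the pool at once. */
void pool_destroy(Pool *p)
{
    free(p->base);
    p->base = NULL;
    p->used = p->cap = 0;
}
```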
I would approach this problem by designing a way to first cleanly remove each node from the graph before freeing it. To do this cleanly you will have to figure out what other nodes are referencing the node you are about to delete and remove those edges. Once the edges are removed, if you happen to come around to another node that had previously referenced the deleted node, the edge will already be gone and you won't be able to try to delete it again.
The easiest way would be to modify your data structure to hold a reference to "incoming" edges. That way you could do something like:
v->incoming[i]->connectsTo = NULL; // do this for each edge in incoming, so no edge still points at v
freeEdge(v->next);
free(v);
v = NULL;
If you didn't want to update the data structure you are left with a hard problem of searching your graph for nodes that have edges to the node you want to delete.
It's because you've got two recursions going on here, and they're stepping on each other. freeGraph is called once to free D (say, from B) and then when the initial call to freeGraph comes back from freeEdge, you try freeing v -- which was already taken care of deeper down. That's a poor explanation without an illustration, but there ya go.
You can get rid of one of the recursions so they're not "crossing over", or you can check before each free to see whether that node has already been taken care of by the other branch of the recursion.
Yes, the problem is that D can be reached over two paths and freed twice.
You can do it in 2 phases:
Phase 1: Insert the nodes you reached into a "set" datastructure.
Phase 2: free the nodes in the "set" datastructure.
A possible implementation of that set datastructure, which requires extending your datastructure:
Mark all nodes in the datastructure with a boolean flag, so you don't insert them twice.
Use another "next"-pointer for a second linked list of all the nodes. A simple one.
Another implementation, without extending your data structure: something like C++'s std::set<>.
Another problem: Are you sure that all nodes can be reached, when you start from A?
To avoid this problem, insert all nodes into the "set" datastructure at creation time (you won't need the marking, then).

Losing parts in Link List in C

I'm trying to make a linked list and I'm having trouble with the concept of linking the middle part. I'm just writing a little pseudocode right now; I haven't actually coded anything.
(struct pointers) *current, *ahead, *behind, *begin;
(behind)-->(current)-->(ahead) //This is what I want to do
behind->next = current;
current->next = ahead;
Is this the proper way to break and reconnect the list without losing anything?
What you have looks correct but rather incomplete. One of the unwritten rules of programming is that you cannot write a linked list implementation correctly the first time. There are four cases you need to deal with:
Insert into an empty list
Insert into a non-empty list
Removing the first element from the list
Removing any other element from the list
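The four cases can be covered by two small functions, sketched here under the assumption that the caller tracks the predecessor (behind) of the node being changed and passes the head by pointer; insert_after and remove_node are hypothetical names. Note the order in the insertion: the new node is linked to its successor before its predecessor is redirected to it, so the rest of the list never becomes unreachable.

```c
#include <stdlib.h>

struct node {
    struct node *next;
    int value;
};

/* Insert value after 'behind'; behind == NULL means "insert at the front",
 * which also covers the empty list. Returns 0 if allocation fails. */
int insert_after(struct node **head, struct node *behind, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return 0;
    n->value = value;
    if (behind == NULL) {          /* empty list, or new first element */
        n->next = *head;
        *head = n;
    } else {                       /* after an existing node */
        n->next = behind->next;    /* keep hold of 'ahead' first ... */
        behind->next = n;          /* ... then splice the new node in */
    }
    return 1;
}

/* Remove 'current', whose predecessor is 'behind' (NULL if current is first). */
void remove_node(struct node **head, struct node *behind, struct node *current)
{
    if (behind == NULL)            /* removing the first element */
        *head = current->next;
    else                           /* removing any other element */
        behind->next = current->next;
    free(current);
}
```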
There are also doubly-linked lists, where each element has a pointer to both the previous element and the next element. That makes it easier to handle things like removal of a random element without traversing the list, but can be trickier to get right.

Resources