I am writing a linked list datatype. I currently have the standard head pointer which references the first item, and each element has a next pointer that points to the following one, with the final element having next = NULL.
I am just curious what the pros/cons or best practices are for keeping track of the last node. I could have a 'tail' pointer which always points to the last node making it easy to append, or I could loop over the list starting from the head pointer to find the last node when I want to append. Which method is better?
It is usually a good idea to store the tail. If we think about the complexity of adding an item at the end (if this is an operation you commonly do) it will be O(n) time to search for the tail, or O(1) if you store it.
Another option you can consider is to make your list doubly linked. This way, if you store the tail, you can also delete the last node in O(1) time. But this requires storing an extra pointer per element of your list (not expensive, but it adds up, and it should be a consideration for memory-constrained systems).
In the end, it is all about the operations you need to do. If you never add or delete or operate from the end of your list, there is no reason to do this. I recommend analyzing the complexity of your most common operations and base your decision on that.
Depends on how often you need to find the last node, but in general it is best to have a tail pointer.
There's very little cost to just keeping and updating a tail pointer, but you have to remember to update it! If you can keep it updated, then it will make append operations much faster (O(1) instead of O(n)). So, if you usually add elements to the end of the list, then you should absolutely create and maintain a tail pointer.
If you have a doubly linked list, where every element contains a pointer both to the next and prev elements, then a tail pointer is almost universally used.
On the other hand, if this is a sorted list, then you won't be appending to the end, so the tail pointer would never be used. Still, keeping the pointer around is a good idea, just in case you decide you need it in the future.
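To make the O(1) append concrete, here is a minimal sketch in C; the struct and function names are made up for illustration, not taken from the question:

#include <stdlib.h>

/* Hypothetical struct names, just for illustration. */
typedef struct node {
    int value;
    struct node *next;
} node;

typedef struct list {
    node *head;
    node *tail;   /* always points to the last node, or NULL when the list is empty */
} list;

/* Append in O(1): no walk from head, just hang the new node off the stored tail. */
int append(list *l, int value) {
    node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;            /* allocation failed */
    n->value = value;
    n->next = NULL;
    if (l->tail != NULL)
        l->tail->next = n;    /* link after the old last node */
    else
        l->head = n;          /* list was empty */
    l->tail = n;              /* remember the new last node */
    return 0;
}

The only extra work is the single l->tail = n assignment on every append, plus keeping the tail correct in any other operation that can change the last node (for example, deleting it).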
I’m building a hashtable in C using open hashing (separate chaining) to store words. I’m unconcerned by the order in which I will store words with the same hash key.
Currently, I have a pointer to a struct (struct dict * d) with my hashtable (struct item * arr). More specifically, this table is an array of items (struct item) containing a word (char * word) and a pointer (struct item * next).
I’m unclear about two aspects:
1. When chaining words together after a collision (inserting a new item), should I insert the element at the beginning or at the end of the linked list?
I’ve seen it done both ways, but the latter seems more popular. However, the former seems quicker to me as I only need to set the pointer of my first item to my new item, and its pointer to null. I don’t have to do any pointer chasing (i.e. travel through my linked list until I find the null pointer).
2. Should my hashtable be an array of pointers to items (struct item), or simply an array of items (struct item), as I have done?
In other words, should the very first item for a specific hash key be stored directly in that cell (an otherwise empty cell), or should the cell hold a pointer that we then point to this new item?
For 1. it really shouldn’t matter if you prepend or append to the list. If you keep the load small, the chains are short, and you shouldn’t see any noticeable difference in access performance. If you keep the table small and the load gets high, you might want to look into different strategies. The access pattern might matter then. For example, if you are more likely to look up recently inserted values, you want them early in the list, so then it is better to prepend. But with a hash table, it is better to keep the load small if you can, and then it shouldn’t matter.
For 2. either will also work. If your table is an array of pointers, with NULL for empty chains, a simple recursive linked list implementation will work nicely. Make your list functions take a list as an argument and make insert and delete return a new list; either the argument or the return value can be NULL. Then do something like tbl[bin] = insert(tbl[bin], val) or tbl[bin] = delete(tbl[bin], val). If the chains are short, you don't have to worry too much about recursion overhead. In any case, you don't need recursion for looking up a value, or for inserting if it is just prepending, so it is only delete where you don't get tail recursion anyway.

The benefit you get from having an array of items is either that you get a dummy element at the front of each list, which simplifies non-recursive list implementations by avoiding special cases for empty lists, or that you avoid following a pointer to reach the first element in the chain once you have looked up the bin. For the latter, though, you need a way to distinguish an empty chain from a chain with one element. It is hardly worth it, and if you want to avoid jumping along linked lists, open addressing or some other collision strategy might be better.
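To make the prepend option from point 1 concrete, here is a minimal sketch assuming the table is an array of pointers; the struct names echo the question, but the exact layout and the hash function are placeholders:

#include <stdlib.h>
#include <string.h>

/* Hypothetical layout: each cell of the table is a pointer to the head
 * of a chain, NULL when the chain is empty. */
struct item {
    char *word;
    struct item *next;
};

struct dict {
    struct item **table;  /* array of chain heads */
    size_t size;          /* number of bins */
};

/* Placeholder hash function (djb2-style); any reasonable string hash will do. */
static size_t hash(const char *s) {
    size_t h = 5381;
    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Prepend into the chain: O(1) per insert, no walking the list. */
int dict_insert(struct dict *d, const char *word) {
    size_t bin = hash(word) % d->size;
    struct item *it = malloc(sizeof *it);
    if (it == NULL)
        return -1;
    it->word = malloc(strlen(word) + 1);
    if (it->word == NULL) {
        free(it);
        return -1;
    }
    strcpy(it->word, word);
    it->next = d->table[bin];  /* new item points at the old head of the chain */
    d->table[bin] = it;        /* the bin now points at the new item */
    return 0;
}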
The Scala docs say that the performance of tail for an Array sequence is linear, while head performance is constant. Since the whole block that contains the array elements is brought into cache, I don't expect to see any difference between head and tail for an array. I would appreciate it if someone could explain why tail performance for arrays in Scala is linear.
The tail function creates a new array containing all of the elements except the first. To do this we need to create a copy of the array (minus the first element), which is a linear time operation. As the array gets larger there is more to copy.
Use List instead if you require efficient head and tail operations.
You may be confusing tail with last
head gets the first element: O(1) for List and Array
last gets the last element: O(n) for List, O(1) for Array
tail gets everything except the first: O(1) for List, O(n) for Array
init gets everything except the last: O(n) for List and Array
There is a pretty big difference between lists and arrays. head and tail are the canonical interface to lists, which in Scala are singly linked lists. head refers to the first thing in the list and tail refers to all of the elements after the first. Since a linked list implements the tail as a pointer, this is a constant time operation.
However, things are a little different for arrays. Arrays are used for fast random access and refer to a contiguous block of memory. Scala still exposes the list-like interface of head and tail, but it has to do things a little differently to simulate it. In order to simulate tail, it has to make a new array containing all elements except the first, copying all of those values into the new array, which is a linear time operation.
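This is not how Scala implements it, but a rough C analogue shows where the cost comes from: the tail of an immutable array has to be a fresh copy, while the tail of a linked list is just a pointer that already exists.

#include <stdlib.h>
#include <string.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* "tail" of a singly linked list: follow one pointer, O(1). */
node *list_tail(node *head) {
    return head->next;
}

/* "tail" of an array: allocate a fresh array and copy the remaining
 * n - 1 elements, which is O(n). Assumes n >= 1. */
int *array_tail(const int *arr, size_t n, size_t *out_len) {
    *out_len = n - 1;
    int *copy = malloc((n - 1) * sizeof *copy);
    if (copy != NULL)
        memcpy(copy, arr + 1, (n - 1) * sizeof *copy);
    return copy;
}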
I'm trying to find the point of a singly link list where a loop begins.
What I thought of was taking two pointers, *slow and *fast, one moving at twice the speed of the other.
If the list has a loop, then at some point slow == fast. For example, in this list the node 8 points back to 5:

1 -> 2 -> 3 -> 4 -> 5 -> 6 -> 7 -> 8
                    ^              |
                    +--------------+
Can there be another elegant solution so that the list is traversed only once?
Your idea of using two walkers, one at twice the speed of the other, would work. However, the more fundamental question this raises is whether you are picking an appropriate data structure. You should ask yourself if you really need to find the midpoint, and if so, what other structures might be better suited to achieve this in O(1) (constant) time. An array would certainly give you much better performance for finding the midpoint of a collection, but it makes other operations slower. Without knowing the rest of the context I can't make any other suggestion, but I would suggest reviewing your requirements.
I am assuming this was some kind of interview question.
If your list has a loop, then to do it in a single traversal, you will need to mark the nodes as visited as your fast walker goes through the list. When the fast walker encounters NULL or an already visited node, the iteration can end, and your slow walker is at the midpoint.
There are many ways to mark the node as visited, but an external map or set could be used. If you mark the node directly in the node itself, this would necessitate another traversal to clean up the mark.
Edit: So this is not about finding the midpoint, but about loop detection without revisiting already visited nodes. Marking works for that as well. Just traverse the list and mark the nodes. If you hit NULL, no loop. If you hit a visited node, there is a loop. If the mark includes a counter as well, you even know where the loop starts.
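A minimal sketch of that marking idea, assuming we are allowed to add a counter field to the node (all names here are made up for illustration):

#include <stddef.h>

/* Hypothetical node layout: 'visit' is the mark discussed above,
 * 0 meaning "not seen yet", otherwise the 1-based position of the first visit. */
typedef struct node {
    int value;
    struct node *next;
    size_t visit;
} node;

/* Single traversal. Returns NULL if the list ends in NULL (no loop),
 * otherwise the node where the loop starts. Clearing the marks afterwards
 * would need a second traversal, as noted above. */
node *find_loop_start(node *head) {
    size_t position = 0;
    for (node *cur = head; cur != NULL; cur = cur->next) {
        if (cur->visit != 0)
            return cur;           /* already marked: this is where the loop begins */
        cur->visit = ++position;  /* mark with the position at which we first saw it */
    }
    return NULL;                  /* hit NULL: no loop */
}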
I'm assuming that this singly linked list ends with NULL. In this case, the slow and fast pointers will work: because the fast pointer moves at double the speed of the slow one, when the fast pointer reaches the end of the list the slow pointer will be at the middle of it.
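For completeness, here is what that slow/fast midpoint walk might look like in C (node layout assumed; for an even number of nodes this returns the second of the two middle nodes):

#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Classic two-pointer middle finding for a NULL-terminated list:
 * fast advances two nodes per step, slow advances one, so when fast
 * runs off the end, slow is at the middle. */
node *find_middle(node *head) {
    node *slow = head;
    node *fast = head;
    while (fast != NULL && fast->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    return slow;
}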
How can I find the last node of a circular linked list whose size I don't know, given that the last node points to some node other than the first node of the linked list?
One algorithm that can be used for this is the Floyd cycle algorithm.
Also, see this question.
By definition, if a node does not point to the first node of a circular linked list, it is not the last node.
Can you elaborate here?
A strange list... why would you ever need something like this? But anyway...
You can simply iterate over all nodes, and stop as soon as the next node would be one you have already visited. The current node will then be your answer.
You need some way to keep track of which nodes have been visited. Add a boolean flag to each node, or use some kind of set data type with fast insertion and lookup (e.g. a hash set).
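A minimal sketch of that, using the per-node boolean flag variant (the node layout is an assumption; a hash set of visited pointers would work the same way without touching the nodes):

#include <stdbool.h>
#include <stddef.h>

/* Hypothetical node layout: 'visited' is the boolean flag mentioned above,
 * assumed to start out false for every node. */
typedef struct node {
    int value;
    struct node *next;
    bool visited;
} node;

/* Walk the list, stopping as soon as the next node has already been seen
 * (or is NULL). The node we are standing on is then the last node. */
node *find_last(node *head) {
    if (head == NULL)
        return NULL;
    node *cur = head;
    cur->visited = true;
    while (cur->next != NULL && !cur->next->visited) {
        cur = cur->next;
        cur->visited = true;
    }
    return cur;
}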
Maybe add a field to the nodes of the list which tells you whether you are at the end? I don't think that would be a problem.
Otherwise, you can remember the nodes you have already visited. When the next node has already been visited, you are at the end.
The Floyd cycle algorithm won't give the last element of the list. It will only tell if there is a cycle or not.
The last node can be defined as follows: while traversing the list in a sequential scan starting from the first node, the last node and all elements before it have not been seen before (comparing pointer values). The node after the last one is the first element that has already been seen in this sequential scan.
An easy solution is to flag visited elements so an element already seen is easily detected. The flag may be intrusive, i.e. by changing a bit in the element, or external by using a hash table to store pointer values.
Since we need to be able to test if an element has already been visited, I don't see another solution.
I can elaborate on how to use Floyd's algorithm to solve this problem but I don't understand the explanation for one step
1. Have two pointers traverse the linked list, pointer 1 moving at a rate of 1 node per iteration, pointer 2 at a rate of 2 nodes per iteration.
2. When the pointers meet, we are inside the cycle, some distance before pointer 1 has completed a full lap of it (we know pointer 1 hasn't finished a lap, because pointer 2 moves at twice its speed and would go around the cycle twice before pointer 1 goes around once).
3. Because they met before pointer 1 fully traversed the cycle, the meeting point is d + k nodes from the head, where d is the distance from the head to the start of the cycle and k is the distance from the cycle start to the meeting point (pos = d + k).
4. If we set pointer 1 back to position 0 (the head) and advance both pointers again, this time both at a rate of 1 node per iteration, they will meet at the start of the cycle.
5. Since we know the start of the cycle, finding the end is trivial.
I don't fully understand why step 4 is true but I had a friend explain the solution to me.
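The usual argument for step 4: say the head is d nodes before the start of the cycle, the cycle has length L, and the two pointers first meet k nodes into the cycle. The fast pointer has then travelled twice as far as the slow one, so 2(d + k) = d + k + mL for some whole number of extra laps m, which gives d + k = mL. Walking d more steps from the meeting point therefore lands on cycle position (k + d) mod L = 0, i.e. the cycle start, which is exactly where a pointer restarted from the head arrives after d steps. A sketch of the whole procedure in C (node layout assumed):

#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Floyd's cycle detection. Returns the node where the cycle starts,
 * or NULL if the list terminates (no cycle). */
node *cycle_start(node *head) {
    node *slow = head;
    node *fast = head;

    /* Phase 1: slow moves 1 node per step, fast moves 2, until they meet. */
    do {
        if (fast == NULL || fast->next == NULL)
            return NULL;          /* ran off the end: no cycle */
        slow = slow->next;
        fast = fast->next->next;
    } while (slow != fast);

    /* Phase 2 (step 4 above): restart slow from the head; advancing both
     * one node per step, they meet again at the start of the cycle. */
    slow = head;
    while (slow != fast) {
        slow = slow->next;
        fast = fast->next;
    }
    return slow;
}

From there, the last node in the sense of the question above is the node whose next pointer equals the returned cycle start, which one more walk around the cycle finds.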
I am trying to learn about data structures and algorithms on my own. I wrote my own double-linked list in C and now I want to write some algorithms to perform on the list. What is the preferred way to swap list items? Is it better to swap the content or to rearrange the pointers which point to the next and previous list item?
Rearrange the pointers. Swapping the data items can have side effects. In particular, you may have stored a reference to a node somewhere outside of the function, and usually when you rearrange the order of nodes in a list, you don't want people holding a reference to a node to suddenly find that the node points to new data. That's because generally the important, identifying characteristic of a node is the data it points to not its position in the list.
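A sketch of what "rearranging the pointers" might look like for a doubly linked list with head and tail pointers (the node/list layout and helper names are assumptions, not from the question). The fiddly part is when the two nodes are adjacent; unlinking both nodes first and then reinserting them keeps that case manageable:

#include <stddef.h>

/* Hypothetical doubly linked list layout, purely for illustration. */
typedef struct node {
    int value;
    struct node *prev;
    struct node *next;
} node;

typedef struct list {
    node *head;
    node *tail;
} list;

/* Detach n from the list without freeing it. */
static void unlink_node(list *l, node *n) {
    if (n->prev) n->prev->next = n->next; else l->head = n->next;
    if (n->next) n->next->prev = n->prev; else l->tail = n->prev;
    n->prev = n->next = NULL;
}

/* Insert n immediately before pos; pos == NULL means "at the tail". */
static void insert_before(list *l, node *pos, node *n) {
    if (pos == NULL) {
        n->prev = l->tail;
        n->next = NULL;
        if (l->tail) l->tail->next = n; else l->head = n;
        l->tail = n;
    } else {
        n->prev = pos->prev;
        n->next = pos;
        if (pos->prev) pos->prev->next = n; else l->head = n;
        pos->prev = n;
    }
}

/* Swap the positions of a and b by relinking only; the nodes themselves
 * (and any references to them held elsewhere) keep their data. */
void swap_nodes(list *l, node *a, node *b) {
    if (a == b)
        return;
    node *a_next = a->next;
    node *b_next = b->next;
    unlink_node(l, a);            /* unlink both first, then reinsert, */
    unlink_node(l, b);            /* so the adjacent cases stay simple */
    if (a_next == b) {            /* a was directly before b */
        insert_before(l, b_next, a);
        insert_before(l, a, b);
    } else if (b_next == a) {     /* b was directly before a */
        insert_before(l, a_next, b);
        insert_before(l, b, a);
    } else {                      /* not adjacent */
        insert_before(l, a_next, b);
        insert_before(l, b_next, a);
    }
}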
The canonical swap is done via pointer rearranging, has no side effects and is of course faster:
void swap (node **a, node **b) {
    node *tmp = *a;   /* take node** so we swap the callers' pointers, not local copies */
    *a = *b;
    *b = tmp;
}
Depending on the kind of content stored in the elements of the linked list, swapping the actual content of the elements can be tricky (think about a linked list of strings of different lengths, for instance), so it's easier to swap the pointers which point to the next and previous list items.
Depends on how you've allocated the content.
If you're storing the pointer to the content, then switching the content isn't a big deal. If you have a big structure that's part of your node, then switching the pointers could be more efficient than copying the whole content.
I tend to side with what most people have already said. A bit of background will probably help: swapping pointers is guaranteed to work, whereas swapping objects may not always be as simple as it looks. Think of the temporaries that may be created, and of failures (I mean in general, not the C++ exception feature specifically) that can occur and leave your container (the list) in an undesirable state. Look for the invariants of your container -- a swap should leave both the list size and the elements themselves intact -- and design around them.
Is it better to swap the content or to rearrange the pointers which point to the next and previous list item?
Rearrange the pointers, because it might be much more expensive to swap all the elements of the nodes.
Swapping two pointers is a constant operation in terms of time complexity.
However, if you'd like to swap all the elements between two nodes, then you would need to access them all and swap them with their counterparts, which is O(f), where f is the number of fields in the struct representing the node (i.e. the amount of data a node holds).
In other words, swapping node data is a linear operation, while swapping two pointers is constant.
Swapping the data may also have side effects, especially when you have stored a reference to a node somewhere outside of the function; this is already covered in the accepted answer, but I wanted to point out that fact too.