Hashtable separate chaining in C

I’m building a hashtable in C using open hashing (separate chaining) to store words. I’m unconcerned by the order in which I will store words with the same hash key.
Currently, I have a pointer to a struct (struct dict * d) with my hashtable (struct item * arr). More specifically, this table is an array of items (struct item) containing a word (char * word) and a pointer (struct item * next).
I’m unclear about two aspects:
1. When chaining words together after collision (inserting new item), should I insert the element at the beginning or at the end of the linked list?
I’ve seen it done both ways, but the latter seems more popular. However, the former seems quicker to me as I only need to set the pointer of my first item to my new item, and its pointer to null. I don’t have to do any pointer chasing (i.e. travel through my linked list until I find the null pointer).
2. Should my hashtable be an array of pointers to items (struct item), or simply an array of items (struct item), as I have done?
In other words, should the very first item for a specific hash key be inserted in the first cell (an empty cell), or should there already be a pointer in that cell which we will point to this new item?

For 1. it really shouldn’t matter if you prepend or append to the list. If you keep the load small, the chains are short, and you shouldn’t see any noticeable difference in access performance. If you keep the table small and the load gets high, you might want to look into different strategies. The access pattern might matter then. For example, if you are more likely to look up recently inserted values, you want them early in the list, so then it is better to prepend. But with a hash table, it is better to keep the load small if you can, and then it shouldn’t matter.
For 2. either will also work. If your table is an array of pointers, with NULL for empty chains, a simple recursive linked list implementation will work nicely. Make your list functions take a list as an argument and make insert and delete return a new list. Either the argument or the return value can be NULL. Then do something like tbl[bin] = insert(tbl[bin], val) or tbl[bin] = delete(tbl[bin], val). If the chains are short, you don’t have to worry too much about recursion overhead. In any case, you don’t need recursion for looking up a value, or for inserting if you are just prepending, so delete is the only operation that needs it, and there you don’t get tail recursion anyway.
The benefit you get from having an array of items (struct item) instead is one of two things: either the item in the array acts as a dummy element at the front of each list, which simplifies non-recursive list implementations by avoiding special cases for empty lists, or you avoid following a pointer to reach the first element in the chain once you have looked up the bin. For the latter, you need a way to distinguish an empty chain from a chain with one element, though. It is hardly worth it, and if you want to avoid jumping along linked lists, open addressing or some other collision strategy might be better.
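As a rough sketch of the array-of-pointers variant described above, assuming a hypothetical table size NBINS, a djb2-style hash, and helper names (list_insert, list_delete, tbl_insert and so on) that are not from the question:

```c
#include <stdlib.h>
#include <string.h>

#define NBINS 64   /* hypothetical table size, chosen for illustration */

struct item {
    char *word;
    struct item *next;
};

/* djb2-style string hash (an assumption; any reasonable hash would do). */
unsigned long hash(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Prepend a copy of word. No traversal needed. Returns the new list. */
struct item *list_insert(struct item *list, const char *word)
{
    struct item *it = malloc(sizeof *it);
    if (!it)
        return list;                 /* allocation failed: list unchanged */
    it->word = malloc(strlen(word) + 1);
    if (!it->word) {
        free(it);
        return list;
    }
    strcpy(it->word, word);
    it->next = list;                 /* old head (possibly NULL) follows */
    return it;
}

/* Lookup is a plain loop; no recursion required. */
int list_contains(const struct item *list, const char *word)
{
    for (; list; list = list->next)
        if (strcmp(list->word, word) == 0)
            return 1;
    return 0;
}

/* Recursively delete the first match. Returns the new list. */
struct item *list_delete(struct item *list, const char *word)
{
    if (!list)
        return NULL;
    if (strcmp(list->word, word) == 0) {
        struct item *rest = list->next;
        free(list->word);
        free(list);
        return rest;
    }
    list->next = list_delete(list->next, word);
    return list;
}

/* tbl is an array of NBINS chain heads, all initially NULL. */
void tbl_insert(struct item **tbl, const char *word)
{
    unsigned long bin = hash(word) % NBINS;
    tbl[bin] = list_insert(tbl[bin], word);
}

void tbl_delete(struct item **tbl, const char *word)
{
    unsigned long bin = hash(word) % NBINS;
    tbl[bin] = list_delete(tbl[bin], word);
}

int tbl_contains(struct item **tbl, const char *word)
{
    return list_contains(tbl[hash(word) % NBINS], word);
}
```

Because every list function returns the (possibly new) head, the table code never needs a special case for an empty bin.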

Related

Array VS single linked list VS double link list

I am learning about arrays, singly linked lists and doubly linked lists these days, and this question came up:
"What is the best option between these three data structures when it comes to fast searching, less memory, and easy insertion and updating of things?"
As far as I know, an array cannot be the answer because it has a fixed size, so inserting a new element isn't always possible. A doubly linked list can do the task, but it needs two pointers for each node, which costs memory, so I think a singly linked list fulfills all the given requirements. Am I right? Please correct me if I am missing any point. One more question: instead of choosing one of them, can I combine two or more of the data structures given here to meet all the requirements?
"What is the best option between these three data structures when it comes to fast searching, less memory, easily insertion and updating of things".
As far as I can tell Arrays serve the purpose.
Fast search: You can do a binary search if the array is sorted. You don't get that option with a linked list.
Less memory: Arrays take the least memory (but it must be contiguous memory).
Insertion: Inserting into an array is a matter of a[i] = "value". If the array size is exceeded, simply copy the data into a new, larger array. That is how HashMaps / ArrayLists work under the covers.
Updating things: Only arrays provide you with random access. a[i] = "new value" is an O(1) update if you know the index.
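The "copy into a new array when full" idea can be sketched in C roughly as follows. The struct vec type, the vec_push name and the doubling policy are assumptions for illustration, not something taken from the answer:

```c
#include <stdlib.h>

/* A growable int array (hypothetical type, for illustration only). */
struct vec {
    int *data;
    size_t len;   /* number of elements in use */
    size_t cap;   /* number of elements allocated */
};

/* Append value, doubling the capacity when the array is full.
 * Returns 0 on success, -1 if the allocation fails. */
int vec_push(struct vec *v, int value)
{
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 4;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (!p)
            return -1;            /* old buffer is still valid */
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;    /* the a[i] = value step */
    return 0;
}
```

Starting from a zero-initialized struct vec, each push is O(1) except when a resize happens, which is why appends are usually described as amortized O(1).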
Each of those has its own benefits and downsides.
For search speed, I'd say arrays are better suited due to the quick lookup times.
Since an array is a sequence of same-size elements, retrieving the value at an index is just memoryLocation + index * elementSize. For a linked list, the whole list needs traversing.
Arrays also win in the "less memory" category, since there's no need to store extra pointers.
For insertions, arrays are slow. When the array is full you need to traverse it, copy the contents to a new array, assign the new array, delete the old one...
Insertions go much quicker in singly or doubly linked lists, because it's just a matter of changing one or two pointers.
In the end, it all just depends on the use case. Are you inserting a lot? Then you probably want to consider a non-array structure.
Do you need many quick lookups? Consider those arrays again. Etc..
See also this question.
A linked list is usually the best choice when we don’t know in advance the number of elements we will have to store or the number can change dynamically.
Arrays have slow insertion and deletion times. To insert an element to the front or middle of the array, the first step is to ensure that there is space in the array for the new element, otherwise, the array needs to be RESIZED. This is an expensive operation. The next step is to open space for the new element by shifting every element after the desired index. Likewise, for deletion, shifting is required after removing an element. This implies that insertion time for arrays is Big O of n (O(n)) as n elements must be shifted.
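A minimal sketch of the shifting step, assuming a fixed-capacity int array and hypothetical helper names array_insert and array_remove (resizing is left out here):

```c
#include <stddef.h>
#include <string.h>

/* Insert value at index idx in a[0..*len), where the buffer holds cap ints.
 * Returns 0 on success, -1 if the array is full or idx is out of range. */
int array_insert(int *a, size_t *len, size_t cap, size_t idx, int value)
{
    if (*len >= cap || idx > *len)
        return -1;
    /* Shift a[idx..len) one slot to the right: O(n) in the worst case. */
    memmove(&a[idx + 1], &a[idx], (*len - idx) * sizeof a[0]);
    a[idx] = value;
    (*len)++;
    return 0;
}

/* Remove the element at idx, shifting the tail one slot to the left.
 * Returns 0 on success, -1 if idx is out of range. */
int array_remove(int *a, size_t *len, size_t idx)
{
    if (idx >= *len)
        return -1;
    memmove(&a[idx], &a[idx + 1], (*len - idx - 1) * sizeof a[0]);
    (*len)--;
    return 0;
}
```

Inserting at the front moves every element, while inserting at the end moves none, which is where the O(n) worst case comes from.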
Using static arrays, we can save some memory in comparison to linked lists because we do not need to store pointers to the next node.
Doubly linked lists support fast insertion and removal at both ends. This is used in an LRU cache, where you need to add a new item at the front and remove the oldest item from the end.
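The two end operations an LRU cache relies on could look roughly like this. The struct dnode and struct dlist types and the function names are hypothetical, and the key lookup a real cache needs is left out:

```c
#include <stdlib.h>

struct dnode {
    int key;
    struct dnode *prev, *next;
};

struct dlist {
    struct dnode *head;   /* most recently added */
    struct dnode *tail;   /* oldest */
};

/* Add a new key at the front in O(1). Returns 0 on success, -1 on failure. */
int dlist_push_front(struct dlist *l, int key)
{
    struct dnode *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->key = key;
    n->prev = NULL;
    n->next = l->head;
    if (l->head)
        l->head->prev = n;
    else
        l->tail = n;          /* list was empty: n is also the tail */
    l->head = n;
    return 0;
}

/* Remove the oldest key from the back in O(1).
 * Returns 0 and stores the key in *out, or -1 if the list is empty. */
int dlist_pop_back(struct dlist *l, int *out)
{
    struct dnode *n = l->tail;
    if (!n)
        return -1;
    *out = n->key;
    l->tail = n->prev;
    if (l->tail)
        l->tail->next = NULL;
    else
        l->head = NULL;       /* list is now empty */
    free(n);
    return 0;
}
```

The prev pointer is what makes removal from the back O(1); with a singly linked list you would have to walk from the head to find the new tail.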

Implementing arrays using linked lists (and vice versa)

After learning about arrays and linked lists in class, I'm curious about whether arrays can be used to create linked lists and vice versa.
To create a linked list using an array, could I store the first value of the linked list at index 0 of the array, the pointer to the next node at index 1 of the array, and so on? I guess I'm confused because the "pointer to next" seems redundant, given that we know that the index storing the value of the next node will always be: index of value of current node + 2.
I don't think it's possible to create an array using a linked list, because an array involves continuous memory, but a linked list can have nodes stored in different parts of computer memory. Is there some way to get around this?
Thanks in advance.
The array-based linked list is generally defined in a 2-dimensional array.
Benefit: The list will only take up to a specific amount of memory that is initially defined.
Down side: The list can only contain a specific predefined amount of items.
As a singly linked list, the data structure has to hold a head pointer; in this specific implementation it is an int. The head of the list holds the index of the first node. The first node holds the index of the next node, and so on. The last node in the list holds a next value of -1, which indicates the end of the list. Because indices are taken as elements are added to the structure, a free list head is also required. This free list is incorporated into the same 2-dimensional array. Just as the head is an int, the free list pointer is an int.
Now the data structure is composed of 3 major elements: the head pointer, the free head pointer and the 2-dimensional array. The list has to be initialized correctly before it can be used.
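One possible reading of that description, as a sketch: the CAPACITY value, the DATA/NEXT column names and the function names are assumptions, and the original answer's own code is not reproduced here.

```c
#define CAPACITY 8   /* hypothetical fixed size */
#define DATA 0       /* column holding the value */
#define NEXT 1       /* column holding the index of the next node */

struct alist {
    int head;                 /* index of the first node, -1 if empty */
    int free_head;            /* index of the first unused slot, -1 if full */
    int nodes[CAPACITY][2];   /* the 2-dimensional array */
};

/* Initialize: the list is empty and every slot is chained into the free list. */
void alist_init(struct alist *l)
{
    l->head = -1;
    for (int i = 0; i < CAPACITY; i++)
        l->nodes[i][NEXT] = i + 1;
    l->nodes[CAPACITY - 1][NEXT] = -1;   /* end of the free list */
    l->free_head = 0;
}

/* Take a slot from the free list and link it in at the front.
 * Returns 0 on success, -1 if no slot is free. */
int alist_push_front(struct alist *l, int value)
{
    int i = l->free_head;
    if (i == -1)
        return -1;
    l->free_head = l->nodes[i][NEXT];
    l->nodes[i][DATA] = value;
    l->nodes[i][NEXT] = l->head;
    l->head = i;
    return 0;
}

/* Unlink the first node and return its slot to the free list.
 * Returns 0 and stores the value in *out, or -1 if the list is empty. */
int alist_pop_front(struct alist *l, int *out)
{
    int i = l->head;
    if (i == -1)
        return -1;
    *out = l->nodes[i][DATA];
    l->head = l->nodes[i][NEXT];
    l->nodes[i][NEXT] = l->free_head;
    l->free_head = i;
    return 0;
}
```

After alist_init, head is -1 and free_head is 0; each push moves one slot from the free chain to the list, and each pop moves it back.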
Reference is this link
You could store a linked list in an array, but only in the sense that you have an ordered list. As you say, you do not need pointers as you know the order (it's explicit in the array ordering). The main differences for choosing between an array or linked list are:
Arrays are "static" in that items are fixed in their elements. You can't remove an element and have the array automatically shuffle the following elements down. Of course you can bypass "empty" elements in your iteration, but this requires specific logic. With a linked list, if you remove an element, it's gone. With an array, you have to shuffle all subsequent elements down.
As such, linked lists are often used where insertion/deletion of elements is the most common activity. Arrays are most often used where access is required (as elements are reached faster when accessed directly [by index]).
Another area where you may see benefits of linked lists over arrays is in sorting (where sorting is required or frequent). The reason for this is that linked list sorts require only pointer manipulation, whereas array sorting requires swapping and shuffling. That said, many sorting algorithms create new arrays anyway (merge-sort is typical), which reduces this overhead (though it requires the same memory again for the sorted array).
You can mix your metaphors somewhat if, for example, you enable your linked list to be marked "read-only". That is, you could create an array of pointers to each node in your linked list. That way, you can have indexed access to your linked list. The array becomes outdated (in the way described above) once elements are added to or removed from your linked list (hence the read-only aspect).
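A sketch of that "array of pointers to each node" idea, written in C on the assumption of a simple int-valued node; struct node and build_index are hypothetical names:

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

/* Build an array of pointers to every node of the list, in list order.
 * Stores the node count in *n. Returns NULL if the list is empty or the
 * allocation fails. The caller frees the array (not the nodes). */
struct node **build_index(struct node *head, size_t *n)
{
    size_t count = 0;
    for (struct node *p = head; p; p = p->next)
        count++;
    *n = count;
    if (count == 0)
        return NULL;

    struct node **idx = malloc(count * sizeof *idx);
    if (!idx)
        return NULL;

    size_t i = 0;
    for (struct node *p = head; p; p = p->next)
        idx[i++] = p;
    return idx;
}
```

Once built, idx[k] gives O(1) access to the k-th node, but only for as long as the list is not modified; after an insert or delete the index has to be rebuilt.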
So, to answer your specific questions:
1) There's no value in doing this - as per the details above
2) You haven't really provided enough information to answer the question of contiguous memory allocation. It depends on a lot: OS, architecture, compiler implementation. You haven't even mentioned the programming language. In short though, choosing between a linked list and an array has little to do with contiguous memory allocation and more to do with usage. For instance, the Java LinkedList class and ArrayList class both represent a List implementation but are specialised based on usage patterns. It is expected that LinkedList performs better for "lists" expecting high modification (although in tests done a few years ago the difference proved to be negligible; I'm not sure of the state in the latest versions of Java).
Also, you wouldn't typically "create an array with a linked list" or vice versa. They're both abstract data structures used in building larger components. They represent a list of something in a wider context (e.g. a department has a list of employees). Each datatype just has usage benefits. Similarly, you might use a set, queue, stack, etc. It depends on your usage needs.
I really hope I haven't confused you further!

Linked List - Appending node: loop or pointer?

I am writing a linked list datatype and as such I currently have the standard head pointer which references the first item, and then a next pointer for each element that points to the following one such that the final element has next = NULL.
I am just curious what the pros/cons or best practices are for keeping track of the last node. I could have a 'tail' pointer which always points to the last node making it easy to append, or I could loop over the list starting from the head pointer to find the last node when I want to append. Which method is better?
It is usually a good idea to store the tail. If we think about the complexity of adding an item at the end (if this is an operation you commonly do) it will be O(n) time to search for the tail, or O(1) if you store it.
Another option you can consider is to make your list doubly linked. This way when you want to delete the end of the list, by storing tail you can delete nodes in O(1) time. But this will incur an extra pointer to be stored per element of your list (not expensive, but it adds up, and should be a consideration for memory constrained systems).
In the end, it is all about the operations you need to do. If you never add or delete or operate from the end of your list, there is no reason to do this. I recommend analyzing the complexity of your most common operations and base your decision on that.
Depends on how often you need to find the last node, but in general it is best to have a tail pointer.
There's very little cost to just keeping and updating a tail pointer, but you have to remember to update it! If you can keep it updated, then it will make append operations much faster (O(1) instead of O(n)). So, if you usually add elements to the end of the list, then you should absolutely create and maintain a tail pointer.
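A minimal sketch of a list that maintains a tail pointer, with hypothetical names (struct list, list_append, list_pop_front). The spots where the tail must be kept up to date are marked in the comments:

```c
#include <stdlib.h>

struct node {
    int value;
    struct node *next;
};

struct list {
    struct node *head;   /* first node, NULL if empty */
    struct node *tail;   /* last node, NULL if empty */
};

/* Append in O(1) using the tail pointer. Returns 0 on success, -1 on failure. */
int list_append(struct list *l, int value)
{
    struct node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->value = value;
    n->next = NULL;
    if (l->tail)
        l->tail->next = n;
    else
        l->head = n;      /* first node: head must be set too */
    l->tail = n;          /* always update the tail */
    return 0;
}

/* Remove the first node. Returns 0 and stores the value in *out,
 * or -1 if the list is empty. */
int list_pop_front(struct list *l, int *out)
{
    struct node *n = l->head;
    if (!n)
        return -1;
    *out = n->value;
    l->head = n->next;
    if (!l->head)
        l->tail = NULL;   /* list became empty: tail must not dangle */
    free(n);
    return 0;
}
```

The easy mistake is the last case: forgetting to reset the tail when the final node is removed leaves it pointing at freed memory.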
If you have a doubly linked list, where every element contains a pointer both to the next and prev elements, then a tail pointer is almost universally used.
On the other hand, if this is a sorted list, then you won't be appending to the end, so the tail pointer would never be used. Still, keeping the pointer around is a good idea, just in case you decide you need it in the future.

Why are linked lists faster than arrays?

I am very puzzled about this. Everywhere there is written "linked lists are faster than arrays" but no one makes the effort to say WHY. Using plain logic I can't understand how a linked list can be faster. In an array all cells are next to each other so as long as you know the size of each cell it's easy to reach one cell instantly. For example if there is a list of 10 integers and I want to get the value in the fourth cell then I just go directly to the start of the array+24 bytes and read 8 bytes from there.
On the other hand, when you have a linked list and you want to get the element in the fourth place, you have to start from the beginning or end of the list (depending on whether it's a singly or doubly linked list) and go from one node to the next until you find what you're looking for.
So how the heck can going step by step be faster than going directly to an element?
This question title is misleading.
It asserts that linked lists are faster than arrays without limiting the scope well. There are a number of times when arrays can be significantly faster and there are a number of times when a linked list can be significantly faster: the particular case of linked lists "being faster" does not appear to be supported.
There are two things to consider:
The theoretical bounds of linked-lists vs. arrays in a particular operation; and
the real-world implementation and usage pattern including cache-locality and allocations.
As far as accessing an indexed element goes: the operation is O(1) in an array and, as pointed out, is very fast (just an offset). The operation is O(k) in a linked list (where k is the index, and k may always be << n, depending), but if the linked list is already being traversed then it is O(1) per step, which is "the same" as an array. Whether an array traversal (for(i=0;i<len;i++)) is faster (or slower) depends upon the particular implementation/language/run-time.
However, if there is a specific case where the array is not faster for either of the above operations (seek or traversal), it would be interesting to see it dissected in more detail. (I am sure it is possible to find a language with a very degenerate implementation of arrays over lists cough Haskell cough)
Happy coding.
My simple usage summary: Arrays are good for indexed access and operations which involve swapping elements. The non-amortized re-size operation and extra slack (if required), however, may be rather costly. Linked lists amortize the re-sizing (and trade slack for a "pointer" per-cell) and can often excel at operations like "chopping out or inserting a bunch of elements". In the end they are different data-structures and should be treated as such.
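To make the O(1) versus O(k) difference for indexed access concrete, here is a small sketch in C; array_get and list_get are hypothetical helper names:

```c
#include <stddef.h>

struct node {
    int value;
    struct node *next;
};

/* Array access: one address computation (base + k * sizeof(int)). */
int array_get(const int *a, size_t k)
{
    return a[k];
}

/* Linked list access: follow k next pointers from the head.
 * Returns NULL if the list has k or fewer nodes. */
const struct node *list_get(const struct node *head, size_t k)
{
    while (head && k--)
        head = head->next;
    return head;
}
```

If you are already walking the list, each further step is a single pointer dereference, which is why sequential traversal has the same asymptotic cost in both structures even though cache behaviour may differ.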
Like most problems in programming, context is everything. You need to think about the expected access patterns of your data, and then design your storage system appropriately. If you insert something once, and then access it 1,000,000 times, then who cares what the insert cost is? On the other hand, if you insert/delete as often as you read, then those costs drive the decision.
Depends on which operation you are referring to. Adding or removing elements is a lot faster in a linked list than in an array.
Iterating sequentially over the list one by one is more or less the same speed in a linked list and an array.
Getting one specific element in the middle is a lot faster in an array.
And the array might waste space, because very often when expanding the array, more elements are allocated than needed at that point in time (think ArrayList in Java).
So you need to choose your data structure depending on what you want to do:
many insertions and iterating sequentially --> use a LinkedList
random access and ideally a predefined size --> use an array
Because no memory is moved when an insertion is made in the middle of a linked list.
For the case you presented, it's true: arrays are faster, since you only need arithmetic to go from one element to another. Linked lists require indirection and fragment memory.
The key is to know what structure to use and when.
Linked lists are preferable over arrays when:
a) you need constant-time insertions/deletions from the list (such as in real-time computing where time predictability is absolutely critical)
b) you don't know how many items will be in the list. With arrays, you may need to re-declare and copy memory if the array grows too big
c) you don't need random access to any elements
d) you want to be able to insert items in the middle of the list (such as a priority queue)
Arrays are preferable when:
a) you need indexed/random access to elements
b) you know the number of elements in the array ahead of time so that you can allocate the correct amount of memory for the array
c) you need speed when iterating through all the elements in sequence. You can use pointer math on the array to access each element, whereas you need to lookup the node based on the pointer for each element in linked list, which may result in page faults which may result in performance hits.
d) memory is a concern. Filled arrays take up less memory than linked lists. Each element in the array is just the data. Each linked list node requires the data as well as one (or more) pointers to the other elements in the linked list.
Array Lists (like those in .Net) give you the benefits of arrays, but dynamically allocate resources for you so that you don't need to worry too much about list size, and you can delete items at any index without having to re-shuffle elements yourself. Performance-wise, arraylists are slower than raw arrays.
Reference:
Lamar answer
https://stackoverflow.com/a/393578/6249148
A LinkedList is node-based, meaning that data is placed anywhere in memory and is linked together by nodes (objects that point to one another, rather than sitting next to one another).
An array is a set of similar data objects stored in sequential memory locations.
The advantage of a linked list is that data doesn’t have to be sequential in memory. When you add/remove an element, you are simply changing the pointer of a node to point to a different node, not actually moving elements around. If you don’t have to add elements towards the end of the list, then getting to the insertion point is faster, because you iterate over fewer elements. There are also variations of the LinkedList, such as a DoublyLinkedList, whose nodes point to both the previous and the next node.
The advantage of an array is that you can access any element in O(1) time if you know the index, but if you don’t know the index, then you will have to iterate over the data.
The downside of an array is the fact that its data is stored sequentially in memory. If you want to insert an element at index 1, then you have to move every element after it to the right. Also, the array has to keep resizing itself as it grows, basically copying itself in order to make a new array with a larger capacity. If you want to remove an element at the beginning, then you will have to move all the remaining elements to the left.
Arrays are good when you know the index, but are costly as they grow.
The reason why people talk highly about linked lists is because the most useful and efficient data structures are node based.

Sorting a linked list and returning to original unsorted order

I have an unsorted linked list. I need to sort it by a certain field then return the linked list to its previous unsorted condition. How can I do this without making a copy of the list?
When you say "return the linked list to its previous unsorted condition", do you mean the list needs to be placed into a random order or to the exact same order that you started with?
In any case, don't forget that a list can be linked into more than one list at a time. If you have two sets of "next"/"previous" pointers, then you can effectively have the same set of items sorted two different ways at the same time.
To do this you will need to either sort and then restore the list or create and sort references to the list.
To sort the list directly, merge sort is most likely the best thing you could use for the initial sort, but returning the nodes to their original order is tricky unless you either record your moves so you can reverse them, or store each node's original position and re-sort using that as the key.
If you would rather sort the references to the list instead you will need to allocate enough space to hold pointers to each node and sort that. If you use a flat array to store the pointers then you could use the standard C qsort to do this.
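A sketch of the "sort references with qsort" approach, assuming nodes with an int key; struct node, cmp_by_key and sorted_view are hypothetical names:

```c
#include <stdlib.h>

struct node {
    int key;
    struct node *next;
};

/* qsort comparator: each array element is a pointer to a node. */
static int cmp_by_key(const void *a, const void *b)
{
    const struct node *x = *(const struct node *const *)a;
    const struct node *y = *(const struct node *const *)b;
    return (x->key > y->key) - (x->key < y->key);
}

/* Return a malloc'd array of node pointers sorted by key. The list itself
 * is not modified, so its original order is preserved. Stores the node
 * count in *n. Returns NULL if the list is empty or allocation fails. */
struct node **sorted_view(struct node *head, size_t *n)
{
    size_t count = 0;
    for (struct node *p = head; p; p = p->next)
        count++;
    *n = count;
    if (count == 0)
        return NULL;

    struct node **refs = malloc(count * sizeof *refs);
    if (!refs)
        return NULL;

    size_t i = 0;
    for (struct node *p = head; p; p = p->next)
        refs[i++] = p;

    qsort(refs, count, sizeof *refs, cmp_by_key);
    return refs;
}
```

This costs one pointer per node rather than a full copy of the list, and "returning to the original order" is just a matter of freeing the array.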
If this is an assignment and you must implement your own sort, and you don't already know the length of the list, you can take advantage of the traversal needed to count its length to also choose a good initial pivot for quicksort. If you choose not to use quicksort, you can let your imagination go wild with all kinds of optimizations.
Taking your points in reverse order, to support returning to original order, you can add an extra int field to each list node. Set those values based on the original order, and when you need to return it to the original order, just sort on that field.
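The extra-field idea might be sketched like this, using a merge sort that can sort on either the key or the stored position. The pos field, the by_pos flag and the function names are all assumptions made for the example:

```c
#include <stddef.h>

struct node {
    int key;            /* the field to sort by */
    int pos;            /* original position, filled in by number_nodes */
    struct node *next;
};

/* Record each node's current position before sorting. */
void number_nodes(struct node *head)
{
    int i = 0;
    for (; head; head = head->next)
        head->pos = i++;
}

/* Merge two sorted lists, comparing on pos if by_pos is nonzero, else on key. */
static struct node *merge(struct node *a, struct node *b, int by_pos)
{
    struct node dummy;
    struct node *tail = &dummy;
    dummy.next = NULL;
    while (a && b) {
        int a_first = by_pos ? a->pos <= b->pos : a->key <= b->key;
        if (a_first) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;
    return dummy.next;
}

/* Top-down merge sort on a singly linked list. Returns the new head. */
struct node *merge_sort(struct node *head, int by_pos)
{
    if (!head || !head->next)
        return head;

    /* Find the middle with slow/fast pointers and split the list there. */
    struct node *slow = head, *fast = head->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    struct node *second = slow->next;
    slow->next = NULL;

    struct node *left = merge_sort(head, by_pos);
    struct node *right = merge_sort(second, by_pos);
    return merge(left, right, by_pos);
}
```

Typical use would be number_nodes(head), then head = merge_sort(head, 0) to sort by key, and later head = merge_sort(head, 1) to restore the original order.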
As far as the sorting in general goes, you probably want to use something like a merge-sort or possibly a Quick-sort.
You can make that data structure somewhat like this.
struct Elem {
    struct Elem *_next;        /* link in the original order */
    struct Elem *_nextSorted;  /* link in the sorted order */
    ...
};
Then you can use any algo for sorting the list (maybe merge sort)
If you want to keep your linked list untouched, you should add information to store the ordered list of elements.
To do so, you can either create a new linked list where each element points to one element of your original linked list. Or you can add one more field in the element of your list like sorted_next.
In any case, you should use a sequential algorithm like mergesort to sort a linked list.
Here is a C source code of mergesort for linked lists that you could reuse for your project.
I guess most of the answers have already covered the usual techniques one could use. As far as figuring out the solution to the problem goes, a trick is to look at the problem and think if the human mind can do it.
Figuring out the original random sequence from a sorted sequence is theoretically impossible unless you use some other means. This can be done by
a) modifying the linked list structure (as mentioned above, you simply add a separate pointer for the sorted sequence). This would work, and maybe technically you are not creating a separate linked list, but it is as good as a new linked list, one made of pointers.
b) the other way is to log each transition of the sorting algorithm in a stack. This allows you to not be dependent on the sorting algorithm you use. For example, when node 1 is shifted to the 3rd position, you could have something like 1:3 pushed to the stack. The notation, of course, may vary. Once you have pushed all the transitions, you can simply pop the stack to take the list back to the original pattern or to any point in between. This is more like a logger.
If you're interested in learning more about the design for loggers, I suggest you read about the Command Pattern

Resources