I currently have a simple database program that reads keys in from a text file and stores them in a doubly linked list (values are read later if they are required). Currently, I do a sequential search on the list, but that is clearly rather slow. I was hoping that there is another way to do it. I was reading about binary trees (in particular, red-black trees), but I don't know too much about them, and was hoping that I could glean something from the stackoverflow hivemind :) I suppose my question is, what is the fastest way to do a search in a doubly linked list?
EDIT: Forgot to say that the list is sorted. Don't know if that changes anything. Also, the reason I only read in keys is that the max value length is 1024*32 bytes, which I feel is too large. Note that this is for an assignment, so "typical usage scenarios" don't apply. The professors are likely going to be stress testing the hell out of this thing, and I don't want to be mallocing blocks that big.
There is a thing called a "skip list" that you could use.
It is a stack of ordered lists: each higher level skips over more of the items, which lets you do a form of binary search. However, maintaining the lists on insertion and removal is more work.
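A minimal sketch of what a skip-list search might look like in C is shown below. The node layout (a fixed MAX_LEVEL array of forward pointers and a head sentinel) is an assumption for illustration, not the only way to build one:

```c
#include <stddef.h>

#define MAX_LEVEL 16

/* Hypothetical skip-list node: each node carries an array of forward
 * pointers, one per level it participates in. */
struct sl_node {
    int key;
    struct sl_node *forward[MAX_LEVEL]; /* forward[0] is the ordinary list */
};

struct skip_list {
    int level;               /* highest level currently in use (>= 1) */
    struct sl_node head;     /* sentinel node; real keys start after it */
};

/* Search: start at the top level and drop down a level each time the
 * next key would overshoot the target. Expected O(log n) comparisons. */
struct sl_node *sl_search(struct skip_list *sl, int key)
{
    struct sl_node *x = &sl->head;
    for (int i = sl->level - 1; i >= 0; i--) {
        while (x->forward[i] != NULL && x->forward[i]->key < key)
            x = x->forward[i];
    }
    x = x->forward[0];                  /* candidate on the bottom list */
    return (x != NULL && x->key == key) ? x : NULL;
}
```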
The fastest way to do a search in an unsorted doubly-linked list is one element at a time.
If you're trying to make search faster, don't use a linked list. Your idea of using a binary tree, for example, will certainly be faster, but as Matthew Flaschen said in comments, it's a completely different implementation from what you're using now.
Given that your doubly-linked list is sorted, and you have a list of items to search for, I suggest looking into the problem of building a self-balancing binary search tree. The tree construction could take some time, but it will be amortized if you have a long list of items to search for.
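Since the list is already sorted, you don't strictly need a *self-balancing* tree: you can build a height-balanced BST from the sorted list directly in O(n) and then search it in O(log n). A sketch of that idea in C, with hypothetical node types and error handling omitted for brevity, might look like this:

```c
#include <stdlib.h>

/* Hypothetical sorted doubly linked list node. */
struct dnode {
    int key;
    struct dnode *prev, *next;
};

/* BST node built from the list. */
struct tnode {
    int key;
    struct tnode *left, *right;
};

/* Recursively consume `n` nodes from the list, building a height-balanced
 * BST bottom-up.  *listp is advanced as nodes are consumed, so the whole
 * construction is O(n).  (malloc error checking omitted for brevity.) */
static struct tnode *build_balanced(struct dnode **listp, size_t n)
{
    if (n == 0)
        return NULL;

    struct tnode *left = build_balanced(listp, n / 2);

    struct tnode *root = malloc(sizeof *root);
    root->key = (*listp)->key;
    root->left = left;
    *listp = (*listp)->next;            /* consume the root's list node */

    root->right = build_balanced(listp, n - n / 2 - 1);
    return root;
}

/* Ordinary BST lookup: O(log n) because the tree is balanced. */
struct tnode *bst_search(struct tnode *root, int key)
{
    while (root != NULL && root->key != key)
        root = (key < root->key) ? root->left : root->right;
    return root;
}
```

Count the list length once, then call build_balanced(&head, count); after that, every lookup goes through bst_search instead of walking the list.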
As stated in the question header, I need a data structure suitable for fast and efficient searching. The data structure should also be able to add/remove elements to/from anywhere inside the data structure.
Currently I'm using a linked list. But the problem is that I have to walk through the list to find a desired element. General search algorithms (binary search, jump search, and so on) are not directly usable on linked lists, as there is no random access to list elements. Keeping the list elements sorted, as these algorithms require, is also a problem.
On the other hand, I can't use arrays as it's hard to add/remove an element to/from any desired index.
I've looked for search algorithms for linked lists and came across 'skip lists'. Now I'm here to ask if there is a better data structure for my case, or if there is a better search algorithm for linked lists.
I would use an AVL binary search tree.
For an example of binary search tree you can take a look at https://www.geeksforgeeks.org/avl-tree-set-1-insertion/ and https://www.geeksforgeeks.org/avl-tree-set-2-deletion/
It's well detailed; there is C code and a diagram.
It's efficient to search in, and it allows you to add and delete values.
It works for both numeric keys and character/string keys (such as a dictionary).
I would like to implement a data structure that supports fast insertion and keeps the data sorted, without duplicates, after every insert.
I thought about a binomial heap, but as I understand that structure, it can't tell during insertion whether a particular element is already in the heap. On the other hand there is the AVL tree, which fits my case perfectly, but honestly it is rather too hard for me to implement at the moment.
So my question is: is there any way to modify the binomial heap insertion algorithm to skip duplicates? Or maybe someone could suggest another structure?
Greetings :)
In C++, there is std::set. It is internally implemented as a red-black tree, so it keeps the data sorted as you insert it. You can have a look at that for reference.
A good data structure for this is the red-black tree, which is O(log(n)) for insertion. You said you would like to implement a data structure that does this yourself. A good explanation of how to implement one is given here, along with an open-source library you can use.
If you're okay with using a library, you may take a look at libavl here.
The library implements some other varieties of binary trees as well.
Skip lists are also a possibility if you are concerned with thread safety. Balanced binary search trees will perform more poorly than a skip list in such a case as skip lists require no rebalancing, and skip lists are also inherently sorted like a BST. There is a disadvantage in the amount of memory required (since multiple linked lists are technically used), but theoretically speaking it's a good fit.
You can read more about skip lists in this tutorial.
If you have a truly large number of elements, you might also consider just using a doubly linked list and sorting the list after all items are inserted. This has the benefit of being easy to implement and keeping insertion cheap.
You would then need to implement a sorting algorithm. A selection sort or insertion sort would be slower but easier to implement than a mergesort, heapsort, or quicksort algorithm. On the other hand, the latter three are not terribly difficult to implement either. The only thing to be careful about is that you don't overflow the stack since those algorithms are typically implemented using recursion. You could create your own stack implementation (not difficult) and implement them iteratively, pushing and popping values onto your stack as necessary. See Iterative quicksort for an example of what I'm referring to.
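Of the three faster sorts mentioned, mergesort adapts most naturally to a linked list, and its recursion depth is only O(log n), so stack overflow is much less of a concern there. A rough sketch for a doubly linked list (the node layout is assumed for illustration) could look like this:

```c
#include <stddef.h>

/* Hypothetical doubly linked list node. */
struct node {
    int key;
    struct node *prev, *next;
};

/* Split the list roughly in half using the slow/fast pointer technique. */
static struct node *split(struct node *head)
{
    struct node *slow = head, *fast = head;
    while (fast->next != NULL && fast->next->next != NULL) {
        slow = slow->next;
        fast = fast->next->next;
    }
    struct node *second = slow->next;
    slow->next = NULL;
    if (second != NULL)
        second->prev = NULL;
    return second;
}

/* Merge two sorted lists, fixing up both next and prev links. */
static struct node *merge(struct node *a, struct node *b)
{
    struct node dummy = {0};
    struct node *tail = &dummy;
    while (a != NULL && b != NULL) {
        struct node **min = (a->key <= b->key) ? &a : &b;
        tail->next = *min;
        (*min)->prev = tail;
        tail = *min;
        *min = (*min)->next;
    }
    struct node *rest = (a != NULL) ? a : b;
    tail->next = rest;
    if (rest != NULL)
        rest->prev = tail;
    dummy.next->prev = NULL;            /* the real head has no predecessor */
    return dummy.next;
}

/* Recursive merge sort; recursion depth is only O(log n). */
struct node *merge_sort(struct node *head)
{
    if (head == NULL || head->next == NULL)
        return head;
    struct node *second = merge_sort(split(head));
    struct node *first  = merge_sort(head);
    return merge(first, second);
}
```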
If you're looking for fast insertion and easy implementation, why not a linked list (single or double)? A quick sketch of the operations follows below.
insertion: push head / push tail - O(1)
removal: pop head - O(1) (pop tail is O(1) only on a doubly linked list, or if you track the node before the tail)
The only "but" is that find will be O(n).
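A minimal sketch of those operations for a singly linked list (the int payload and the function names are just for illustration):

```c
#include <stdlib.h>

/* Hypothetical singly linked list node used as a stack. */
struct node {
    int value;
    struct node *next;
};

/* Push at the head: O(1).  Returns 0 on success, -1 on allocation failure. */
int push_head(struct node **headp, int value)
{
    struct node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = *headp;
    *headp = n;
    return 0;
}

/* Pop the head: O(1).  Stores the removed value in *out, or returns -1 if empty. */
int pop_head(struct node **headp, int *out)
{
    struct node *n = *headp;
    if (n == NULL)
        return -1;
    *out = n->value;
    *headp = n->next;
    free(n);
    return 0;
}

/* Find: O(n), the one operation a plain list cannot speed up. */
struct node *find(struct node *head, int value)
{
    for (; head != NULL; head = head->next)
        if (head->value == value)
            return head;
    return NULL;
}
```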
It's my understanding that doubly linked lists use more memory but less CPU and usually provide better algorithm complexity compared to simple linked lists.
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list. Is there a clear point/situation after which using one instead of the other is unarguably the best solution? (Like after x elements on a regular PC)
My specific problem:
I'm implementing a linked list structure for general use and am thinking whether or not a link back should be included as it would greatly decrease complexity for element removal.
Thanks.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
Choosing a data structure is about weighing costs vs benefits, usually work vs upkeep.
A singly linked list offers easy traversal and easy front insertion. Easy tail insertion can be had at the cost of tracking the last node. That cost means whenever you add/remove a node you have to do an additional check (is this the tail?) and a possible update of the list structure.
A doubly linked list adds a lot more maintenance overhead. Every node now has to store two pointers and those have to be managed and maintained.
If you never need to walk backwards in your list, then a singly linked list is ideal, but not being able to walk backwards means that removal is more expensive.
So you need to start by determining which usage pattern you're going to have. If you are building a one-use list, then a singly linked list may be ideal. If you are building a dynamic list with a high rate of removals, then a doubly linked list will be better suited.
Determining the specific costs of the operations your data structure offers is the topic of things like 'Big O notation'.
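To make those costs concrete, here is a hedged sketch of removal in both variants: with a doubly linked list, unlinking a node you already hold a pointer to is O(1), while a singly linked list first has to walk to the predecessor. The node types and function names are assumptions for illustration:

```c
#include <stdlib.h>

/* Hypothetical node types for the comparison. */
struct dnode { int value; struct dnode *prev, *next; };
struct snode { int value; struct snode *next; };

/* Doubly linked: given a pointer to the node itself, unlinking is O(1). */
void dlist_remove(struct dnode **headp, struct dnode *n)
{
    if (n->prev != NULL)
        n->prev->next = n->next;
    else
        *headp = n->next;           /* n was the head */
    if (n->next != NULL)
        n->next->prev = n->prev;
    free(n);
}

/* Singly linked: we must first find the predecessor, an O(n) walk. */
void slist_remove(struct snode **headp, struct snode *n)
{
    struct snode **pp = headp;
    while (*pp != NULL && *pp != n)
        pp = &(*pp)->next;          /* walk until we hold the link into n */
    if (*pp == n) {
        *pp = n->next;
        free(n);
    }
}
```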
What I would like to know is when a simple linked list offers better overall results compared to a doubly linked list.
When you don't have to go backwards.
If you're doing a one way linear search, there's no incentive to traverse the list the other direction, and you wouldn't need those pointers.
UPDATE:
At what size does element removal become too expensive on a simple linked list?
This problem doesn't have anything to do with whether a list is singly or doubly linked. If you have to delete something in a list, you first need to find it, which has time complexity O(n). Having an extra pointer to the previous node doesn't help here.
A linked list is a data structure that is, most of the time, used to implement a stack or a queue. Generally speaking, if you are implementing a stack, insertion and deletion happen at a single end. If it's a queue, we usually delete at one end and add at the other end. As you can see, neither of these abstract data types needs a doubly linked list, and their add and delete operations don't depend on the number of items.
As mentioned above by Kepani, the only case where you would be worried about the number of elements in the list is when you delete in a fashion not described by a stack/queue interface (a non-linear approach), which would be when the elements are ordered (they can be otherwise, of course) and that order needs to be maintained.
Using a doubly linked list is definitely harder on memory, as each node needs to maintain an "extra" pointer, unless you are maintaining a store where you refer to past values as well. A doubly linked list is handy there (e.g., the history of a CLI command-line interpreter).
A singly linked list is harder on time, as the traversal required to reach the node before the current one depends on the number of elements in the list.
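As an illustration of the stack/queue point above, here is a rough sketch of a queue built on a singly linked list with a tail pointer: enqueue and dequeue are both O(1), independent of the number of items. The type and function names are illustrative:

```c
#include <stdlib.h>

struct qnode { int value; struct qnode *next; };
struct queue { struct qnode *head, *tail; };   /* both NULL when empty */

/* Enqueue at the tail: O(1).  Returns 0 on success, -1 on allocation failure. */
int enqueue(struct queue *q, int value)
{
    struct qnode *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->value = value;
    n->next = NULL;
    if (q->tail != NULL)
        q->tail->next = n;
    else
        q->head = n;                /* queue was empty */
    q->tail = n;
    return 0;
}

/* Dequeue at the head: O(1).  Returns -1 if the queue is empty. */
int dequeue(struct queue *q, int *out)
{
    struct qnode *n = q->head;
    if (n == NULL)
        return -1;
    *out = n->value;
    q->head = n->next;
    if (q->head == NULL)
        q->tail = NULL;             /* queue became empty */
    free(n);
    return 0;
}
```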
Suppose there is a linked list containing information. My program architecture rests upon being able to find out whether certain information already exists in an individual node of this linked list.
(I.e., see whether the integer member int example already has the value 5 in some node.)
After that, each node can be systematically operated on.
I can think of various ways to do this, but I was hoping someone more experienced might show something fairly efficient.
Also, is this good practice, or should there be another, more suitable data structure out there?
Thanks!
If O(N) is not enough, a sorted array with binary search, or a BST, would give you O(log(N)). Alternatively, you can look at the hash map data structure. This structure will give you nearly constant-time lookup based on a key, but it is more complicated than the other options. The problem is that there is no standard library implementation of one.
Otherwise, searching each element in turn is the best you can hope to do with a linked list.
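For the plain linked-list case, the straightforward O(n) scan the question asks about might look like this (the node layout and the example member are assumptions based on the question):

```c
#include <stddef.h>
#include <stdbool.h>

/* Hypothetical node with the integer member the question mentions. */
struct node {
    int example;
    struct node *next;
};

/* Walk the list once and report whether any node already holds `wanted`.
 * This is O(n), which is the best a plain linked list allows. */
bool contains(const struct node *head, int wanted)
{
    for (const struct node *n = head; n != NULL; n = n->next)
        if (n->example == wanted)
            return true;
    return false;
}
```

Usage would simply be something like: if (contains(head, 5)) { /* operate on the list */ }.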
I have a linked list of around 5000 entries ("NOT" inserted simultaneously), and I traverse the list looking for a particular entry on occasion (though this is not very often). Should I consider a hash table as the better choice for this case, replacing the linked list (which is doubly linked and linear)? I am using C on Linux.
If you have not found the code to be the slow part of the application via a profiler then you shouldn't do anything about it yet.
If it is slow, but the code is tested, works, and is clear, and there are other, slower areas that you can work on speeding up, do those first.
If it is buggy, then you need to fix it anyway; in that case, go for the hash table, as it will be faster than the list. This assumes that the order in which the data is traversed does not matter; if you care about the insertion order, then stick with the list (you can do things with a hash table and keep the order, but that will make the code much trickier).
Given that you need to search the list only on occasion, the odds of this being a significant bottleneck in your code are small.
Another data structure to look at is a "skip list" which basically lets you skip over a large portion of the list. This requires that the list be sorted however, which, depending on what you are doing, may make the code slower overall.
Whether a hash table is the better choice depends on the use case, which you have not described in detail. But more importantly, make sure the performance bottleneck is actually in this part of the code. If this code is called only once in a while and is not on a critical path, there is no use bothering to change it.
Have you measured and found a performance hit with the lookup? A hash_map or hash table should be good.
If you need to traverse the list in order (not as a part of searching for elements, but say for displaying them) then a linked list is a good choice. If you're only storing them so that you can look up elements then a hash table will greatly outperform a linked list (for all but the worst possible hash function).
If your application calls for both types of operations, you might consider keeping both, and using whichever one is appropriate for a particular task. The memory overhead would be small, since you'd only need to keep one copy of each element in memory and have the data structures store pointers to these objects.
As with any optimization step that you take, make sure you measure your code to find the real bottleneck before you make any changes.
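If measurement does point at the lookup, a minimal separate-chaining hash table in C might look like the sketch below. The bucket count, the djb2-style hash, and the string keys are illustrative choices, not requirements:

```c
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 1024               /* illustrative; pick based on expected size */

struct entry {
    char *key;
    void *value;
    struct entry *next;             /* chaining handles collisions */
};

struct hashtable {
    struct entry *buckets[NBUCKETS];    /* zero-initialize before use */
};

/* djb2-style string hash, reduced to a bucket index. */
static size_t hash(const char *s)
{
    size_t h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Insert or overwrite; returns 0 on success, -1 on allocation failure. */
int ht_put(struct hashtable *ht, const char *key, void *value)
{
    size_t i = hash(key);
    for (struct entry *e = ht->buckets[i]; e != NULL; e = e->next) {
        if (strcmp(e->key, key) == 0) {
            e->value = value;       /* key already present: overwrite */
            return 0;
        }
    }
    struct entry *e = malloc(sizeof *e);
    if (e == NULL)
        return -1;
    e->key = malloc(strlen(key) + 1);
    if (e->key == NULL) {
        free(e);
        return -1;
    }
    strcpy(e->key, key);
    e->value = value;
    e->next = ht->buckets[i];       /* prepend to the bucket's chain */
    ht->buckets[i] = e;
    return 0;
}

/* Expected O(1) lookup; the worst case degrades to the chain length. */
void *ht_get(struct hashtable *ht, const char *key)
{
    size_t i = hash(key);
    for (struct entry *e = ht->buckets[i]; e != NULL; e = e->next)
        if (strcmp(e->key, key) == 0)
            return e->value;
    return NULL;
}
```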
If you care about performance, you definitely should. If you're iterating through the thing to find a certain element with any regularity, it's going to be worth it to use a hash table. If it's a rare case, though, and the ordinary use of the list is not a search, then there's no reason to worry about it.
If you only traverse the collection, I don't see any advantage in using a hash map.
I advise against hash tables in almost all cases.
There are two reasons. Firstly, the size of the hash table is fixed.
Second, and much more importantly: the hashing algorithm. How do you know you've got it right? How will it behave with real data rather than test data?
I suggest a balanced B-tree: always O(log n), no uncertainty with regard to a hash algorithm, and no size limits.