I would like to implement a data structure that supports fast insertion and keeps the data sorted, without duplicates, after every insert.
I thought about a binomial heap, but as far as I understand that structure, it can't tell during insertion that a particular element is already in the heap. On the other hand there is the AVL tree, which fits my case perfectly, but honestly it is rather too hard for me to implement at the moment.
So my question is: is there any possibility to modify the binomial heap insertion algorithm so it skips duplicates? Or maybe someone could suggest another structure?
Greetings :)
In C++ there is std::set, which is internally an implementation of a red-black tree, so it keeps the data sorted as you insert. You can have a look at that for reference.
A good data structure for this is the red-black tree, which has O(log n) insertion. You said you would like to implement such a data structure yourself; a good explanation of how to do that is given here, along with an open-source, usable library.
If you're okay with using a library, you may take a look at libavl here.
The library implements some other varieties of binary trees as well.
Skip lists are also a possibility, especially if you are concerned with thread safety. A balanced binary search tree performs worse than a skip list in that case, because skip lists require no rebalancing, and they are inherently sorted like a BST. There is a disadvantage in the amount of memory required (since multiple linked lists are effectively used), but theoretically speaking it's a good fit.
You can read more about skip lists in this tutorial.
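For a flavor of what that looks like in C, here is a minimal sketch of skip-list insertion that also skips duplicates, which matches what the question asks for. The names, MAX_LEVEL, and the coin-flip level generator are illustrative choices, not taken from the tutorial:

    /* Minimal skip-list insertion sketch: keeps keys sorted and
       rejects duplicates. Initialize with: skiplist sl = {0}; */
    #include <stdlib.h>

    #define MAX_LEVEL 16

    typedef struct node {
        int key;
        struct node *next[MAX_LEVEL];   /* forward pointers, one per level */
    } node;

    typedef struct {
        node head;    /* sentinel: head.next[i] is the first node on level i */
        int level;    /* highest level currently in use */
    } skiplist;

    static int random_level(void) {
        int lvl = 1;
        while (lvl < MAX_LEVEL && (rand() & 1))  /* P(level >= k) = 2^-(k-1) */
            lvl++;
        return lvl;
    }

    /* Returns 1 on insert, 0 if the key was already present. */
    int sl_insert(skiplist *sl, int key) {
        node *update[MAX_LEVEL];
        node *x = &sl->head;
        for (int i = sl->level - 1; i >= 0; i--) {
            while (x->next[i] && x->next[i]->key < key)
                x = x->next[i];
            update[i] = x;              /* last node before key on level i */
        }
        if (x->next[0] && x->next[0]->key == key)
            return 0;                   /* duplicate: skip it */
        int lvl = random_level();
        if (lvl > sl->level) {
            for (int i = sl->level; i < lvl; i++)
                update[i] = &sl->head;
            sl->level = lvl;
        }
        node *n = calloc(1, sizeof *n);
        n->key = key;
        for (int i = 0; i < lvl; i++) { /* splice into each level */
            n->next[i] = update[i]->next[i];
            update[i]->next[i] = n;
        }
        return 1;
    }

Searching follows the same descent as the first loop, which is where the O(log n) expected behavior comes from.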
If you have a truly large number of elements, you might also consider just using a doubly-linked list and sorting the list after all items are inserted. This has the benefit of easy implementation and fast insertion time.
You would then need to implement a sorting algorithm. A selection sort or insertion sort would be slower but easier to implement than a mergesort, heapsort, or quicksort. On the other hand, the latter three are not terribly difficult to implement either. The only thing to be careful about is not overflowing the call stack, since those algorithms are typically implemented with recursion. You could create your own stack (not difficult) and implement them iteratively, pushing and popping bounds as necessary. See Iterative quicksort for an example of what I'm referring to.
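As a rough illustration of that idea, here is a hedged sketch of an iterative quicksort over an int array; the explicit bounds stack and the Lomuto partition are my own choices, not from the linked example:

    /* Iterative quicksort: an explicit stack of subarray bounds
       replaces recursion. Pushing the larger half first keeps the
       stack depth around log2(n), so 64 slots is plenty. */
    void quicksort_iter(int *a, int n) {
        if (n < 2) return;
        struct { int lo, hi; } stack[64];
        int top = 0;
        stack[0].lo = 0;
        stack[0].hi = n - 1;
        while (top >= 0) {
            int lo = stack[top].lo, hi = stack[top].hi;
            top--;
            if (lo >= hi) continue;
            /* Lomuto partition around the last element. */
            int pivot = a[hi], i = lo, t;
            for (int j = lo; j < hi; j++)
                if (a[j] < pivot) { t = a[i]; a[i] = a[j]; a[j] = t; i++; }
            t = a[i]; a[i] = a[hi]; a[hi] = t;
            /* Push the larger side first so the smaller side is
               popped and sorted first. */
            if (i - 1 - lo > hi - (i + 1)) {
                if (lo < i - 1) { top++; stack[top].lo = lo;    stack[top].hi = i - 1; }
                if (i + 1 < hi) { top++; stack[top].lo = i + 1; stack[top].hi = hi; }
            } else {
                if (i + 1 < hi) { top++; stack[top].lo = i + 1; stack[top].hi = hi; }
                if (lo < i - 1) { top++; stack[top].lo = lo;    stack[top].hi = i - 1; }
            }
        }
    }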
If you're looking for fast insertion and easy implementation, why not a linked list (singly or doubly linked)? A minimal sketch follows this list.
Insertion: push head / push tail - O(1)
Removal: pop head / pop tail - O(1)
The only "but" is that find will be O(n).
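Here is the promised sketch of those two O(1) operations for a singly linked list (illustrative names, no error handling):

    /* Minimal singly linked list: O(1) push and pop at the head. */
    #include <stdlib.h>

    typedef struct lnode { int value; struct lnode *next; } lnode;

    /* O(1): the new element becomes the head. */
    void push_head(lnode **head, int value) {
        lnode *n = malloc(sizeof *n);
        n->value = value;
        n->next = *head;
        *head = n;
    }

    /* O(1): remove and return the head value (assumes a non-empty list). */
    int pop_head(lnode **head) {
        lnode *n = *head;
        int v = n->value;
        *head = n->next;
        free(n);
        return v;
    }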
Related
I'm scanning a large data source, currently about 8 million entries, extracting one string per entry, which I want in alphabetical order.
Currently I put them in an array and then sort an index to them using qsort(), which works fine.
But out of curiosity I'm thinking of instead inserting each string into a data structure that maintains them in alphabetical order as I scan them from the data source, partly for the experience of implementing one, partly because it will feel faster without the wait for the sort to complete after the scan has finished (-:
What data structure would be the most straightforward to implement in C?
UPDATE
To clarify, the only operations I need are inserting an item and dumping the index when it's done, by which I mean: for each item in the original order, dump an integer representing its position after sorting.
SUMMARY
The easiest to implement are binary search trees.
Self-balancing binary trees are much better but nontrivial to implement.
Insertion can be done iteratively but in-order traversal for dumping the results and post-order traversal for deleting the tree when done both require either recursion or an explicit stack.
Without implementing balancing, runs of ordered input will result in the degenerate worst case which is a linked list. This means deep trees which severely impact the speed of the insert operation.
Shuffling the input slightly can break up ordered input significantly and is easier to implement than balancing.
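To make the summary concrete, here is a sketch of the plain unbalanced BST it describes: iterative insertion plus a recursive in-order dump. Names are illustrative, and the recursion in the dump is exactly what the explicit-stack caveat above is about:

    /* Plain unbalanced BST: iterative insert, recursive in-order dump.
       Keys are stored by pointer, so the caller must keep them alive. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    typedef struct bst_node {
        const char *key;
        struct bst_node *left, *right;
    } bst_node;

    /* Iterative insertion: walk down to a NULL child and attach there. */
    void bst_insert(bst_node **root, const char *key) {
        while (*root) {
            int c = strcmp(key, (*root)->key);
            if (c == 0) return;                    /* ignore duplicates */
            root = (c < 0) ? &(*root)->left : &(*root)->right;
        }
        *root = calloc(1, sizeof **root);
        (*root)->key = key;
    }

    /* Recursive in-order traversal prints keys in sorted order. On the
       degenerate (ordered-input) tree this recursion is as deep as the
       list is long, hence the note about explicit stacks above. */
    void bst_dump(const bst_node *t) {
        if (!t) return;
        bst_dump(t->left);
        puts(t->key);
        bst_dump(t->right);
    }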
Binary search trees. Or self-balancing search trees. But don't expect those to be faster than a properly implemented dynamic array, since arrays have much better locality of reference than pointer structures. Also, unbalanced BSTs may "go linear", so your entire algorithm becomes O(n²), just like quicksort.
You are already using the optimal approach. Sorting at the end will be much cheaper than maintaining an online sorted data structure. You can get the same O(log N) with an rb-tree, but the constant will be much worse, not to mention the significant space overhead.
That said, AVL trees and rb-trees are much simpler to implement if you don't need to support deletion. A left-leaning rb-tree fits in 50 or so lines of code. See http://www.cs.princeton.edu/~rs/talks/LLRB/ (by Sedgewick)
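For a rough idea of what that looks like, here is a sketch of left-leaning red-black insertion (no deletion) with int keys. It follows the structure of Sedgewick's talk, but the C details are my own:

    #include <stdbool.h>
    #include <stdlib.h>

    typedef struct llrb {
        int key;
        bool red;                 /* color of the link from the parent */
        struct llrb *left, *right;
    } llrb;

    static bool is_red(llrb *h) { return h && h->red; }

    static llrb *rotate_left(llrb *h) {
        llrb *x = h->right;
        h->right = x->left;  x->left = h;
        x->red = h->red;     h->red = true;
        return x;
    }

    static llrb *rotate_right(llrb *h) {
        llrb *x = h->left;
        h->left = x->right;  x->right = h;
        x->red = h->red;     h->red = true;
        return x;
    }

    static void flip_colors(llrb *h) {
        h->red = !h->red;
        h->left->red = !h->left->red;
        h->right->red = !h->right->red;
    }

    /* Usage: root = llrb_insert(root, key); if (root) root->red = false; */
    llrb *llrb_insert(llrb *h, int key) {
        if (!h) {
            h = calloc(1, sizeof *h);
            h->key = key;
            h->red = true;
            return h;
        }
        if (key < h->key)      h->left  = llrb_insert(h->left, key);
        else if (key > h->key) h->right = llrb_insert(h->right, key);
        /* equal keys fall through: duplicates are ignored */

        /* Fix right-leaning reds and two reds in a row on the way up. */
        if (is_red(h->right) && !is_red(h->left))      h = rotate_left(h);
        if (is_red(h->left) && is_red(h->left->left))  h = rotate_right(h);
        if (is_red(h->left) && is_red(h->right))       flip_colors(h);
        return h;
    }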
You could implement a faster sorting algorithm, such as Timsort or another sort with an O(n log n) worst case, and then find items with binary search, which is fast once the list is sorted.
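A minimal illustration in C, using the standard library's qsort and bsearch (any O(n log n) sort would do in place of qsort):

    #include <stdio.h>
    #include <stdlib.h>

    static int cmp_int(const void *a, const void *b) {
        int x = *(const int *)a, y = *(const int *)b;
        return (x > y) - (x < y);   /* avoids overflow of x - y */
    }

    int main(void) {
        int a[] = { 42, 7, 19, 3, 88 };
        size_t n = sizeof a / sizeof a[0];
        qsort(a, n, sizeof a[0], cmp_int);    /* n log n sort up front */
        int key = 19;
        int *hit = bsearch(&key, a, n, sizeof a[0], cmp_int);
        printf("%s\n", hit ? "found" : "not found");
        return 0;
    }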
You should take a look at the trie data structure (wikilink); I think it will serve what you want.
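A hedged sketch of such a trie over lowercase a-z keys (all names illustrative; a real version would handle a wider alphabet and check buffer bounds):

    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    typedef struct trie {
        struct trie *child[26];
        bool terminal;            /* true if a key ends at this node */
    } trie;

    void trie_insert(trie **root, const char *key) {
        if (!*root) *root = calloc(1, sizeof **root);
        trie *t = *root;
        for (; *key; key++) {
            int i = *key - 'a';
            if (!t->child[i]) t->child[i] = calloc(1, sizeof *t);
            t = t->child[i];
        }
        t->terminal = true;
    }

    /* Depth-first dump prints the keys in alphabetical order; buf must
       be longer than the longest key. */
    void trie_dump(const trie *t, char *buf, int depth) {
        if (!t) return;
        if (t->terminal) { buf[depth] = '\0'; puts(buf); }
        for (int i = 0; i < 26; i++) {
            buf[depth] = (char)('a' + i);
            trie_dump(t->child[i], buf, depth + 1);
        }
    }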
What is the most efficient way to represent two-dimensional arrays in Prolog? I thought of one long list, or a list of lists, but both have linear access time, which seems too slow for my problem. I'm not necessarily looking for a ready-made solution, but rather a concept of how it should be implemented.
You can get logarithmic time access with AVL trees or Red-Black trees, see library(assoc) and library(rbtrees) in SWI and YAP. For constant time access, make a term with N arguments, and use arg/3 for efficient access. Each of these arguments can again be a term with arity N, so you have an array with efficient read-access. Using setarg/3, you can even destructively modify elements, at the cost of losing nice logical properties and much more painful debugging and testing. In many cases, you can reformulate your algorithms to not require random-access, and work with lists of lists. If this is not possible, AVL or other balanced trees are often a very good option.
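A small illustration of the constant-time read access via nested terms and arg/3 (a 3x3 example; the functor names m and r are arbitrary):

    %% Row 2, column 3 of a 3x3 "array" built from nested terms:
    ?- M = m(r(a,b,c),
             r(d,e,f),
             r(g,h,i)),
       arg(2, M, Row),     % Row = r(d,e,f), the second row
       arg(3, Row, Cell).  % Cell = f, the third column of that row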
I am on a system with only about 512kb available to my application (the rest is used for buffers). I need to be as efficient as possible.
I have about 100 items that are rapidly added/deleted from a list. What is an efficient way to store these in C and is there a library (with a good license) that will help? The list never grows above 256 items and its average size is 15 items.
Should I use a binary search tree? A red-black tree?
With an average size of 15, all these other solutions are overkill; a simple dynamic array is best here. Searching is a linear pass over the array, and insertion and deletion require moving all elements behind the insertion point. Still, this moving around is offset by the lack of overhead for so few elements.
Even better, since you're doing a linear search anyway, deletion at arbitrary points can be done by swapping the last element into the deleted position, so no further moving around of elements is required. That yields O(1) insertion and deletion and O(very small n) lookup.
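A minimal sketch of that swap-with-last trick for a small fixed-capacity array (illustrative names, no bounds checking beyond the stated capacity):

    /* Fixed-capacity array with swap-with-last deletion. */
    typedef struct {
        int items[256];
        int count;
    } small_list;

    /* O(1): append at the end (caller ensures count < 256). */
    void arr_add(small_list *l, int v) { l->items[l->count++] = v; }

    /* O(n) to find, O(1) to delete: the last item fills the hole. */
    void arr_remove(small_list *l, int v) {
        for (int i = 0; i < l->count; i++) {
            if (l->items[i] == v) {
                l->items[i] = l->items[--l->count];
                return;
            }
        }
    }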
If your list never grows beyond 256 items, the best option may be a hash table, adding/removing each element with a hash function. That way each add/remove takes only O(1), and the amount of memory used doesn't need to be large.
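If you go that route, a minimal open-addressing (linear probing) sketch might look like the following. Note the assumptions: keys are nonnegative ints, the table never fills, and removal is omitted because in open addressing it needs tombstone markers:

    #define TABLE_SIZE 512           /* power of two, > 256 for low load */
    #define EMPTY (-1)

    static int table[TABLE_SIZE];

    void hs_init(void) {
        for (int i = 0; i < TABLE_SIZE; i++) table[i] = EMPTY;
    }

    void hs_add(int key) {
        unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;
        while (table[h] != EMPTY && table[h] != key)
            h = (h + 1) % TABLE_SIZE;        /* probe the next slot */
        table[h] = key;
    }

    int hs_contains(int key) {
        unsigned h = (unsigned)key * 2654435761u % TABLE_SIZE;
        while (table[h] != EMPTY) {
            if (table[h] == key) return 1;
            h = (h + 1) % TABLE_SIZE;
        }
        return 0;
    }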
I would use a doubly-linked list. When dealing with tens or hundreds of items, it's not terribly much slower to search than an array, and it has the advantage of taking up only as much space as it absolutely needs. Adding and removing elements is very simple and incurs very little additional overhead.
A tree structure is faster for searching, but has more overhead when adding or removing elements. That said, when dealing with tens or hundreds of items, the difference probably isn't significant. If I were you, I'd build an implementation of each and see which one is faster in actual usage.
With 15 items, a BST should be fine if you can keep it sorted, though I'm not sure the overhead will be much better than a linked list or an array if the items are rather small. For a lot of insertions/deletions I recommend a linked list, because the only thing you have to do is patch pointers.
What's wrong with a plain old array? You said "list" so presumably order is important, so you can't use a hash set (if you do use a hash set, use probing, not chaining).
You can't use a linked list because it would double your memory requirements. A tree would have the same problem, and it would be much more difficult to implement.
I currently have a simple database program that reads keys in from a text file and stores them in a doubly linked list (values are read later, if they are required). Currently I do a sequential search on the list, but that is clearly rather slow, and I was hoping there is a better way. I was reading about binary trees (in particular, red-black trees), but I don't know much about them, and was hoping I could glean something from the Stack Overflow hivemind :) I suppose my question is: what is the fastest way to do a search in a doubly linked list?
EDIT: Forgot to say that the list is sorted; I don't know if that changes anything. Also, the reason I only read in keys is that the maximum value length is 1024*32 bytes, which I feel is too large. Note that this is for an assignment, so "typical usage scenarios" don't apply. The professors are likely going to be stress-testing the hell out of this thing, and I don't want to be mallocing blocks that big.
There is a structure called a "skip list" that you could use.
It is a set of ordered lists, where each successive list skips more of the items; this lets you do a form of binary search. However, maintaining the lists is more difficult.
The fastest way to do a search in an unsorted doubly-linked list is one element at a time.
If you're trying to make search faster, don't use a linked list. Your idea of using a binary tree, for example, will certainly be faster, but as Matthew Flaschen said in comments, it's a completely different implementation from what you're using now.
Given that your doubly-linked list is sorted, and you have a list of items to search for, I suggest looking into the problem of building a self-balancing binary search tree. The tree construction could take some time, but it will be amortized if you have a long list of items to search for.
I've implemented Prim's algorithm in C (www.bubblellicious.es/prim.tar.gz), but I was just wondering how to transform it into Kruskal's algorithm.
They seem quite similar, but I can't imagine how to modify my old code into the new one. It would be delicious if you could give some advice or something. I know it's easy, but I'm still a n00b at C programming ...
Why not just write Kruskal's from scratch and see how they compare in your own solutions? Best way to learn.
To convert it, you need a forest (i.e. a set of trees, where initially each node is its own tree) as your temporary output structure rather than a single tree. Then on each step, rather than finding the cheapest edge that connects a currently unconnected node to your tree, you find the cheapest remaining edge in the whole graph and, if it connects two previously unconnected trees, merge those trees into one in the forest. Otherwise discard the edge.
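A hedged sketch of that conversion using a union-find forest (all names illustrative; the edge array is assumed to be extracted from your existing graph representation):

    #include <stdlib.h>

    typedef struct { int u, v, w; } edge;

    static int parent[1024];               /* one slot per node */

    static int uf_find(int x) {            /* path-halving find */
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        return x;
    }

    static int cmp_edge(const void *a, const void *b) {
        int wa = ((const edge *)a)->w, wb = ((const edge *)b)->w;
        return (wa > wb) - (wa < wb);
    }

    /* Fills mst[] with the chosen edges; returns how many were taken. */
    int kruskal(edge *edges, int n_edges, int n_nodes, edge *mst) {
        for (int i = 0; i < n_nodes; i++)
            parent[i] = i;                 /* each node starts as its own tree */
        qsort(edges, n_edges, sizeof *edges, cmp_edge);  /* cheapest first */
        int taken = 0;
        for (int i = 0; i < n_edges && taken < n_nodes - 1; i++) {
            int ru = uf_find(edges[i].u), rv = uf_find(edges[i].v);
            if (ru != rv) {                /* joins two different trees */
                parent[ru] = rv;           /* merge them in the forest */
                mst[taken++] = edges[i];
            }                              /* same tree: discard the edge */
        }
        return taken;
    }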
A proper implementation of Kruskal is more memory intensive but less time intensive than a proper Prim implementation.
But the differences between the two are quite large; probably all you can keep are some helper functions and data structures. It's not a conversion, more a rewrite using higher-level building blocks.
Why don't you consider switching to C++ and using the Boost Graph Library (http://www.boost.org/)?
It contains very good implementations of both algorithms: type-safe and highly performant.
See kruskal_minimum_spanning_tree and prim_minimum_spanning_tree