How to turn Prim's algorithm into Kruskal's algorithm? - c

I've implemented Prim's algorithm in C (www.bubblellicious.es/prim.tar.gz) but I was just wondering how to transform this into Kruskal's algorithm.
It seems they're quite similar, but I can't imagine how I can modify my old code into the new one. It'd be delicious if you could give some advice or something. I know it's easy, but I'm still a n00b in C programming ...

Why not just write Kruskal's from scratch and see how they compare in your own solutions? Best way to learn.

To convert, you need a forest (i.e. a set of trees, where initially each node is its own tree) as your temporary output structure rather than a single tree. Then on each step, rather than finding the cheapest edge that adds a currently unconnected node to your tree, you find the cheapest edge in the whole graph and, if it connects two previously separate trees, merge those trees into one and keep the edge. Otherwise discard the edge.
A proper implementation of Kruskal is more memory intensive but less time intensive than a proper Prim implementation.
But the differences between the two are quite large. Probably all you can share between them are some helper functions and data structures. It's not a conversion, more a rewrite using higher-level building blocks.
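To make the forest idea concrete, here is a minimal sketch of Kruskal's algorithm in C using a union-find array to track which tree each node belongs to. It is not a drop-in replacement for the linked Prim code; the Edge type, the fixed node limit and the function signature are illustrative assumptions.

```c
#include <stdlib.h>

typedef struct { int src, dst, weight; } Edge;    /* illustrative edge type */

#define MAX_NODES 1000
static int parent[MAX_NODES];        /* union-find: one entry per node */

static int find_root(int v)          /* representative of v's tree (naive, no compression) */
{
    while (parent[v] != v)
        v = parent[v];
    return v;
}

static int cmp_edge(const void *a, const void *b)
{
    const Edge *ea = a, *eb = b;
    return (ea->weight > eb->weight) - (ea->weight < eb->weight);
}

/* Fills mst[] with the chosen edges and returns how many were taken. */
int kruskal(Edge *edges, int n_edges, int n_nodes, Edge *mst)
{
    int i, taken = 0;

    for (i = 0; i < n_nodes; i++)
        parent[i] = i;                              /* every node starts as its own tree */

    qsort(edges, n_edges, sizeof(Edge), cmp_edge);  /* cheapest edges first */

    for (i = 0; i < n_edges && taken < n_nodes - 1; i++) {
        int a = find_root(edges[i].src);
        int b = find_root(edges[i].dst);
        if (a != b) {                               /* edge joins two different trees */
            parent[a] = b;                          /* merge them                     */
            mst[taken++] = edges[i];
        }                                           /* same tree: discard the edge    */
    }
    return taken;
}
```

The main loop mirrors the description above: sort all edges once, then repeatedly take the cheapest remaining edge and keep it only if it connects two different trees.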

Why don't you consider switching to C++ and using the Boost Graph Library
(http://www.boost.org/)?
It contains very good implementations of both algorithms: type-safe and highly performant.
See kruskal_minimum_spanning_tree and prim_minimum_spanning_tree

Related

Why is insertion into an immutable trie only slower by a constant factor, as opposed to using a mutable data structure?

Lee Byron makes this point in the video, but I can't seem to find the part where he explains it.
https://www.youtube.com/watch?v=I7IdS-PbEgI&t=1604s
Is this because when you update a node you have to traverse log(n) nodes to get to it, whereas with an immutable structure it must copy worst-case n nodes? That is as far as I get in my thinking.
If you attempted to create an immutable list the simple way, the obvious solution would be to copy the whole list into a new list and exchange the single item. So a larger list would take longer to copy, right? The result would be at least O(n).
Immutable.js, on the other hand, uses a trie (see Wikipedia), which allows it to reuse most of the structure while making sure that existing references are not mutated.
Simply put, you create a new tree and new branches only for the modified parts. Where a branch is unchanged, the new tree can just link to the original structure instead of copying it.
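To illustrate the branch-sharing idea outside of JavaScript, here is a small sketch in C of path copying on an ordinary binary search tree. Immutable.js itself uses wide hash-array-mapped and vector tries, but the sharing principle is the same; the Node type and helpers are made up for illustration.

```c
#include <stdlib.h>

typedef struct Node {
    int key;
    const struct Node *left, *right;   /* shared, never mutated */
} Node;

static const Node *mk(int key, const Node *l, const Node *r)
{
    Node *n = malloc(sizeof *n);
    n->key = key;
    n->left = l;
    n->right = r;
    return n;
}

/* Returns the root of a NEW tree containing key; the old tree is untouched. */
static const Node *insert(const Node *t, int key)
{
    if (t == NULL)
        return mk(key, NULL, NULL);
    if (key < t->key)
        return mk(t->key, insert(t->left, key), t->right);  /* right subtree shared */
    if (key > t->key)
        return mk(t->key, t->left, insert(t->right, key));  /* left subtree shared  */
    return t;                                               /* key already present  */
}
```

Each insert copies only the nodes on the path from the root to the new leaf (O(log n) of them in a balanced tree); every untouched subtree is referenced rather than copied, which is why the old version stays valid and the new one is cheap to build.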
The immutable.js documentation starts with two links to long descriptions; the one about vector tries is especially nice:
"These data structures are highly efficient on modern JavaScript VMs by using structural sharing via hash maps tries and vector tries as popularized by Clojure and Scala, minimizing the need to copy or cache data."
If you want to know more details, you might also want to take a look at the question How Immutability is Implemented.

Artificial Intelligence: Time Complexity of NBA* Search

I am studying informed search algorithms, and for New Bidirectional A* Search, I know that the space complexity is O(b^d), where d is the depth of the shallowest goal node and b is the branching factor. I have tried to find out what its time complexity is, but I haven't been able to find any exact information about it in online resources. Is the exact time complexity of NBA* Search unknown, and how does it differ from that of the original Bidirectional A*? Any insights are appreciated.
If you have specific models of your problem (e.g. a graph growing uniformly in both directions with unit edge costs and a number of states growing exponentially), then most bidirectional search algorithms require O(b^(d/2)) node expansions and O(b^(d/2)) time. But this simple model doesn't actually model most real-world problems.
Given this, I would not recommend putting significant effort into studying New Bidirectional A*.
The state of the art in bidirectional search has changed massively in the last few years. The current algorithm with the best theoretical guarantees is NBS - Near-Optimal Bidirectional Search. The algorithm finds optimal paths and is near-optimal in node expansions. That is, NBS is guaranteed to do no more than 2x the necessary expansions of the best possible algorithm (given reasonable theoretical assumptions, such as using the same heuristic). All other algorithms (including A*) can do arbitrarily worse than NBS.
Other variants of NBS, such as DVCBS, have been proposed; they follow the same basic structure and do not have the same guarantees, but perform well in practice.

Why is Goal-directed reasoning and heuristic search hard to combine?

In Artificial Intelligence: A Modern Approach, 3rd Edition, I came across an interesting quote stating:
"As yet there is no good understanding of how to combine the two kinds of algorithms [Goal directed reasoning / planning and heuristic search] into a robust and efficient system" (Russel pg 189)
Why is this so? Why is it hard to combine goal-directed planning with heuristic search? Wouldn't reinforcement learning solve this?
The term “goal-directed reasoning” was used in the 1980s for a backtracking search technique. It was sometimes called backward reasoning or top-down search, which all mean the same thing. It describes how the algorithm traverses the state space, or more specifically, the order in which the states in the graph are visited. In newer literature this aspect of a planner is no longer explained in detail, because graph search itself is no big thing: it simply means putting the nodes on a stack and traversing them.
In contrast, the term “heuristic search” means replacing a brute-force solver with a knowledge-based approach. Heuristic search does not traverse the whole graph; it finds a domain-specific strategy that leaves out most of the graph. And indeed, it is hard to combine backtracking with heuristics; this approach is called grounding. Once a grounded problem is available, it is possible to use a backtracking solver on a knowledge-based problem. This is the strategy used in modern PDDL planners, which first describe the domain in a symbolic PDDL notation (which is knowledge-based) and then use a fast solver to search the state space.

Data structure with fast insertion

I would like to implement a data structure that supports fast insertion and keeps the data sorted, without duplicates, after every insert.
I thought about a binomial heap, but as I understand that structure, it can't tell during insertion whether a particular element is already in the heap. On the other hand there is the AVL tree, which fits my case perfectly, but honestly it is rather too hard for me to implement at the moment.
So my question is: is there any possibility to modify the binomial heap insertion algorithm to skip duplicates? Or maybe someone could suggest another structure?
Greetings :)
In C++, there is std::set, which is internally implemented as a red-black tree, so it sorts (and de-duplicates) as you insert data. You can have a look at that for reference.
A good data structure for this is the red-black tree, which has O(log n) insertion. You said you would like to implement a data structure that does this. A good explanation of how to implement one is given here, along with an open-source library you can use.
If you're okay with using a library, you may take a look at libavl here.
The library implements several other varieties of binary trees as well.
Skip lists are also a possibility if you are concerned with thread safety. Balanced binary search trees perform more poorly than a skip list in that case because skip lists require no rebalancing, and a skip list is also inherently sorted like a BST. There is a disadvantage in the amount of memory required (since multiple linked lists are effectively stacked on top of each other), but theoretically speaking it's a good fit.
You can read more about skip lists in this tutorial.
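As a rough illustration of how little machinery a skip list needs, here is a minimal insert-only sketch in C that also skips duplicates, matching the question. The node layout, the level cap and the coin-flip level generator are the usual textbook choices, not a tuned implementation.

```c
#include <stdlib.h>

#define MAX_LEVEL 16

typedef struct SkipNode {
    int key;
    struct SkipNode *next[MAX_LEVEL];   /* forward pointers, one per level */
} SkipNode;

typedef struct {
    SkipNode head;   /* sentinel: head.next[i] is the first real node on level i */
    int level;       /* number of levels currently in use */
} SkipList;          /* initialise with:  SkipList sl = { .level = 1 };          */

static int random_level(void)
{
    int lvl = 1;
    while (lvl < MAX_LEVEL && (rand() & 1))   /* coin flips: P(level >= k) = 2^-(k-1) */
        lvl++;
    return lvl;
}

/* Insert key; returns 1 if inserted, 0 if it was already present. */
int skiplist_insert(SkipList *sl, int key)
{
    SkipNode *update[MAX_LEVEL];
    SkipNode *x = &sl->head;
    int i, lvl;

    for (i = sl->level - 1; i >= 0; i--) {       /* search from the top level down */
        while (x->next[i] && x->next[i]->key < key)
            x = x->next[i];
        update[i] = x;                           /* last node before key on level i */
    }
    if (x->next[0] && x->next[0]->key == key)
        return 0;                                /* duplicate: skip it */

    lvl = random_level();
    if (lvl > sl->level) {                       /* new levels hang off the sentinel */
        for (i = sl->level; i < lvl; i++)
            update[i] = &sl->head;
        sl->level = lvl;
    }
    x = calloc(1, sizeof *x);
    x->key = key;
    for (i = 0; i < lvl; i++) {                  /* splice the node into each level */
        x->next[i] = update[i]->next[i];
        update[i]->next[i] = x;
    }
    return 1;
}
```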
If you have a truly large number of elements, you might also consider just using a doubly-linked list and sorting the list after all items are inserted. This has the benefit of ease of implementation and insertion time.
You would then need to implement a sorting algorithm. A selection sort or insertion sort would be slower but easier to implement than mergesort, heapsort, or quicksort. On the other hand, the latter three are not terribly difficult to implement either. The only thing to be careful about is not overflowing the call stack, since those algorithms are typically implemented using recursion. You could create your own stack implementation (not difficult) and implement them iteratively, pushing and popping values onto your stack as necessary. See Iterative quicksort for an example of what I'm referring to.
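As a sketch of that iterative approach: a quicksort over an int array that replaces recursion with a small explicit stack of subarray bounds. Pushing the larger half and looping on the smaller one keeps the stack depth at O(log n), so a fixed-size array of 64 entries is enough. The details (Lomuto partition, int keys) are illustrative choices.

```c
#include <stddef.h>

void quicksort_iter(int *a, size_t n)
{
    struct { size_t lo, hi; } stack[64];   /* depth <= log2(n), so 64 is plenty */
    size_t top = 0, lo = 0, hi;

    if (n < 2)
        return;
    hi = n - 1;

    for (;;) {
        while (lo < hi) {
            /* Lomuto partition around the last element */
            int pivot = a[hi];
            size_t i = lo, j;
            for (j = lo; j < hi; j++) {
                if (a[j] < pivot) {
                    int tmp = a[i]; a[i] = a[j]; a[j] = tmp;
                    i++;
                }
            }
            a[hi] = a[i];
            a[i] = pivot;

            /* push the larger side for later, keep working on the smaller one */
            if (i - lo < hi - i) {
                stack[top].lo = i + 1; stack[top].hi = hi; top++;
                hi = (i == 0) ? 0 : i - 1;    /* continue with the left side  */
            } else {
                if (i > lo) { stack[top].lo = lo; stack[top].hi = i - 1; top++; }
                lo = i + 1;                   /* continue with the right side */
            }
        }
        if (top == 0)
            break;
        top--;                                /* pop the next pending subarray */
        lo = stack[top].lo;
        hi = stack[top].hi;
    }
}
```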
If you are looking for fast insertion and an easy implementation, why not a linked list (singly or doubly linked)?
insertion: push head / push tail - O(1)
removal: pop head / pop tail - O(1)
The only BUT is that "find" will be O(n).
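A tiny sketch of the O(1) head operations mentioned above (the node type is made up; a tail pointer would give the same cost for the tail operations):

```c
#include <stdlib.h>

typedef struct ListNode {
    int value;
    struct ListNode *next;
} ListNode;

/* push head: constant time, no traversal needed */
static ListNode *push_head(ListNode *head, int value)
{
    ListNode *n = malloc(sizeof *n);
    n->value = value;
    n->next = head;
    return n;          /* the new node becomes the head */
}

/* pop head: also constant time */
static ListNode *pop_head(ListNode *head)
{
    ListNode *next = head ? head->next : NULL;
    free(head);
    return next;
}
```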

Easiest to implement online sorted data structure in C

I'm scanning a large data source, currently about 8 million entries, extracting one string per entry, and I want the strings in alphabetical order.
Currently I put them in an array and then sort an index to them using qsort(), which works fine.
But out of curiosity I'm thinking of instead inserting each string into a data structure that maintains them in alphabetical order as I scan them from the data source, partly for the experience of implementing one, partly because it will feel faster without the wait for the sort to complete after the scan has completed (-:
What data structure would be the most straightforward to implement in C?
UPDATE
To clarify, the only operations I need to perform are inserting an item and dumping the index when it's done, by which I mean: for each item in the original order, dump an integer representing its position after sorting.
SUMMARY
The easiest to implement are binary search trees (a minimal sketch is given below, after this summary).
Self balancing binary trees are much better but nontrivial to implement.
Insertion can be done iteratively but in-order traversal for dumping the results and post-order traversal for deleting the tree when done both require either recursion or an explicit stack.
Without implementing balancing, runs of ordered input will result in the degenerate worst case which is a linked list. This means deep trees which severely impact the speed of the insert operation.
Shuffling the input slightly can break up ordered input significantly and is easier to implement than balancing.
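Here is the promised minimal sketch of the unbalanced BST described in the summary: an iterative insert and a recursive in-order dump over C strings. It prints the strings rather than the index of original positions the question asks for, and the choice to send duplicates to the right is arbitrary; both are just illustration.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct BstNode {
    const char *str;
    struct BstNode *left, *right;
} BstNode;

/* Iterative insert: walk a pointer-to-pointer down to the empty slot. */
static void bst_insert(BstNode **root, const char *str)
{
    while (*root) {
        if (strcmp(str, (*root)->str) < 0)
            root = &(*root)->left;
        else
            root = &(*root)->right;     /* duplicates go to the right */
    }
    *root = calloc(1, sizeof **root);
    (*root)->str = str;
}

/* Recursive in-order traversal: prints the strings in alphabetical order. */
static void bst_dump(const BstNode *t)
{
    if (t) {
        bst_dump(t->left);
        puts(t->str);
        bst_dump(t->right);
    }
}
```

Without balancing, ordered input degenerates this into a linked list, exactly as the summary warns.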
Binary search trees. Or self-balancing search trees. But don't expect those to be faster than a properly implemented dynamic array, since arrays have much better locality of reference than pointer structures. Also, unbalanced BSTs may "go linear", so your entire algorithm becomes O(n²), just like worst-case quicksort.
You are already using the optimal approach. Sorting at the end will be much cheaper than maintaining an online sorted data structure. You can get the same O(log N) per element with an rb-tree, but the constant will be much worse, not to mention the significant space overhead.
That said, AVL trees and rb-trees are much simpler to implement if you don't need to support deletion. A left-leaning rb tree can fit in 50 or so lines of code; see http://www.cs.princeton.edu/~rs/talks/LLRB/ (by Sedgewick).
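To back up the "50 or so lines" claim, here is a sketch of an insert-only left-leaning red-black tree in C, following the Java code in the Sedgewick talk linked above. It is keyed on int for brevity where the question would key on strings, and deletion is omitted.

```c
#include <stdlib.h>

enum { BLACK = 0, RED = 1 };

typedef struct Llrb {
    int key, color;
    struct Llrb *left, *right;
} Llrb;

static int is_red(const Llrb *h) { return h && h->color == RED; }

static Llrb *rotate_left(Llrb *h)
{
    Llrb *x = h->right;
    h->right = x->left;
    x->left = h;
    x->color = h->color;
    h->color = RED;
    return x;
}

static Llrb *rotate_right(Llrb *h)
{
    Llrb *x = h->left;
    h->left = x->right;
    x->right = h;
    x->color = h->color;
    h->color = RED;
    return x;
}

static void flip_colors(Llrb *h)
{
    h->color = RED;
    h->left->color = BLACK;
    h->right->color = BLACK;
}

static Llrb *llrb_insert(Llrb *h, int key)
{
    if (h == NULL) {                       /* new nodes are red leaves */
        h = calloc(1, sizeof *h);
        h->key = key;
        h->color = RED;
        return h;
    }
    if (key < h->key)
        h->left = llrb_insert(h->left, key);
    else if (key > h->key)
        h->right = llrb_insert(h->right, key);
    /* key == h->key: already present, nothing to do */

    /* fix-ups on the way back up */
    if (is_red(h->right) && !is_red(h->left))     h = rotate_left(h);
    if (is_red(h->left) && is_red(h->left->left)) h = rotate_right(h);
    if (is_red(h->left) && is_red(h->right))      flip_colors(h);
    return h;
}

/* after every top-level insert, re-blacken the root:
 *     root = llrb_insert(root, key);  root->color = BLACK;                    */
```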
You could implement a faster sorting algorithm such as Timsort, or another sorting algorithm with an O(n log n) worst case, and then search it using binary search, since that is fast once the list is sorted.
You should take a look at the trie data structure (wiki link); I think this will serve what you want.
