I have a list of cities and an energy expenditure between each of them. I want to find the "best" (shortest) path between some specific pairs, that leads to the least energy loss. All roads are two-way roads, same energy expenditure from one city to another in a pair, but in the "best" path, each city should be visited only once to prevent looping around the same city.
I've tried making a directed adjacency list graph and using Bellman Ford but i am indeed detecting negative cycles, making the shortest path non-existent, but can't figure out how to make the algorithm go through each node only once to prevent looping. I've thought about using something like BFS to just print all the possible paths from a source node to a destination and somehow skip going through the same node (summing up the weights afterwards perhaps). Any ideas of how I would possibly solve this, either modding the Bellman Ford or the BFS or using something else?
The problem of finding the lowest-cost simple path (a path that doesn’t repeat nodes) in a graph containing negative cycles is NP-hard, which means that at present we don’t have any efficient algorithms for the problem and it’s possible that none exists. (You can prove NP-hardness by a reduction from the Hamiltonian path problem: if you assign each edge in a graph cost -1, then there’s a Hamiltonian path from a node u to a node v if and only if the lowest-cost simple path between them has length n-1, where n is the number of nodes in the graph.)
There are some techniques to find short paths using at most a fixed number of hops. Check out the “Color Coding” algorithm for an example of one of these.
add the absolute value of the most negative weight to EVERY weight.
Apply the Dijkstra algorithm. https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
Subtract absolute value of the most negative weight times number of hops from cost of optimal path found in 2
I have some doubts regarding best first search algorithm. The pseudocode that I have is the following:
best first search pseudocode
First doubt: is it complete? I have read that it is not because it can enter in a dead end, but I don't know when can happen, because if the algorithm chooses a node that has not more neighbours it does not get stucked in it because this node is remove from the open list and in the next iteration the following node of the open list is treated and the search continues.
Second doubt: is it optimal? I thought that if it is visiting the nodes closer to the goal along the search process, then the solution would be the shortest, but it is not in that way and I do not know the reason for that and therefore, the reason that makes this algorithm not optimal.
The heuristic I was using is the straight line distance between two points.
Thanks for your help!!
Of course, if heuristic function underestimates the costs, best first search is not optimal. In fact, even if your heuristic function is exactly right, best first search is never guaranteed to be optimal. Here is a counter example. Consider the following graph:
The green numbers are the actual costs and the red numbers are the exact heuristic function. Let's try to find a path from node S to node G.
Best first search would give you S->A->G following the heuristic function. However, if you look at the graph closer, you would see that the path S->B->C->G has lower cost of 5 instead of 6. Thus, this is an example of best first search performing suboptimal under perfect heuristic function.
In general case best first search algorithm is complete as in worst case scenario it will search the whole space (worst option). Now, it should be also optimal - given the heuristic function is admissible - meaning it does not overestimate the cost of the path from any of the nodes to goal. (It also needs to be consistent - that means that it adheres to triangle inequality, if it is not then the algorithm would not be complete - as it could enter a cycle)
Checking your algorithm I do not see how the heuristic function is calculated. Also I do not see there is calculated the cost of the path to get to the particular node.
So, it needs to calculate the actual cost of the path to reach a particular node and then it needs to add a heuristics estimate of the cost of the path from the node towards goal.
The formula is f(n)=g(n)+h(n) where g(n) is the cost of the path to reach the node and h(n) is the heuristics estimating the cost of the cheapest path from n to the goal.
Check the implementation of A* algorithm which is an example of best first search on path planning.
TLDR In best first search, you need to calculate the cost of a node as a sum of the cost of the path to get to that node and the heuristic function that estimate the cost of the path from that node to the goal. If the heuristic function will be admissible and consistent the algorithm will be optimal and complete.
I was recently studying Uninformed Search from here. In the case of Depth First Search, it is given that the space taken by the fringe is O(b.m), but I am unable to figure out how(I could not find proof of this anywhere online). Any help or pointers to specific material would be much appreciated.
A depth-first tree search needs to store only a single path from the root to a leaf node, along with the remaining unexpanded sibling nodes for each node on the path.
Once a node has been expanded, it can be removed from memory as soon as all its descendants has been fully expanded.
So that, for a state space with branching factor band maximum depth m, depth-first search requires storage of only O(b*m).
ref. Russel & Norvig, Artificial Intelligence, Figure 3.16, p87
The Depth First Search (DFS) algorithm has to store few nodes in the fridge because it processes the lastly added node first (Last In First Out), which results in a space complexity of O(bd). Thus, for a depth of d it has to store at most the b children of the d nodes above.
The Breadth First Search (BFS) algorithm, however, gets the firstly inserted node first (First In First Out). Because of this, it has to keep track of all the children nodes it encounters, which results in a space complexity of O(b^d).
Thus, for a depth of d it has to store the children and the children's children, etc., resulting in the exponential growth.
What is the exact difference between Dijkstra's and Prim's algorithms? I know Prim's will give a MST but the tree generated by Dijkstra will also be a MST. Then what is the exact difference?
Prim's algorithm constructs a minimum spanning tree for the graph, which is a tree that connects all nodes in the graph and has the least total cost among all trees that connect all the nodes. However, the length of a path between any two nodes in the MST might not be the shortest path between those two nodes in the original graph. MSTs are useful, for example, if you wanted to physically wire up the nodes in the graph to provide electricity to them at the least total cost. It doesn't matter that the path length between two nodes might not be optimal, since all you care about is the fact that they're connected.
Dijkstra's algorithm constructs a shortest path tree starting from some source node. A shortest path tree is a tree that connects all nodes in the graph back to the source node and has the property that the length of any path from the source node to any other node in the graph is minimized. This is useful, for example, if you wanted to build a road network that made it as efficient as possible for everyone to get to some major important landmark. However, the shortest path tree is not guaranteed to be a minimum spanning tree, and the sum of the costs on the edges of a shortest-path tree can be much larger than the cost of an MST.
Another important difference concerns what types of graphs the algorithms work on. Prim's algorithm works on undirected graphs only, since the concept of an MST assumes that graphs are inherently undirected. (There is something called a "minimum spanning arborescence" for directed graphs, but algorithms to find them are much more complicated). Dijkstra's algorithm will work fine on directed graphs, since shortest path trees can indeed be directed. Additionally, Dijkstra's algorithm does not necessarily yield the correct solution in graphs containing negative edge weights, while Prim's algorithm can handle this.
Dijkstra's algorithm doesn't create a MST, it finds the shortest path.
Consider this graph
5 5
s *-----*-----* t
\ /
-------
9
The shortest path is 9, while the MST is a different 'path' at 10.
Prim and Dijkstra algorithms are almost the same, except for the "relax function".
Prim:
MST-PRIM (G, w, r) {
for each key ∈ G.V
u.key = ∞
u.parent = NIL
r.key = 0
Q = G.V
while (Q ≠ ø)
u = Extract-Min(Q)
for each v ∈ G.Adj[u]
if (v ∈ Q)
alt = w(u,v) <== relax function, Pay attention here
if alt < v.key
v.parent = u
v.key = alt
}
Dijkstra:
Dijkstra (G, w, r) {
for each key ∈ G.V
u.key = ∞
u.parent = NIL
r.key = 0
Q = G.V
while (Q ≠ ø)
u = Extract-Min(Q)
for each v ∈ G.Adj[u]
if (v ∈ Q)
alt = w(u,v) + u.key <== relax function, Pay attention here
if alt < v.key
v.parent = u
v.key = alt
}
The only difference is pointed out by the arrow, which is the relax function.
The Prim, which searches for the minimum spanning tree, only cares about the minimum of the total edges cover all the vertices. The relax function is alt = w(u,v)
The Dijkstra, which searches for the minimum path length, so it cares about the edge accumulation. The relax function is alt = w(u,v) + u.key
Dijsktra's algorithm finds the minimum distance from node i to all nodes (you specify i). So in return you get the minimum distance tree from node i.
Prims algorithm gets you the minimum spaning tree for a given graph. A tree that connects all nodes while the sum of all costs is the minimum possible.
So with Dijkstra you can go from the selected node to any other with the minimum cost, you don't get this with Prim's
The only difference I see is that Prim's algorithm stores a minimum cost edge whereas Dijkstra's algorithm stores the total cost from a source vertex to the current vertex.
Dijkstra gives you a way from the source node to the destination node such that the cost is minimum. However Prim's algorithm gives you a minimum spanning tree such that all nodes are connected and the total cost is minimum.
In simple words:
So, if you want to deploy a train to connecte several cities, you would use Prim's algo. But if you want to go from one city to other saving as much time as possible, you'd use Dijkstra's algo.
Both can be implemented using exactly same generic algorithm as follows:
Inputs:
G: Graph
s: Starting vertex (any for Prim, source for Dijkstra)
f: a function that takes vertices u and v, returns a number
Generic(G, s, f)
Q = Enqueue all V with key = infinity, parent = null
s.key = 0
While Q is not empty
u = dequeue Q
For each v in adj(u)
if v is in Q and v.key > f(u,v)
v.key = f(u,v)
v.parent = u
For Prim, pass f = w(u, v) and for Dijkstra pass f = u.key + w(u, v).
Another interesting thing is that above Generic can also implement Breadth First Search (BFS) although it would be overkill because expensive priority queue is not really required. To turn above Generic algorithm in to BFS, pass f = u.key + 1 which is same as enforcing all weights to 1 (i.e. BFS gives minimum number of edges required to traverse from point A to B).
Intuition
Here's one good way to think about above generic algorithm: We start with two buckets A and B. Initially, put all your vertices in B so the bucket A is empty. Then we move one vertex from B to A. Now look at all the edges from vertices in A that crosses over to the vertices in B. We chose the one edge using some criteria from these cross-over edges and move corresponding vertex from B to A. Repeat this process until B is empty.
A brute force way to implement this idea would be to maintain a priority queue of the edges for the vertices in A that crosses over to B. Obviously that would be troublesome if graph was not sparse. So question would be can we instead maintain priority queue of vertices? This in fact we can as our decision finally is which vertex to pick from B.
Historical Context
It's interesting that the generic version of the technique behind both algorithms is conceptually as old as 1930 even when electronic computers weren't around.
The story starts with Otakar Borůvka who needed an algorithm for a family friend trying to figure out how to connect cities in the country of Moravia (now part of the Czech Republic) with minimal cost electric lines. He published his algorithm in 1926 in a mathematics related journal, as Computer Science didn't existed then. This came to the attention to Vojtěch Jarník who thought of an improvement on Borůvka's algorithm and published it in 1930. He in fact discovered the same algorithm that we now know as Prim's algorithm who re-discovered it in 1957.
Independent of all these, in 1956 Dijkstra needed to write a program to demonstrate the capabilities of a new computer his institute had developed. He thought it would be cool to have computer find connections to travel between two cities of the Netherlands. He designed the algorithm in 20 minutes. He created a graph of 64 cities with some simplifications (because his computer was 6-bit) and wrote code for this 1956 computer. However he didn't published his algorithm because primarily there were no computer science journals and he thought this may not be very important. The next year he learned about the problem of connecting terminals of new computers such that the length of wires was minimized. He thought about this problem and re-discovered Jarník/Prim's algorithm which again uses the same technique as the shortest path algorithm he had discovered a year before. He mentioned that both of his algorithms were designed without using pen or paper. In 1959 he published both algorithms in a paper that is just 2 and a half page long.
Dijkstra finds the shortest path between it's beginning node
and every other node. So in return you get the minimum distance tree from beginning node i.e. you can reach every other node as efficiently as possible.
Prims algorithm gets you the MST for a given graph i.e. a tree that connects all nodes while the sum of all costs is the minimum possible.
To make a story short with a realistic example:
Dijkstra wants to know the shortest path to each destination point by saving traveling time and fuel.
Prim wants to know how to efficiently deploy a train rail system i.e. saving material costs.
Directly from Dijkstra's Algorithm's wikipedia article:
The process that underlies Dijkstra's algorithm is similar to the greedy process used in Prim's algorithm. Prim's purpose is to find a minimum spanning tree that connects all nodes in the graph; Dijkstra is concerned with only two nodes. Prim's does not evaluate the total weight of the path from the starting node, only the individual path.
Here's what clicked for me: think about which vertex the algorithm takes next:
Prim's algorithm takes next the vertex that's closest to the tree, i.e. closest to some vertex anywhere on the tree.
Dijkstra's algorithm takes next the vertex that is closest to the source.
Source: R. Sedgewick's lecture on Dijkstra's algorithm, Algorithms, Part II: https://coursera.org/share/a551af98e24292b6445c82a2a5f16b18
I was bothered with the same question lately, and I think I might share my understanding...
I think the key difference between these two algorithms (Dijkstra and Prim) roots in the problem they are designed to solve, namely, shortest path between two nodes and minimal spanning tree (MST). The formal is to find the shortest path between say, node s and t, and a rational requirement is to visit each edge of the graph at most once. However, it does NOT require us to visit all the node. The latter (MST) is to get us visit ALL the node (at most once), and with the same rational requirement of visiting each edge at most once too.
That being said, Dijkstra allows us to "take shortcut" so long I can get from s to t, without worrying the consequence - once I get to t, I am done! Although there is also a path from s to t in the MST, but this s-t path is created with considerations of all the rest nodes, therefore, this path can be longer than the s-t path found by the Dijstra's algorithm. Below is a quick example with 3 nodes:
2 2
(s) o ----- o ----- o (t)
| |
-----------------
3
Let's say each of the top edges has the cost of 2, and the bottom edge has cost of 3, then Dijktra will tell us to the take the bottom path, since we don't care about the middle node. On the other hand, Prim will return us a MST with the top 2 edges, discarding the bottom edge.
Such difference is also reflected from the subtle difference in the implementations: in Dijkstra's algorithm, one needs to have a book keeping step (for every node) to update the shortest path from s, after absorbing a new node, whereas in Prim's algorithm, there is no such need.
The simplest explanation is in Prims you don't specify the Starting Node, but in dijsktra you (Need to have a starting node) have to find shortest path from the given node to all other nodes.
The key difference between the basic algorithms lies in their different edge-selection criteria. Generally, they both use a priority queue for selecting next nodes, but have different criteria to select the adjacent nodes of current processing nodes: Prim's Algorithm requires the next adjacent nodes must be also kept in the queue, while Dijkstra's Algorithm does not:
def dijkstra(g, s):
q <- make_priority_queue(VERTEX.distance)
for each vertex v in g.vertex:
v.distance <- infinite
v.predecessor ~> nil
q.add(v)
s.distance <- 0
while not q.is_empty:
u <- q.extract_min()
for each adjacent vertex v of u:
...
def prim(g, s):
q <- make_priority_queue(VERTEX.distance)
for each vertex v in g.vertex:
v.distance <- infinite
v.predecessor ~> nil
q.add(v)
s.distance <- 0
while not q.is_empty:
u <- q.extract_min()
for each adjacent vertex v of u:
if v in q and weight(u, v) < v.distance:// <-------selection--------
...
The calculations of vertex.distance are the second different point.
Dijkstras algorithm is used only to find shortest path.
In Minimum Spanning tree(Prim's or Kruskal's algorithm) you get minimum egdes with minimum edge value.
For example:- Consider a situation where you wan't to create a huge network for which u will be requiring a large number of wires so these counting of wire can be done using Minimum Spanning Tree(Prim's or Kruskal's algorithm) (i.e it will give you minimum number of wires to create huge wired network connection with minimum cost).
Whereas "Dijkstras algorithm" will be used to get the shortest path between two nodes while connecting any nodes with each other.
Dijkstra's algorithm is a single source shortest path problem between node i and j, but Prim's algorithm a minimal spanning tree problem. These algorithm use programming concept named 'greedy algorithm'
If you check these notion, please visit
Greedy algorithm lecture note : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/07-greedy.pdf
Minimum spanning tree : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/20-mst.pdf
Single source shortest path : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/21-sssp.pdf
#templatetypedef has covered difference between MST and shortest path. I've covered the algorithm difference in another So answer by demonstrating that both can be implemented using same generic algorithm that takes one more parameter as input: function f(u,v). The difference between Prim and Dijkstra's algorithm is simply which f(u,v) you use.
At the code level, the other difference is the API.
You initialize Prim with a source vertex, s, i.e., Prim.new(s); s can be any vertex, and regardless of s, the end result, which are the edges of the minimum spanning tree (MST) are the same. To get the MST edges, we call the method edges().
You initialize Dijkstra with a source vertex, s, i.e., Dijkstra.new(s) that you want to get shortest path/distance to all other vertices. The end results, which are the shortest path/distance from s to all other vertices; are different depending on the s. To get the shortest paths/distances from s to any vertex, v, we call the methods distanceTo(v) and pathTo(v) respectively.
They both create trees with the greedy method.
With Prim's algorithm we find minimum cost spanning tree. The goal is to find minimum cost to cover all nodes.
with Dijkstra we find Single Source Shortest Path. The goal is find the shortest path from the source to every other node
Prim’s algorithm works exactly as Dijkstra’s, except
It does not keep track of the distance from the source.
Storing the edge that connected the front of the visited vertices to the next closest vertex.
The vertex used as “source” for Prim’s algorithm is
going to be the root of the MST.
After implementing most of the common and needed functions for my Graph implementation, I realized that a couple of functions (remove vertex, search vertex and get vertex) don't have the "best" implementation.
I'm using adjacency lists with linked lists for my Graph implementation and I was searching one vertex after the other until it finds the one I want. Like I said, I realized I was not using the "best" implementation. I can have 10000 vertices and need to search for the last one, but that vertex could have a link to the first one, which would speed up things considerably. But that's just an hypothetical case, it may or may not happen.
So, what algorithm do you recommend for search lookup? Our teachers talked about Breadth-first and Depth-first mostly (and Dikjstra' algorithm, but that's a completely different subject). Between those two, which one do you recommend?
It would be perfect if I could implement both but I don't have time for that, I need to pick up one and implement it has the first phase deadline is approaching...
My guess, is to go with Depth-first, seems easier to implement and looking at the way they work, it seems a best bet. But that really depends on the input.
But what do you guys suggest?
If you’ve got an adjacency list, searching for a vertex simply means traversing that list. You could perhaps even order the list to decrease the needed lookup operations.
A graph traversal (such as DFS or BFS) won’t improve this from a performance point of view.
Finding and deleting nodes in a graph is a "search" problem not a graph problem, so to make it better than O(n) = linear search, BFS, DFS, you need to store your nodes in a different data structure optimized for searching or sort them. This gives you O(log n) for find and delete operations. Candidatas are tree structures like b-trees or hash tables. If you want to code the stuff yourself I would go for a hash table which normally gives very good performance and is reasonably easy to implement.
I think BFS would usually be faster an average. Read the wiki pages for DFS and BFS.
The reason I say BFS is faster is because it has the property of reaching nodes in order of their distance from your starting node. So if your graph has N nodes and you want to search for node N and node 1, which is the node you start your search form, is linked to N, then you will find it immediately. DFS might expand the whole graph before this happens however. DFS will only be faster if you get lucky, while BFS will be faster if the nodes you search for are close to your starting node. In short, they both depend on the input, but I would choose BFS.
DFS is also harder to code without recursion, which makes BFS a bit faster in practice, since it is an iterative algorithm.
If you can normalize your nodes (number them from 1 to 10 000 and access them by number), then you can easily keep Exists[i] = true if node i is in the graph and false otherwise, giving you O(1) lookup time. Otherwise, consider using a hash table if normalization is not possible or you don't want to do it.
Depth-first search is best because
It uses much less memory
Easier to implement
the depth first and breadth first algorithms are almost identical, except for the use of a stack in one (DFS), a queue in the other (BFS), and a few required member variables. Implementing them both shouldn't take you much extra time.
Additionally if you have an adjacency list of the vertices then your look up with be O(V) anyway. So little to nothing will be gained via using one of the two other searches.
I'd comment on Konrad's post but I can't comment yet so... I'd like to second that it doesn't make a difference in performance if you implement DFS or BFS over a simple linear search through your list. Your search for a particular node in the graph doesn't depend on the structure of the graph, hence it's not necessary to confine yourself to graph algorithms. In terms of coding time, the linear search is the best choice; if you want to brush up your skills in graph algorithms, implement DFS or BFS, whichever you feel like.
If you are searching for a specific vertex and terminating when you find it, I would recommend using A*, which is a best-first search.
The idea is that you calculate the distance from the source vertex to the current vertex you are processing, and then "guess" the distance from the current vertex to the target.
You start at the source, calculate the distance (0) plus the guess (whatever that might be) and add it to a priority queue where the priority is distance + guess. At each step, you remove the element with the smallest distance + guess, do the calculation for each vertex in its adjacency list and stick those in the priority queue. Stop when you find the target vertex.
If your heuristic (your "guess") is admissible, that is, if it's always an under-estimate, then you are guaranteed to find the shortest path to your target vertex the first time you visit it. If your heuristic is not admissible, then you will have to run the algorithm to completion to find the shortest path (although it sounds like you don't care about the shortest path, just any path).
It's not really any more difficult to implement than a breadth-first search (you just have to add the heuristic, really) but it will probably yield faster results. The only hard part is figuring out your heuristic. For vertices that represent geographical locations, a common heuristic is to use an "as-the-crow-flies" (direct distance) heuristic.
Linear search is faster than BFS and DFS. But faster than linear search would be A* with the step cost set to zero. When the step cost is zero, A* will only expand the nodes that are closest to a goal node. If the step cost is zero then every node's path cost is zero and A* won't prioritize nodes with a shorter path. That's what you want since you don't need the shortest path.
A* is faster than linear search because linear search will most likely complete after O(n/2) iterations (each node has an equal chance of being a goal node) but A* prioritizes nodes that have a higher chance of being a goal node.