What's the difference between best-first search and A* search? - artificial-intelligence

In my text book I noticed that both these algorithms work almost exactly the same, I am trying to understand what's the major difference between them.
The textbook traversed this example using A* the same way it did with best-first search.
Any help would be appreciated.

Best-first search (in its greedy form) picks the next state using the evaluation function f(n) = h(n), i.e. the state with the lowest heuristic value. It doesn't consider the cost of the path to that state. All it cares about is which of the candidate states has the lowest heuristic value.
A* search picks the next state using f(n) = h(n) + g(n), where h is the same heuristic used in best-first search and g is the cost of the path from the initial state to that state. Therefore it doesn't choose the state with the lowest heuristic value alone, but the one with the lowest sum of heuristic value and cost of getting there.
In your example above, when you start from Arad you can go to Sibiu, Zerind or Timisoara, whose straight-line distances to Bucharest (the heuristic h) are 253, 374 and 329 respectively. In this case both algorithms choose Sibiu: it has the lowest h = 253, and also the lowest g + h = 140 + 253 = 393.
Now you can expand back to Arad (h = 366), or to Oradea (h = 380), Fagaras (h = 178) or Rimnicu Vilcea (h = 193). For best-first search Fagaras has the lowest f(n) = 178, but A* prefers Rimnicu Vilcea with f(n) = 220 + 193 = 413, where 220 is the cost of getting to Rimnicu Vilcea from Arad (140 + 80) and 193 is the estimate from Rimnicu Vilcea to Bucharest. For Fagaras it is higher: f(n) = 239 + 178 = 417.
So you can see that best-first search is greedy: it chooses the state with the lower heuristic value but higher overall cost, because it doesn't consider the cost of getting to that state from the initial state.
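The arithmetic above can be checked with a few lines of Python. This is only a sketch: the heuristic values and the path costs 239 and 220 are the ones quoted in this answer (the Sibiu to Fagaras step of 99 is inferred from 239 - 140), only the two candidates discussed are compared, and the dictionary and function names are made up for illustration.

```python
# Straight-line distance to Bucharest (the heuristic h), as quoted in the answer.
h = {"Fagaras": 178, "Rimnicu Vilcea": 193}

# Path cost from Arad (g) for the two candidates reached through Sibiu.
g = {"Fagaras": 140 + 99, "Rimnicu Vilcea": 140 + 80}

def greedy_choice(candidates):
    # Greedy best-first: look at h only.
    return min(candidates, key=lambda city: h[city])

def a_star_choice(candidates):
    # A*: look at g + h.
    return min(candidates, key=lambda city: g[city] + h[city])

candidates = ["Fagaras", "Rimnicu Vilcea"]
print(greedy_choice(candidates))  # Fagaras
print(a_star_choice(candidates))  # Rimnicu Vilcea
```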

A* achieves better performance by using heuristics to guide its search. It combines the advantages of best-first search and uniform-cost search: it finds the optimal path (given an admissible heuristic) while using the heuristic to make the search more efficient. The evaluation function is f(n) = g(n) + h(n), where h(n) is the estimated distance from a vertex n to the target vertex and g(n) is the actual distance from the start vertex to n. If g(n) = 0, A* turns into (greedy) best-first search. If h(n) = 0, A* turns into uniform-cost search.
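One way to see that the three algorithms differ only in the evaluation function is to write a single search routine that takes the priority as a parameter. The sketch below is an illustration, not a reference implementation: the graph, the heuristic values and the names `best_first` and `priority` are all made up, and the heuristic happens to equal the true remaining cost.

```python
import heapq

def best_first(graph, h, start, goal, priority):
    """Always expand the frontier node with the lowest priority(g, h)."""
    frontier = [(priority(0, h[start]), 0, start, [start])]
    best_g = {start: 0}  # cheapest known path cost to each node
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # stale entry: a cheaper route to this node is known
        for nxt, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                heapq.heappush(
                    frontier,
                    (priority(new_g, h[nxt]), new_g, nxt, path + [nxt]),
                )
    return None, float("inf")

# Hypothetical directed graph: node -> {successor: step cost}
graph = {"S": {"A": 5, "B": 1}, "A": {"G": 1}, "B": {"C": 2}, "C": {"G": 2}, "G": {}}
h = {"S": 5, "A": 1, "B": 4, "C": 2, "G": 0}

greedy = best_first(graph, h, "S", "G", lambda g, hn: hn)      # f = h
ucs = best_first(graph, h, "S", "G", lambda g, hn: g)          # f = g
a_star = best_first(graph, h, "S", "G", lambda g, hn: g + hn)  # f = g + h
print(greedy)  # (['S', 'A', 'G'], 6)
print(ucs)     # (['S', 'B', 'C', 'G'], 5)
print(a_star)  # (['S', 'B', 'C', 'G'], 5)
```

On this graph the greedy variant returns the more expensive path, while the other two agree on the cheaper one.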

Related

Is best first search optimal and complete?

I have some doubts regarding best first search algorithm. The pseudocode that I have is the following:
best first search pseudocode
First doubt: is it complete? I have read that it is not, because it can run into a dead end, but I don't see when that can happen. If the algorithm chooses a node that has no more neighbours, it does not get stuck there: the node is removed from the open list, and in the next iteration the following node of the open list is processed and the search continues.
Second doubt: is it optimal? I thought that if it visits the nodes closer to the goal during the search, then the solution would be the shortest one, but that is not the case, and I do not know the reason why this algorithm is not optimal.
The heuristic I was using is the straight line distance between two points.
Thanks for your help!!
Of course, if the heuristic function underestimates the costs, best first search is not optimal. In fact, even if your heuristic function is exactly right, best first search is not guaranteed to be optimal. Here is a counterexample. Consider the following graph:
The green numbers are the actual costs and the red numbers are the exact heuristic function. Let's try to find a path from node S to node G.
Best first search would give you S->A->G, following the heuristic function. However, if you look at the graph more closely, you will see that the path S->B->C->G has a lower cost of 5 instead of 6. Thus, this is an example of best first search performing suboptimally under a perfect heuristic function.
In the general case the best first search algorithm is complete on a finite graph, because in the worst case it will search the whole space. It should also be optimal, provided that f includes the path cost and the heuristic function is admissible, meaning it does not overestimate the cost of the path from any node to the goal. (If expanded nodes are never reopened, it also needs to be consistent, meaning it adheres to the triangle inequality; otherwise the first path found to a node may not be the cheapest one.)
Looking at your algorithm, I do not see how the heuristic function is calculated, nor where the cost of the path to a particular node is calculated.
So, it needs to calculate the actual cost of the path to reach a particular node, and then add a heuristic estimate of the cost of the path from that node to the goal.
The formula is f(n)=g(n)+h(n) where g(n) is the cost of the path to reach the node and h(n) is the heuristics estimating the cost of the cheapest path from n to the goal.
Check the implementation of A* algorithm which is an example of best first search on path planning.
TL;DR: To get optimal results from best first search, calculate the cost of a node as the sum of the cost of the path to that node and the heuristic estimate of the cost of the path from that node to the goal (this is A*). If the heuristic function is admissible and consistent, the algorithm will be optimal and complete.

A* Graph search

Given the heuristic values h(A)=5, h(B)=1, using A* graph search, it will put A and B on the frontier with f(A)=2+5=7, f(B)=4+1=5, then select B for expansion, then put G on frontier with f(G)=4+4=8, then it will select A for expansion, but will not do anything since both S and B are already expanded and not on frontier, and therefore it will select G next and return a non-optimal solution.
Is my argument correct?
There are two heuristic concepts here:
Admissible heuristic: When for each node n in the graph, h(n) never overestimates the cost of reaching the goal.
Consistent heuristic: When for each node n in the graph and each node m of its successors, h(n) <= h(m) + c(n,m), where c(n,m) is the cost of the arc from n to m.
Your heuristic function is admissible but not consistent, since as you have shown:
h(A) > h(B) + c(A,B), 5 > 2.
If the heuristic is consistent, then the estimated final cost of a partial solution never decreases along a path, i.e. f(n) <= f(m) for each successor m of n. Here, going from A to B along S->A->B:
f(A) = g(A) + h(A) = 2 + 5 = 7 > f(B) = g(B) + h(B) = 3 + 1 = 4,
so this heuristic function does not satisfy this property.
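The consistency condition can be checked edge by edge. The sketch below assumes the edge costs implied by the numbers in the question (S->A = 2, S->B = 4, A->B = 1, B->G = 4); h(S) = 0 and h(G) = 0 are assumptions, since the original figure is not available, and the function name is made up.

```python
# Edge list reconstructed from the question's numbers: (from, to, cost).
edges = [("S", "A", 2), ("S", "B", 4), ("A", "B", 1), ("B", "G", 4)]
h = {"S": 0, "A": 5, "B": 1, "G": 0}  # h(S) and h(G) are assumed values

def inconsistent_edges(edges, h):
    # Consistency requires h(n) <= h(m) + c(n, m) for every edge n -> m.
    return [(n, m) for n, m, c in edges if h[n] > h[m] + c]

print(inconsistent_edges(edges, h))  # [('A', 'B')]
```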
With respect to A*:
A* using an admissible heuristic is guaranteed to find the shortest path from the start to the goal.
A* using a consistent heuristic, in addition to finding the shortest path, also guarantees that once a node is explored we have already found the shortest path to this node, and therefore no node needs to be re-explored.
So, answering your question: the A* algorithm has to be implemented so that it reopens nodes when a shorter path to a node is found (also updating the path cost), and this new path is added to the open set or frontier. Therefore your argument is not correct, since B has to be added to the frontier again (now with the path S->A->B and cost 3).
If you can restrict A* to be used only with consistent heuristic functions then yes, you can discard path to nodes that have been already explored.
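The effect of reopening can be tried out on the graph from the question. This is a sketch under assumptions: the directed edges and costs are reconstructed from the f-values quoted in the question, h(S) = 0 is made up, and the `reopen` flag is a hypothetical parameter added only to compare the two behaviours.

```python
import heapq

def a_star(graph, h, start, goal, reopen):
    frontier = [(h[start], 0, start, [start])]
    closed = {}  # node -> path cost g at the time it was expanded
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        # Skip an already expanded node, unless reopening is allowed
        # and this entry reaches it by a strictly cheaper path.
        if node in closed and (not reopen or g >= closed[node]):
            continue
        closed[node] = g
        for nxt, cost in graph[node].items():
            new_g = g + cost
            heapq.heappush(frontier, (new_g + h[nxt], new_g, nxt, path + [nxt]))
    return None, float("inf")

graph = {"S": {"A": 2, "B": 4}, "A": {"B": 1}, "B": {"G": 4}, "G": {}}
h = {"S": 0, "A": 5, "B": 1, "G": 0}  # admissible but not consistent

print(a_star(graph, h, "S", "G", reopen=False))  # (['S', 'B', 'G'], 8)
print(a_star(graph, h, "S", "G", reopen=True))   # (['S', 'A', 'B', 'G'], 7)
```

Without reopening, the search reproduces the non-optimal result described in the question; with reopening, B is expanded a second time and the cheaper path is found.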
You maintain an ordered priority queue of objects on the frontier. You then take the best candidate, expand in all available directions, and put the new nodes in the priority queue. So it's possible for A to be pushed to the back of queue even though in fact the optimal path goes through it. It's also possible for A to be hemmed in by neighbours which were reached through sub-optimal paths, in which case most algorithms won't try to expand it as you say.
Implemented that way (without reopening nodes), A* is only a way of finding a reasonable path; it is not guaranteed to find the globally optimal path.

What's the main difference between dijkstra's algorithm and Prim's algorithm? [duplicate]

What is the exact difference between Dijkstra's and Prim's algorithms? I know Prim's will give a MST but the tree generated by Dijkstra will also be a MST. Then what is the exact difference?
Prim's algorithm constructs a minimum spanning tree for the graph, which is a tree that connects all nodes in the graph and has the least total cost among all trees that connect all the nodes. However, the length of a path between any two nodes in the MST might not be the shortest path between those two nodes in the original graph. MSTs are useful, for example, if you wanted to physically wire up the nodes in the graph to provide electricity to them at the least total cost. It doesn't matter that the path length between two nodes might not be optimal, since all you care about is the fact that they're connected.
Dijkstra's algorithm constructs a shortest path tree starting from some source node. A shortest path tree is a tree that connects all nodes in the graph back to the source node and has the property that the length of any path from the source node to any other node in the graph is minimized. This is useful, for example, if you wanted to build a road network that made it as efficient as possible for everyone to get to some major important landmark. However, the shortest path tree is not guaranteed to be a minimum spanning tree, and the sum of the costs on the edges of a shortest-path tree can be much larger than the cost of an MST.
Another important difference concerns what types of graphs the algorithms work on. Prim's algorithm works on undirected graphs only, since the concept of an MST assumes that graphs are inherently undirected. (There is something called a "minimum spanning arborescence" for directed graphs, but algorithms to find them are much more complicated). Dijkstra's algorithm will work fine on directed graphs, since shortest path trees can indeed be directed. Additionally, Dijkstra's algorithm does not necessarily yield the correct solution in graphs containing negative edge weights, while Prim's algorithm can handle this.
Dijkstra's algorithm doesn't create a MST, it finds the shortest path.
Consider this graph
     5     5
s *-----*-----* t
   \         /
    ---------
        9
The shortest path is 9, while the MST is a different 'path' at 10.
Prim and Dijkstra algorithms are almost the same, except for the "relax function".
Prim:
MST-PRIM (G, w, r) {
    for each u ∈ G.V
        u.key = ∞
        u.parent = NIL
    r.key = 0
    Q = G.V
    while (Q ≠ ø)
        u = Extract-Min(Q)
        for each v ∈ G.Adj[u]
            if (v ∈ Q)
                alt = w(u,v)    <== relax function, Pay attention here
                if alt < v.key
                    v.parent = u
                    v.key = alt
}
Dijkstra:
Dijkstra (G, w, r) {
    for each u ∈ G.V
        u.key = ∞
        u.parent = NIL
    r.key = 0
    Q = G.V
    while (Q ≠ ø)
        u = Extract-Min(Q)
        for each v ∈ G.Adj[u]
            if (v ∈ Q)
                alt = w(u,v) + u.key    <== relax function, Pay attention here
                if alt < v.key
                    v.parent = u
                    v.key = alt
}
The only difference is pointed out by the arrow, which is the relax function.
Prim's, which searches for the minimum spanning tree, only cares about minimizing the total weight of the edges that cover all the vertices. Its relax function is alt = w(u,v).
Dijkstra's, which searches for the minimum path length, cares about the accumulated edge weight along the path. Its relax function is alt = w(u,v) + u.key.
Dijkstra's algorithm finds the minimum distance from node i to all nodes (you specify i). So in return you get the minimum distance tree from node i.
Prim's algorithm gets you the minimum spanning tree for a given graph: a tree that connects all nodes while the sum of all costs is the minimum possible.
So with Dijkstra you can go from the selected node to any other with the minimum cost; you don't get this with Prim's.
The only difference I see is that Prim's algorithm stores a minimum cost edge whereas Dijkstra's algorithm stores the total cost from a source vertex to the current vertex.
Dijkstra gives you a way from the source node to the destination node such that the cost is minimum. However Prim's algorithm gives you a minimum spanning tree such that all nodes are connected and the total cost is minimum.
In simple words:
If you want to build a train network to connect several cities, you would use Prim's algorithm. But if you want to go from one city to another saving as much time as possible, you'd use Dijkstra's algorithm.
Both can be implemented using exactly same generic algorithm as follows:
Inputs:
G: Graph
s: Starting vertex (any for Prim, source for Dijkstra)
f: a function that takes vertices u and v, returns a number
Generic(G, s, f)
    Q = Enqueue all V with key = infinity, parent = null
    s.key = 0
    While Q is not empty
        u = dequeue Q
        For each v in adj(u)
            if v is in Q and v.key > f(u,v)
                v.key = f(u,v)
                v.parent = u
For Prim, pass f = w(u, v) and for Dijkstra pass f = u.key + w(u, v).
Another interesting thing is that the Generic algorithm above can also implement Breadth First Search (BFS), although it would be overkill because the expensive priority queue is not really required. To turn the Generic algorithm into BFS, pass f = u.key + 1, which is the same as forcing all weights to 1 (i.e. BFS gives the minimum number of edges required to get from point A to point B).
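The Generic pseudocode could be sketched in Python roughly as follows. A few choices here are assumptions rather than part of the pseudocode: `heapq` with lazy deletion stands in for a priority queue with decrease-key, `f` receives u's key and the edge weight instead of the two vertices, and the test graph is the three-vertex triangle with edge weights 5, 5 and 9 used in an earlier answer (the middle vertex is named `m` here).

```python
import heapq
import math

def generic(graph, s, f):
    key = {v: math.inf for v in graph}
    parent = {v: None for v in graph}
    key[s] = 0
    in_queue = set(graph)   # vertices not yet dequeued
    heap = [(0, s)]
    while heap:
        _, u = heapq.heappop(heap)
        if u not in in_queue:
            continue        # stale heap entry for an already dequeued vertex
        in_queue.remove(u)
        for v, w in graph[u].items():
            candidate = f(key[u], w)
            if v in in_queue and candidate < key[v]:
                key[v] = candidate
                parent[v] = u
                heapq.heappush(heap, (candidate, v))
    return key, parent

# Undirected triangle: s-m = 5, m-t = 5, s-t = 9
graph = {
    "s": {"m": 5, "t": 9},
    "m": {"s": 5, "t": 5},
    "t": {"m": 5, "s": 9},
}

prim_key, prim_parent = generic(graph, "s", lambda u_key, w: w)
dijk_key, dijk_parent = generic(graph, "s", lambda u_key, w: u_key + w)
bfs_key, _ = generic(graph, "s", lambda u_key, w: u_key + 1)

print(prim_parent)  # {'s': None, 'm': 's', 't': 'm'}  -> tree weight 10
print(dijk_parent)  # {'s': None, 'm': 's', 't': 's'}  -> distance to t is 9
print(bfs_key)      # {'s': 0, 'm': 1, 't': 1}
```

With Prim's f the keys are the weights of the chosen tree edges (5 + 5 = 10 in total); with Dijkstra's f they are distances from s, so t is reached directly over the edge of weight 9.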
Intuition
Here's one good way to think about the generic algorithm above: we start with two buckets, A and B. Initially, put all your vertices in B, so bucket A is empty. Then we move one vertex from B to A. Now look at all the edges from vertices in A that cross over to vertices in B. We choose one of these cross-over edges using some criterion and move the corresponding vertex from B to A. Repeat this process until B is empty.
A brute-force way to implement this idea would be to maintain a priority queue of the edges from vertices in A that cross over to B. Obviously that would be troublesome if the graph was not sparse. So the question is: can we instead maintain a priority queue of vertices? In fact we can, as our decision in the end is which vertex to pick from B.
Historical Context
It's interesting that the generic version of the technique behind both algorithms dates back to 1930, before electronic computers were around.
The story starts with Otakar Borůvka, who needed an algorithm for a family friend trying to figure out how to connect cities in Moravia (now part of the Czech Republic) with minimal-cost electric lines. He published his algorithm in 1926 in a mathematics journal, as computer science didn't exist then. This came to the attention of Vojtěch Jarník, who thought of an improvement on Borůvka's algorithm and published it in 1930. He in fact discovered the same algorithm that we now know as Prim's algorithm, named after Prim, who rediscovered it in 1957.
Independent of all this, in 1956 Dijkstra needed to write a program to demonstrate the capabilities of a new computer his institute had developed. He thought it would be cool to have the computer find connections to travel between two cities of the Netherlands. He designed the algorithm in 20 minutes. He created a graph of 64 cities with some simplifications (because his computer was 6-bit) and wrote code for this 1956 computer. However, he didn't publish his algorithm, primarily because there were no computer science journals and he thought it might not be very important. The next year he learned about the problem of connecting the terminals of new computers such that the length of the wires was minimized. He thought about this problem and rediscovered Jarník/Prim's algorithm, which again uses the same technique as the shortest path algorithm he had discovered a year before. He mentioned that both of his algorithms were designed without using pen or paper. In 1959 he published both algorithms in a paper that is just two and a half pages long.
Dijkstra finds the shortest path between its beginning node and every other node. So in return you get the minimum distance tree from the beginning node, i.e. you can reach every other node as efficiently as possible.
Prim's algorithm gets you the MST for a given graph, i.e. a tree that connects all nodes while the sum of all costs is the minimum possible.
To make a story short with a realistic example:
Dijkstra wants to know the shortest path to each destination point by saving traveling time and fuel.
Prim wants to know how to efficiently deploy a train rail system i.e. saving material costs.
Directly from Dijkstra's Algorithm's wikipedia article:
The process that underlies Dijkstra's algorithm is similar to the greedy process used in Prim's algorithm. Prim's purpose is to find a minimum spanning tree that connects all nodes in the graph; Dijkstra is concerned with only two nodes. Prim's does not evaluate the total weight of the path from the starting node, only the individual path.
Here's what clicked for me: think about which vertex the algorithm takes next:
Prim's algorithm takes next the vertex that's closest to the tree, i.e. closest to some vertex anywhere on the tree.
Dijkstra's algorithm takes next the vertex that is closest to the source.
Source: R. Sedgewick's lecture on Dijkstra's algorithm, Algorithms, Part II: https://coursera.org/share/a551af98e24292b6445c82a2a5f16b18
I was bothered with the same question lately, and I think I might share my understanding...
I think the key difference between these two algorithms (Dijkstra and Prim) is rooted in the problems they are designed to solve, namely, the shortest path between two nodes and the minimum spanning tree (MST). The former is to find the shortest path between, say, nodes s and t, and a rational requirement is to visit each edge of the graph at most once. However, it does NOT require us to visit all the nodes. The latter (MST) requires us to visit ALL the nodes (at most once), with the same rational requirement of visiting each edge at most once too.
That being said, Dijkstra allows us to "take a shortcut": as long as I can get from s to t, I don't have to worry about the consequences, and once I get to t, I am done! There is also a path from s to t in the MST, but this s-t path is created with all the remaining nodes in mind, therefore it can be longer than the s-t path found by Dijkstra's algorithm. Below is a quick example with 3 nodes:
        2       2
(s) o ----- o ----- o (t)
    |               |
    -----------------
            3
Let's say each of the top edges has a cost of 2 and the bottom edge has a cost of 3. Then Dijkstra will tell us to take the bottom path, since we don't care about the middle node. On the other hand, Prim will return an MST with the top 2 edges, discarding the bottom edge.
This difference is also reflected in a subtle difference between the implementations: in Dijkstra's algorithm, one needs a bookkeeping step (for every node) to update the shortest path from s after absorbing a new node, whereas in Prim's algorithm there is no such need.
The simplest explanation is that in Prim's you don't need to specify a particular starting node (any node will do), but in Dijkstra's you must have a starting node, because you have to find the shortest paths from that given node to all other nodes.
The key difference between the basic algorithms lies in their different edge-selection criteria. Generally, they both use a priority queue for selecting next nodes, but have different criteria to select the adjacent nodes of current processing nodes: Prim's Algorithm requires the next adjacent nodes must be also kept in the queue, while Dijkstra's Algorithm does not:
def dijkstra(g, s):
    q <- make_priority_queue(VERTEX.distance)
    for each vertex v in g.vertex:
        v.distance <- infinite
        v.predecessor <- nil
        q.add(v)
    s.distance <- 0
    while not q.is_empty:
        u <- q.extract_min()
        for each adjacent vertex v of u:
            ...
def prim(g, s):
    q <- make_priority_queue(VERTEX.distance)
    for each vertex v in g.vertex:
        v.distance <- infinite
        v.predecessor <- nil
        q.add(v)
    s.distance <- 0
    while not q.is_empty:
        u <- q.extract_min()
        for each adjacent vertex v of u:
            if v in q and weight(u, v) < v.distance:// <-------selection--------
                ...
The way vertex.distance is calculated is the second difference.
Dijkstra's algorithm is used only to find shortest paths.
With a minimum spanning tree (Prim's or Kruskal's algorithm) you get the minimum number of edges with the minimum total edge weight.
For example: consider a situation where you want to create a huge network that requires a large number of wires. The wiring can be planned using a minimum spanning tree (Prim's or Kruskal's algorithm), i.e. it gives you the minimum number of wires needed to create the wired network at minimum cost.
Dijkstra's algorithm, on the other hand, is used to get the shortest path between two nodes in such a network.
Dijkstra's algorithm solves the single-source shortest path problem (e.g. between nodes i and j), while Prim's algorithm solves the minimum spanning tree problem. Both algorithms use a design technique called the 'greedy algorithm'.
To read more about these notions, see:
Greedy algorithm lecture note : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/07-greedy.pdf
Minimum spanning tree : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/20-mst.pdf
Single source shortest path : http://jeffe.cs.illinois.edu/teaching/algorithms/notes/21-sssp.pdf
@templatetypedef has covered the difference between MST and shortest path. I've covered the algorithmic difference in another SO answer by demonstrating that both can be implemented using the same generic algorithm that takes one more parameter as input: the function f(u,v). The difference between Prim's and Dijkstra's algorithm is simply which f(u,v) you use.
At the code level, the other difference is the API.
You initialize Prim with a source vertex, s, i.e., Prim.new(s); s can be any vertex, and regardless of s, the end result, which are the edges of the minimum spanning tree (MST) are the same. To get the MST edges, we call the method edges().
You initialize Dijkstra with a source vertex, s, i.e., Dijkstra.new(s), from which you want to get the shortest path/distance to all other vertices. The end results, which are the shortest paths/distances from s to all other vertices, differ depending on s. To get the shortest distance or path from s to any vertex v, we call the methods distanceTo(v) and pathTo(v) respectively.
They both create trees with the greedy method.
With Prim's algorithm we find minimum cost spanning tree. The goal is to find minimum cost to cover all nodes.
With Dijkstra we find the Single Source Shortest Path. The goal is to find the shortest path from the source to every other node.
Prim’s algorithm works exactly as Dijkstra’s, except that:
It does not keep track of the distance from the source.
It stores the edge that connected the front of the visited vertices to the next closest vertex.
The vertex used as “source” for Prim’s algorithm is going to be the root of the MST.

how search algorithm exactly behaves with this function: f(n)=3h(n)

Imagine I am performing a best-first search based on the function f(n)=cg(n)+(3-c)h(n) for selecting the next node to expand. If I use c=0 I will get f(n)=3h(n). Can I say that with c=0 the search algorithm behaves exactly like Best first search or greedy best first search?
(I am in doubt between the two. My answer is yes, because it just looks ahead and does not consider g(n). My feeling is that it is best first search rather than greedy, because it overestimates by multiplying by 3, but I am not sure if I am right.)
You are referring to algorithms like A* which perform a best-first search based on f-cost, where f(n) = g(n) + h(n) and the g-cost of a node is the cost to reach that node, while h-cost is the estimated cost to reach the goal.
Dijkstra's algorithm uses f(n) = g(n).
Pure heuristic search or greedy best-first search uses f(n) = h(n).
Your question is what happens if I have:
f(n) = c*g(n) + (3-c)*h(n)
When c = 0 this reduces to:
f(n) = (3)*h(n)
The constant of 3 here has no influence on the search order, because all nodes are weighted equally in the same way. So, this is closest to a greedy best-first search.
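A quick way to convince yourself is to sort a frontier by both functions. The sketch below uses a made-up frontier of (g, h) pairs; the c = 3 and c = 1.5 cases go beyond the question and are included only to show the other two special cases of the same formula.

```python
def f(g, h, c):
    return c * g + (3 - c) * h

# Hypothetical frontier: node -> (g, h)
frontier = {"A": (1, 9), "B": (6, 2), "C": (3, 4)}

def expansion_order(key):
    # Nodes sorted by the given evaluation function of (g, h).
    return sorted(frontier, key=lambda n: key(*frontier[n]))

greedy = expansion_order(lambda g, h: h)
c0 = expansion_order(lambda g, h: f(g, h, c=0))      # f = 3h
ucs = expansion_order(lambda g, h: g)
c3 = expansion_order(lambda g, h: f(g, h, c=3))      # f = 3g
a_star = expansion_order(lambda g, h: g + h)
c15 = expansion_order(lambda g, h: f(g, h, c=1.5))   # f = 1.5 * (g + h)

print(c0 == greedy, c3 == ucs, c15 == a_star)  # True True True
```

Multiplying every priority by the same positive constant never changes which node comes first.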

What is the difference between Greedy-Search and Uniform-Cost-Search?

When searching in a tree, my understanding of uniform cost search is that for a given node A, having child nodes B, C, D with associated costs of (10, 5, 7), my algorithm will choose C, as it has the lowest cost. After expanding C, I see nodes E, F, G with costs of (40, 50, 60). It will choose E (cost 40), as it has the minimum value of the three.
Now, isn't it just the same as doing a Greedy-Search, where you always choose what seems to be the best action?
Also, when defining costs from going from certain nodes to others, should we consider the whole cost from the beginning of the tree to the current node, or just the cost itself from going from node n to node n'?
Thanks
Nope. Your understanding isn't quite right.
The next node to be visited in case of uniform-cost-search would be D, as that has the lowest total cost from the root (7, as opposed to 40+5=45).
Greedy Search doesn't go back up the tree - it picks the lowest value and commits to that. Uniform-Cost will pick the lowest total cost from the entire tree.
In a uniform cost search you always consider all unvisited nodes you have seen so far, not just those that are connected to the node you looked at. So in your example, after choosing C, you would find that visiting E has a total cost of 40 + 5 = 45, which is higher than the cost of starting again from the root and visiting D, which has cost 7. So you would visit D next.
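The expansion order for the tree in the question can be reproduced with a small priority queue. This is a sketch: the tree and costs are the ones from the question, there is no goal test (every node is expanded), and the function name is made up.

```python
import heapq

# Tree from the question: parent -> {child: step cost}
tree = {
    "A": {"B": 10, "C": 5, "D": 7},
    "C": {"E": 40, "F": 50, "G": 60},
}

def ucs_order(tree, root):
    frontier = [(0, root)]  # (total cost from the root, node)
    order = []
    while frontier:
        g, node = heapq.heappop(frontier)
        order.append((node, g))
        for child, cost in tree.get(node, {}).items():
            heapq.heappush(frontier, (g + cost, child))
    return order

print(ucs_order(tree, "A"))
# [('A', 0), ('C', 5), ('D', 7), ('B', 10), ('E', 45), ('F', 55), ('G', 65)]
```

After C, the cheapest node on the whole frontier is D (7), not E (45), which is exactly the point of the answers above.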
The difference between them is that the Greedy picks the node with the lowest heuristic value while the UCS picks the node with the lowest action cost. Consider the following graph:
If you run both algorithms, you'll get:
UCS
Picks: S (cost 0), B (cost 1), A (cost 2), D (cost 3), C (cost 5), G (cost 7)
Answer: S->A->D->G
Greedy:
*supposing it chooses the A instead of B; A and B have the same heuristic value
Picks: S , A (h = 3), C (h = 1), G (h = 0)
Answer: S->A->C->G
So, it's important to differentiate the action cost to get to the node from the heuristic value, which is a piece of information that is added to the node, based on the understanding of the problem definition.
Greedy search (for most of this answer, think of greedy best-first search when I say greedy search) is an informed search algorithm, which means the function that is evaluated to choose which node to expand has the form of f(n) = h(n), where h is the heuristic function for a given node n that returns the estimated value from this node n to a goal state. If you're trying to travel to a place, one example of a heuristic function is one that returns the estimated distance from node n to your destination.
Uniform-cost search, on the other hand, is an uninformed search algorithm, also known as a blind search strategy. This means that the value of the function f for a given node n, f(n), for uninformed search algorithms, takes into consideration g(n), the total action cost from the root node to the node n, that is, the path cost. It doesn't have any information about the problem apart from the problem description, so that's all it can know. You don't have any information that can help you decide how close one node is to a goal state, only to the root node. You can watch the nodes expanding here (Animation of the Uniform Cost Algorithm) and see how the cost from node n to the root is used to choose which nodes to expand.
Greedy search, just like any greedy algorithm, takes locally optimal solutions and uses a function that returns an estimated value from a given node n to the goal state. You can watch the nodes expanding here (Greedy Best First Search | Quick Explanation with Visualization) and see how the return of the heuristic function from node n to the goal state is used to choose which nodes to expand.
By the way, sometimes the path chosen by greedy search is not a global optimum. In the example in the video, node A is never expanded because there are always nodes with smaller values of h(n). But what if A has a high value of h(n), while the nodes reachable through it have very small values and lie on the globally optimal path? That can happen. A bad heuristic function can cause this. Getting stuck in a loop is also possible. A*, which is also a best-first search algorithm, fixes this by making use of both the path cost (which implies knowing the nodes already visited) and a heuristic function, that is, f(n) = g(n) + h(n).
It's possible that at this point it's still not clear to you HOW uniform-cost search knows there is another path that looks worse locally but is better globally. It may help to know that if all steps have the same cost, uniform-cost search is the same thing as breadth-first search (BFS). It would expand all nodes just like BFS.
UCS cares about history,
Greedy does not.
In your example, after expanding C, the next node would be D according to UCS, because of our history: UCS can't forget the past, and it remembers that the total cost of D is much lower than that of E.
Don't be Greedy. Be UCS, and if going back is really a better choice, don't be afraid of going back!

Resources