Currently, I am new to Artificial Intelligence. I have a problem with the greedy search algorithm. I saw a question in a tutorial but can't understand how to answer it. Please help me; any help is much appreciated.
Consider the given figure 1. The values in each node represent the
heuristic cost from that node to goal node (G) and the values within
the arcs represent the path cost between two nodes.
If B is the starting node and G is the goal node,
Find the traversal using Greedy Search Algorithm.
Find the traversal using A* Search Algorithm
Using the result of part (1) show that greedy search is not optimal.
I assume that the greedy search algorithm you refer to uses the following selection strategy: select the next node that is adjacent to the current node and has the least cost/distance from the current node. Note that this greedy strategy doesn't use the heuristic costs at all.
Consider the following figure, crafted so that it proves the greedy solution is not optimal.
The path highlighted with red shows the path taken by Greedy Algorithm and the path highlighted with green shows the path taken by Heuristic A* algorithm.
Explanation:
Greedy algorithm
Starting from node B, the greedy algorithm sees the edge costs (6 for A, 6 for C and 5 for E).
We greedily move to node E because it has the least edge cost.
From E we have only one option to move to F
From F we again have only one option to move to H and from H we move to G (Goal state/node)
Cost for the path by Greedy Algorithm (highlighted in red): B -> E -> F -> H -> G = 5+6+6+3 = 20
A* algorithm (before going forward, have a look at the wiki page for the A* algorithm and make sure you understand what g(n) and h(n) are, if you haven't already):
Starting from node B, we have three options A, C and E. For each node we calculate f(n) = g(n) + h(n). Here g(n) is the immediate cost on the arc and h(n) is the heuristic value on the node
For node A, f(n) = 6 + 12 = 18
For node C, f(n) = 6 + 10 = 16
For node E, f(n) = 5 + 14 = 19
We choose to proceed with the node that has the least f(n), so we move to node C.
We proceed in the similar fashion and find the path highlighted in green.
The path by A* algorithm is B -> C -> D -> H -> G and it's cost is 6+6+4+3 = 19
The above example shows that the cost of the A* path is less than the cost of the greedy path. Hence the greedy algorithm is not always optimal.
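To make both traversals concrete, here is a minimal Python sketch of the two searches. The graph edges and the heuristic values for A, C and E are taken from the walkthrough above; the remaining heuristics (for B, D, F and H) and the directed adjacency are assumptions chosen to be consistent with it, since the original figure isn't reproduced here.

```python
import heapq

# Edge costs from the walkthrough; directed so each search moves toward G.
graph = {
    'B': {'A': 6, 'C': 6, 'E': 5},
    'A': {},
    'C': {'D': 6},
    'D': {'H': 4},
    'E': {'F': 6},
    'F': {'H': 6},
    'H': {'G': 3},
    'G': {},
}
# h-values for A, C, E are from the text; the rest are assumed.
h = {'A': 12, 'B': 13, 'C': 10, 'D': 6, 'E': 14, 'F': 9, 'G': 0, 'H': 2}

def greedy(start, goal):
    # At each step move to the neighbour with the smallest edge cost.
    path, cost, node = [start], 0, start
    while node != goal:
        nxt = min(graph[node], key=graph[node].get)
        cost += graph[node][nxt]
        path.append(nxt)
        node = nxt
    return path, cost

def a_star(start, goal):
    # f(n) = g(n) + h(n); ties broken by node name for determinism.
    open_heap = [(h[start], start, 0, [start])]
    best_g = {}
    while open_heap:
        f, node, g, path = heapq.heappop(open_heap)
        if node == goal:
            return path, g
        if node in best_g and best_g[node] <= g:
            continue
        best_g[node] = g
        for nbr, w in graph[node].items():
            heapq.heappush(open_heap, (g + w + h[nbr], nbr, g + w, path + [nbr]))
    return None, float('inf')
```

Running both from B reproduces the red path (B -> E -> F -> H -> G, cost 20) and the green path (B -> C -> D -> H -> G, cost 19).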
Give the time complexity of bidirectional search when the test for connecting the two
searches is done by comparing a newly generated state in the forward direction against all the
states generated in the backward direction, one at a time.
If you are comparing each of the n newly generated states against all n states in the other direction, that is n*n, or n^2, comparisons.
However, if you are only comparing each new state against the states generated before it, then it is the sum of all counts up to n.
That sum is 1+2+3+...+(n-1)+n = n(n+1)/2, which is still O(n^2), though with roughly half the comparisons.
Your application, I believe, is the latter, which is actually easier to understand in reverse. Consider the current forward node to be n for this iteration; (n-1) is the first node backwards, (n-2) is the second node backwards, and so on until 1, the very last node backwards:
n + (n-1) + (n-2) + ... + 3 + 2 + 1 = n(n+1)/2
So:
[a, b, c, d, e, f]
1,a: a,b,c,d,e,f
2,b: a,b,c,d,e,f
... this would be n^2
And:
1,a: []
2,b: [a]
3,c: [a,b]
4,d: [a,b,c]
..... this is the triangular sum described above (still O(n^2), but about half the comparisons).
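Counted concretely in Python (the incremental scheme follows the listing, comparing the i-th state against the i-1 states generated before it, so the total here is n(n-1)/2, which is in the same O(n^2) class as the n(n+1)/2 formula):

```python
states = ['a', 'b', 'c', 'd', 'e', 'f']
n = len(states)

# Scheme 1: each newly generated forward state is compared against
# all n states generated in the backward direction.
full = sum(n for _ in states)

# Scheme 2: each new state is compared only against the states generated
# before it ('a' vs [], 'b' vs ['a'], 'c' vs ['a','b'], ...).
incremental = sum(i for i in range(n))
```

For n = 6 this gives 36 comparisons in the first scheme and 15 in the second; both grow quadratically.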
This is an exercise from "Data Structures and Algorithm Analysis in C", exercise 3.7.
Assume there are two polynomials implemented as linked lists, one with M terms and the other with N terms. The exercise asks me to implement the multiplication of the two polynomials in O(M^2N) time (assume M is the smaller one). How do I solve this?
I can give you the idea.
Suppose the polynomials are 1+x+x^3 and 1+x^2.
Create a linked list P using (1,0)--->(1,1)--->(1,3)
Create another linked list Q (1,0)--->(1,2), where (a,b) denotes a coefficient of a for x^b.
Now for each node in P, multiply it with each node of Q.
How? we will create a node res with (x,y) where
x= P->coeff * Q->coeff.
And y=P->exp+Q->exp.
Add this new node to the polynomial which will contain the answer.
During insertion into the answer polynomial you have to keep 2 things in mind:
i) Keep the list sorted by exponent (increasing, as I have taken here; you can take decreasing as well).
ii) Find the correct position for the new node, and if a node with the same exponent already exists, just add the coefficients together and discard the node you were about to insert.
Ok! Now print the polynomial.
The complexity analysis: there are MN multiplications in total, and merging each of the M intermediate N-term polynomials into the growing result list costs O(MN + N), giving O(M^2N) overall.
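Here is a sketch of the idea in Python, using sorted lists of (coeff, exp) pairs in place of the linked lists; the merge plays the role of the sorted insertion described above:

```python
def merge(a, b):
    """Merge two exponent-sorted term lists, adding coefficients of like terms."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i][1] < b[j][1]:
            out.append(a[i]); i += 1
        elif a[i][1] > b[j][1]:
            out.append(b[j]); j += 1
        else:
            c = a[i][0] + b[j][0]
            if c != 0:               # drop cancelled terms
                out.append((c, a[i][1]))
            i += 1; j += 1
    out.extend(a[i:]); out.extend(b[j:])
    return out

def poly_mul(p, q):
    """p, q: lists of (coeff, exp) sorted by increasing exponent.
    M merges, each costing O(MN + N), give the O(M^2 N) bound."""
    result = []
    for cp, ep in p:                                   # M outer iterations
        temp = [(cp * cq, ep + eq) for cq, eq in q]    # one term of p times all of q
        result = merge(result, temp)
    return result
```

With P = 1 + x + x^3 and Q = 1 + x^2 this produces 1 + x + x^2 + 2x^3 + x^5.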
You can find my implementation for polynomial multiplication here, which reads the polynomials from a file in the following format:
let's say 2x^5 + 3x^3 + 5x is a polynomial expression and its linked list representation is as follows:
| 2 | 5 | -|---> | 3 | 3 | -|---> | 5 | 1 | NULL|
Regarding the time complexity, I perform the multiplication by multiplying each term of the first polynomial by each term of the second, and then adding common terms (i.e. those with the same degree) together.
If the first polynomial has N terms and the second has M, then multiplying them term by term gives N*M multiplications.
Note that after common terms are added together the output polynomial has at most N+M-1 terms when the exponents are dense (in general it can be larger); so if you assume M > N, and at each multiplication step you add the value to the cell that holds the same degree (if one exists, otherwise you add a new cell), you get the desired complexity.
Assume that M < N.
Multiply each term of the M-term polynomial by the entire N-term polynomial. Each step yields a temporary polynomial with N terms; then take the union of the result polynomial so far with this temporary polynomial, which costs O(MN + N) in the last step.
Note that there are O(MN) multiplications in total.
The union operations altogether take O((N+N) + (2N+N) + (3N+N) + ... + (MN+N)) = O(M^2N).
So overall the complexity is O(MN) + O(M^2N) = O(M^2N).
I want to implement A* and I looked to Wikipedia for a reference.
It looks like it can fail in the following case. Consider three nodes, A, B, and C.
START -> A -> C -> GOAL
  |      ^
  \-> B -/
The path costs are:
START -> A : 10
START -> B : 1
B -> A : 1
A -> C : 100
C -> GOAL : 100
Clearly the solution is START -> B -> A -> C -> GOAL but what happens if the heuristic lets us expand A before expanding B?
Our heuristic costs are as follows (note these are all underestimates)
A -> GOAL : 10
B -> GOAL : 50
When A is expanded, the true cost to C will turn out to be higher than B's heuristic cost, and so B will be expanded before C.
Fine, right?
The problem I see is that when we expand B and replace the datum "A comes from START with cost 10" to "A comes from B with cost 2" we aren't also updating "C comes from A with cost 110" to "C comes from A with cost 102". There is nothing in Wikipedia's pseudocode that looks like it will forward-propagate the cheaper path. Now imagine another node D which can reach C with cost 105, it will erroneously override "C comes from A with cost 110".
Am I reading this wrong or does Wikipedia need to be fixed?
If you are using graph search, i.e. you remember which nodes you have visited and don't allow revisiting them, then your heuristic is not consistent. The article says that for a heuristic to be consistent, the following needs to hold:
h(x) <= d(x, y) + h(y) for all adjacent nodes x, y
In your case the assumption h(B) = 50 is inconsistent, as d(B -> A) + h(A) = 1 + 10 = 11 < 50. Hence your heuristic is inconsistent and A* wouldn't work in this case, as you rightly noticed and as is also mentioned in the wikipedia article: http://en.wikipedia.org/wiki/A%2a_search_algorithm#Properties.
If you are using tree search, i.e. you allow the algorithm to revisit the nodes, the following will happen:
Add A and B to the queue, score(A) = 10 + 10 = 20, score(B) = 1 + 50 = 51.
Pick A from queue as it has smallest score. Add C to the queue with score(C) = 10 + 100 + h(C).
Pick B from the queue as it is now the smallest. Add A to the queue with score(A) = 2 + 10 = 12.
Pick A from the queue as it is now again the smallest. Notice that we are using a tree search, so we can revisit nodes. Add C to the queue with score(C) = 1 + 1 + 100 + h(C).
Now we have 2 elements in the queue, C via A with score 110 + h(C) and C via B and A with score 102 + h(C), so we pick the correct path to C via B and A.
The wikipedia pseudocode is the first case, i.e. graph search. And they indeed state right under the pseudocode that:
Remark: the above pseudocode assumes that the heuristic function is monotonic (or consistent, see below), which is a frequent case in many practical problems, such as the Shortest Distance Path in road networks. However, if the assumption is not true, nodes in the closed set may be rediscovered and their cost improved. In other words, the closed set can be omitted (yielding a tree search algorithm) if a solution is guaranteed to exist, or if the algorithm is adapted so that new nodes are added to the open set only if they have a lower f value than at any previous iteration.
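Both behaviours can be reproduced with a short sketch. The edge costs and heuristics below are those from the question, except h(C) and h(GOAL), which I've assumed to be 0 (they aren't given):

```python
import heapq

# The example from the question; h is admissible but inconsistent,
# since h(B) = 50 > d(B, A) + h(A) = 11.
edges = {'START': {'A': 10, 'B': 1}, 'B': {'A': 1},
         'A': {'C': 100}, 'C': {'GOAL': 100}, 'GOAL': {}}
h = {'START': 0, 'A': 10, 'B': 50, 'C': 0, 'GOAL': 0}

def astar(use_closed_set):
    """A* returning the cost of the first goal expansion.
    use_closed_set=True  -> graph search (each node expanded at most once);
    use_closed_set=False -> tree search (nodes may be re-expanded)."""
    counter = 0                               # tie-breaker for the heap
    heap = [(h['START'], counter, 0, 'START')]
    closed = set()
    while heap:
        _, _, g, node = heapq.heappop(heap)
        if node == 'GOAL':
            return g
        if use_closed_set:
            if node in closed:
                continue
            closed.add(node)
        for nbr, w in edges[node].items():
            if use_closed_set and nbr in closed:
                continue
            counter += 1
            heapq.heappush(heap, (g + w + h[nbr], counter, g + w, nbr))
    return None
```

Graph search returns the suboptimal cost 210 (START -> A -> C -> GOAL), while tree search finds the optimal 202 (START -> B -> A -> C -> GOAL), exactly as in the trace above.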
This question is an extension of a question asked earlier:
Least cost path in a sorted array
Given a sorted array A, e.g. {4,9,10,11,19}. The cost for moving from i->j is
abs(A[j] - A[i]) + cost_incurred_till_i. Start from a given element, e.g. 10. Find the least cost path that visits every element without visiting the same element twice.
For the given array:
10->9->4->11->19 cost: 1+(1+5)+(1+5+7)+(1+5+7+8) = 41
10->4->9->11->19 cost: 6+(6+5)+(6+5+2)+(6+5+2+8) = 51
10->9->11->4->19 cost: 1+(1+2)+(1+2+7)+(1+2+7+15) = 39
10->11->9->4->19 cost: 1+(1+2)+(1+2+5)+(1+2+5+15) = 35 --one of optimal paths
10->11->19->9->4 cost: 1+(1+8)+(1+8+10)+(1+8+10+5) = 53
10->11->19->4->9 cost: 1+(1+8)+(1+8+15)+(1+8+15+5) = 63
...
I tried to solve this using nearest neighbor approach.
i = start
while array is not empty:
    ldiff = A[i] - A[i-1]
    rdiff = A[i+1] - A[i]
    sum += (ldiff < rdiff) ? ldiff : rdiff
    remove A[i] and move to the chosen neighbour
Nearest neighbour works for some cases, but fails where there are equal-weight choices. I have realised that this is a TSP-like problem. What would be the best approach to solving it? Should I use TSP heuristics like Christofides, or some other algorithm?
You're close, and you can just modify nearest neighbour a bit. When the two neighbours are equidistant, check the element past each neighbour, and go in the opposite direction of whichever is closer (to avoid backtracking as far). If those elements are also the same distance away, keep looking further ahead until they differ. If you reach an array boundary before seeing a difference, go toward that boundary.
Your example is a good one to see this:
The only branch point is deciding whether to visit 9 or 11 in the first step from 10. Looking past them in both directions shows 4 and 19. 4 is closer to 10, so head away from it (to 11).
Obviously this will be quicker with arrays that don't have many sequential evenly-spaced elements. If none of them were evenly spaced, it would be the same as yours, running in n steps.
Worst case is that you'll have to look all the way to both ends at each step, visiting every element in the lookahead. Since we run this once for each of the n elements, it comes out to O(n^2). An example would be an array of evenly spaced elements, starting the search from dead center.
There is an O(n^2) dynamic programming solution. I don't know if it's optimal.
The next choice is always an immediate neighbour from amongst the unvisited nodes, so the visited nodes form a contiguous range. A logical subproblem is to find a partial solution given the range of visited nodes. The optimal solutions to the subproblems only depend on the visited range and the last visited node (which must be one of the endpoints).
Subproblems can be encoded using two indices identifying the visited range, with the order indicating the last visited node. The solution to subproblem (a, b) is the partial solution given that the nodes from min(a,b) to max(a,b) have already been visited and that a was the last visited node. It can be defined recursively as the better of
insert(a, solve(a - dir, b))
insert(a, solve(b + dir, a))
where dir is 1 if b >= a and -1 otherwise.
There are two base cases. Subproblem (0, n-1) has solution {A[0]}, and subproblem (n-1, 0) has solution {A[n-1]}. These correspond to the final choice, which is either the first node or the last node.
The full problem corresponds to subproblem (s, s), where s is the index of the starting element.
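Here is a memoized Python sketch of this DP. One observation (mine, not from the answer above) simplifies the cost bookkeeping: a step of distance d taken while u nodes are still unvisited ends up contributing d * u to the total, because it is re-counted inside every later step's accumulated cost.

```python
from functools import lru_cache

def least_cost(A, start):
    """DP over visited ranges.  The visited nodes always form a contiguous
    range [lo, hi], and the last visited node is one of its endpoints."""
    n = len(A)

    @lru_cache(maxsize=None)
    def solve(lo, hi, at_lo):
        if lo == 0 and hi == n - 1:
            return 0
        unvisited = n - (hi - lo + 1)
        last = A[lo] if at_lo else A[hi]
        best = float('inf')
        if lo > 0:       # extend the visited range to the left
            best = min(best, abs(last - A[lo - 1]) * unvisited
                             + solve(lo - 1, hi, True))
        if hi < n - 1:   # extend the visited range to the right
            best = min(best, abs(last - A[hi + 1]) * unvisited
                             + solve(lo, hi + 1, False))
        return best

    return solve(start, start, True)
```

On the example array {4,9,10,11,19} starting from 10 (index 2), this returns 35, matching the optimal path listed in the question. There are O(n^2) states and O(1) work per state.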
I have a weighted graph with (in practice) up to 50,000 vertices. Given a vertex, I want to randomly choose an adjacent vertex based on the relative weights of all adjacent edges.
How should I store this graph in memory so that making the selection is efficient? What is the best algorithm? It could be as simple as a key value store for each vertex, but that might not lend itself to the most efficient algorithm. I'll also need to be able update the network.
Note that I'd like to take only one "step" at a time.
More formally: given a weighted, directed, and potentially complete graph, let W(a,b) be the weight of edge a->b and let Wa be the sum of the weights of all edges leaving a. Given an input vertex v, I want to choose a vertex randomly such that the likelihood of choosing vertex x is W(v,x) / Wv.
Example:
Say W(v,a) = 2, W(v,b) = 1, W(v,c) = 1.
Given input v, the function should return a with probability 0.5, and b or c each with probability 0.25.
If you are concerned about the performance of generating the random walk, you may use the alias method to build a data structure that fits your requirement of choosing a random outgoing edge quite well. The only overhead is that you have to assign each directed edge a probability weight and a so-called alias edge.
So for each node you have a vector of outgoing edges together with the weight and the alias edge. Then you may choose random edges in constant time (only the generation of the data structure takes linear time in the total number of edges, or in the number of a node's edges). In the example an edge is denoted by ->[NODE] and node v corresponds to the example given above:
Node v
->a (p=1, alias= ...)
->b (p=3/4, alias= ->a)
->c (p=3/4, alias= ->a)
Node a
->c (p=1/2, alias= ->b)
->b (p=1, alias= ...)
...
If you want to choose an outgoing edge (i.e. the next node) you just have to generate a single random number r, uniformly from the interval [0,1).
You then compute no = floor(N[v] * r) and pv = frac(N[v] * r), where N[v] is the number of outgoing edges. I.e. you pick each edge with the exact same probability (namely 1/3 in the example of node v).
Then you compare the assigned probability p of this edge with the generated value pv. If pv is less you keep the edge selected before, otherwise you choose its alias edge.
If for example we have r=0.6 from our random number generator we have
no = floor(0.6*3) = 1
pv = frac(0.6*3) = 0.8
Therefore we choose the second outgoing edge (note the index starts with zero) which is
->b (p=3/4, alias= ->a)
and switch to the alias edge ->a since p=3/4 < pv.
For the example of node v we therefore:
choose edge b with probability 1/3*3/4 (i.e. whenever no=1 and pv<3/4)
choose edge c with probability 1/3*3/4 (i.e. whenever no=2 and pv<3/4)
choose edge a with probability 1/3 + 1/3*1/4 + 1/3*1/4 (i.e. whenever no=0 or pv>=3/4)
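A compact Python sketch of the table construction (Vose's variant of the alias method) and of the constant-time selection described above; the function names are mine:

```python
def build_alias(weights):
    """Vose's alias method: returns (names, prob, alias) such that slot i
    keeps its own edge with probability prob[i], else falls back to alias[i]."""
    names = list(weights)
    n = len(names)
    total = sum(weights.values())
    scaled = [weights[x] * n / total for x in names]   # mean-normalised weights
    prob, alias = [1.0] * n, [None] * n
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]          # donate mass to fill slot s
        (small if scaled[l] < 1.0 else large).append(l)
    return names, prob, alias                 # leftover slots keep prob 1.0

def sample(names, prob, alias, r):
    """Pick an edge from one uniform r in [0,1): floor selects the slot,
    the fractional part decides between the slot's edge and its alias."""
    n = len(names)
    slot = int(n * r)
    frac = n * r - slot
    return names[slot] if frac < prob[slot] else names[alias[slot]]

def exact_marginals(names, prob, alias):
    """Exact selection probability of each edge under the table."""
    n = len(names)
    marg = {x: 0.0 for x in names}
    for i in range(n):
        marg[names[i]] += prob[i] / n
        if alias[i] is not None:
            marg[names[alias[i]]] += (1.0 - prob[i]) / n
    return marg
```

For node v's weights {a: 2, b: 1, c: 1} the table reproduces the marginals 1/2, 1/4, 1/4 exactly, and r = 0.6 lands in slot 1 with fractional part 0.8 > 3/4, so the alias edge a is chosen, as in the worked example.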
In theory the most efficient approach is to store, for each node, the moral equivalent of a balanced binary tree (red-black tree, B-tree, or skip list all fit) of the connected nodes and their weights, with the total weight on each side cached. Then you can pick a random number from 0 to 1, multiply by the total weight of the connected nodes, and do a binary search to find the chosen edge.
However, traversing a binary tree like that involves a lot of branches, which tend to cause pipeline stalls, and those are very expensive. So in practice, in an efficient language (e.g. C++), if you have fewer than a couple of hundred edges per node, a linear list of edges (with a pre-computed sum) that you walk in a loop may prove faster.