If the branching factor is large, the breadth-first search "OPEN list" may run out of memory. But if the number of states is that large, the state-space graph probably cannot be drawn or held in memory either. Doesn't that mean all forms of search will fail?
Not necessarily: the goal state may sit at a fairly shallow depth in the tree (so it is found before memory is exhausted), or, if the tree is not too deep, you can use depth-first search instead, which only keeps the current path in memory. Pruning techniques may also reduce the effective branching factor by revealing that some paths do not need to be followed.
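To make the memory point concrete, here is a minimal Python sketch of depth-limited DFS (with neighbors standing in for whatever successor function you have; the names are illustrative, not from any particular textbook implementation):

def depth_limited_search(node, goal, neighbors, limit, path=None):
    # Memory use is proportional to the depth of the current path,
    # not to the breadth of the frontier as in breadth-first search.
    if path is None:
        path = [node]
    if node == goal:
        return path
    if limit == 0:
        return None
    for succ in neighbors(node):
        if succ not in path:  # avoid cycles along the current path
            found = depth_limited_search(succ, goal, neighbors,
                                         limit - 1, path + [succ])
            if found is not None:
                return found
    return None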
When I run Dijkstra and A* on different graphs, since both are optimal algorithms, shouldn't I always expect them to find the same path?
Like for the following graph:
Nodes: S, A, B, C, D, E, G
Edges and costs: (S, A) = 1, (A, B) = 1, (B, C) = 1, (A, E) = 8, (A, D) = 6, (D, G) = 2
Heuristics: h(S) = 6, h(A) = 5, h(B) = 6, h(C) = 7, h(D) = 2, h(E) = 1, h(G) = 0
I am finding S->A->D->G as the path for both, and the cost of this path is 9 for both Dijkstra and A*.
Is this always the case for any graph because both are optimal?
If I want to compare these two algorithms, what statistics should I use? The running time seems to be the same as well.
Thanks.
If the shortest path is unique, then any correct shortest path algorithm must find exactly that path.
On graphs where there are multiple paths of equal cost between the start and the goal, even different implementations of the same algorithm are not guaranteed to find the same path, because there may be subtle differences in, for example:
The order of processing outgoing edges from a node when it is "explored".
The order in which nodes of equal value are popped off of the priority queue.
For A*, different correct heuristics often result in different paths (of equal cost) due to ordering the priority queue differently.
If I want to compare these two algorithms, what statistics should I use? The running time seems to be the same as well.
On big enough graphs, and when you have a good heuristic, A* will perform better. A common example is a grid with few impassable tiles: Dijkstra explores roughly in a circle around the start, while A* (given a proper heuristic) aims almost straight at the goal and explores only a "thick line". You can see this effect in action in this visualization.
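As a concrete check, here is a minimal Python sketch (not anyone's production code) that runs both algorithms on the graph from the question. Passing a zero heuristic gives Dijkstra; passing h gives A*. Both return S->A->D->G with cost 9, but on this graph A* pops 5 nodes off the queue where Dijkstra pops 7:

import heapq

graph = {"S": [("A", 1)], "A": [("B", 1), ("E", 8), ("D", 6)],
         "B": [("C", 1)], "C": [], "D": [("G", 2)], "E": [], "G": []}
h = {"S": 6, "A": 5, "B": 6, "C": 7, "D": 2, "E": 1, "G": 0}

def shortest_path(start, goal, heuristic):
    # Frontier entries are (f, g, node, path), with f = g + heuristic(node).
    frontier = [(heuristic(start), 0, start, [start])]
    closed, expanded = set(), 0
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node in closed:
            continue  # stale duplicate entry in the heap
        closed.add(node)
        expanded += 1
        if node == goal:
            return path, g, expanded
        for succ, cost in graph[node]:
            if succ not in closed:
                heapq.heappush(frontier, (g + cost + heuristic(succ),
                                          g + cost, succ, path + [succ]))
    return None, float("inf"), expanded

print(shortest_path("S", "G", lambda n: 0))  # Dijkstra: (['S','A','D','G'], 9, 7)
print(shortest_path("S", "G", h.get))        # A*:       (['S','A','D','G'], 9, 5)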
Is this always the case for any graph because both are optimal?
A* can be seen as an improved version of Dijkstra's algorithm; it can be faster because of the heuristic.
But A* is optimal only if the heuristic function is admissible; otherwise A* can return a valid path that is not necessarily the cheapest one, unlike Dijkstra.
So the answer is yes, if and only if you choose your heuristic carefully.
If I want to compare these two algorithms, what statistics should I use? The running time seems to be the same as well.
It depends. Do you want to analyze the quality of the resulting paths or the execution speed?
If your heuristic is admissible, both algorithms return a path of optimal cost, so only the execution time will change. You can implement both and measure the execution time, and you can also count the number of nodes each one explores.
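A minimal way to collect both statistics at once, assuming a function like the shortest_path sketch earlier in this thread (the names are illustrative):

import time

t0 = time.perf_counter()
path, cost, expanded = shortest_path("S", "G", h.get)
elapsed = time.perf_counter() - t0
print(f"cost {cost}, {expanded} nodes expanded, {elapsed:.6f}s")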
I've tested A* search against breadth-first search (BFS) and depth-first search (DFS), and I find that fewer nodes are expanded with A*.
I understand that A* expands paths that are already less expensive by using the heuristic and edge cost function.
In what cases would BFS and DFS be more efficient than the A* search algorithm?
BFS uses a queue while A* uses a priority queue. In general, queues are much faster than priority queues (e.g., dequeueing is O(1) vs. O(log n)). The advantage of A* is that it normally expands far fewer nodes than BFS, but when that isn't the case, BFS will be faster. That can happen if the heuristic is poor, if the graph is very sparse or small, or if the heuristic breaks down on a given graph.
Keep in mind that BFS is only useful for unweighted graphs. If the graph is weighted, you need BFS's older brother, Dijkstra's algorithm. That algorithm uses a priority queue, and as such should almost never be faster than A*, except when the heuristic is uninformative.
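For reference, a minimal BFS sketch showing the cheap queue operations: collections.deque gives O(1) append/popleft, versus the O(log n) heap operations Dijkstra and A* pay per node (neighbors is a stand-in successor function):

from collections import deque

def bfs(start, goal, neighbors):
    # On an unweighted graph this returns a path with the fewest edges.
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()  # O(1), no heap maintenance
        if path[-1] == goal:
            return path
        for succ in neighbors(path[-1]):
            if succ not in seen:
                seen.add(succ)
                frontier.append(path + [succ])
    return None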
A breadth-first search may outperform A* when the heuristic is inconsistent. (An inconsistent heuristic doesn't obey the triangle inequality. A consistent heuristic never changes more than the edge cost from one state to the next.)
With an inconsistent heuristic, A* may expand N states up to 2^N times. An example where this occurs can be found online; step through it if you want to understand what happens. BFS will only expand each state at most once. This can be partially fixed by algorithm B (N states expanded at most N^2 times), but that is still a large overhead. The more recent IBEX algorithm has much better worst-case guarantees: N log C* expansions, where C* is the optimal solution cost.
A depth-first search may outperform A* and BFS if the goal happens to lie on the first branch explored. In this demo you can place the goal at different states in the tree to see what happens.
There are other constant factors to consider. DFS only needs a single copy of a state, while A* keeps many states on the OPEN/CLOSED lists. But in these cases, IDA* should be used instead.
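A minimal IDA* sketch, to make the memory point concrete: it is a depth-first search bounded by f = g + h, restarted with the smallest bound that was exceeded, so it keeps only the current path in memory (here neighbors yields (successor, cost) pairs; the names are illustrative):

def ida_star(start, goal, neighbors, h):
    bound = h(start)
    path = [start]

    def search(g, bound):
        node = path[-1]
        f = g + h(node)
        if f > bound:
            return f                 # report the next candidate bound
        if node == goal:
            return "FOUND"
        smallest = float("inf")
        for succ, cost in neighbors(node):
            if succ in path:         # avoid cycles along the current path
                continue
            path.append(succ)
            result = search(g + cost, bound)
            if result == "FOUND":
                return "FOUND"
            smallest = min(smallest, result)
            path.pop()
        return smallest

    while True:
        result = search(0, bound)
        if result == "FOUND":
            return list(path)
        if result == float("inf"):
            return None              # goal unreachable
        bound = result               # retry with the next larger f bound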
Note that theoretically speaking, in unidirectional search with a consistent heuristic, A* does the fewest number of necessary expansions required to prove that the solution is optimal.
I hope this isn't too arbitrary a question, but I have been looking through the source code of Faile and TSCP and playing them against each other. As far as I can see the engines have a lot in common, yet Faile searches ~1.3 million nodes per second while TSCP searches only ~300k nodes per second.
The source code for faile can be found here: http://faile.sourceforge.net/download.php. TSCP source code can be found here: http://www.tckerrigan.com/Chess/TSCP.
After looking through them I see some similarities: both use an array board representation (although Faile uses a 144-square board), both use an alpha-beta search with some sort of transposition table, and both have very similar evaluation functions. The main difference I can find is that Faile keeps a redundant representation of the board: it also maintains arrays of the piece locations. This means that when moves are generated (by very similar functions in both programs), Faile loops over far fewer empty or irrelevant squares, while maintaining those arrays costs comparatively few resources.
My question is: why is there a 4x difference in the speed of these two programs? Also, why does Faile consistently beat TSCP (I estimate a ~200 ELO difference just by watching their moves)? For the latter, it seems to be because Faile searches several plies deeper.
Short answer: TSCP is very simple (as you can guess from its name), while Faile is more advanced; its developers spent time optimizing it. So it is only reasonable for Faile to be faster, which also means deeper search and a higher ELO.
Long answer: as far as I remember, the part of an alpha-beta searcher that influences performance the most is the move generator. TSCP's move generator does not generate moves in any particular order. Faile's generator (as you noticed) uses piece lists sorted in order of decreasing piece value, so it generates the more important moves first. This lets alpha-beta pruning cut more unneeded moves and makes the search tree less branchy; a less branchy tree can be deeper and still contain the same number of nodes, which allows a deeper search.
Here is a very simplified example of how move order allows a faster search. Suppose White's last move was silly: they moved a piece to an unprotected square. If we find some Black move that captures this piece, we can ignore all other, not-yet-examined moves and return to processing White's move list. A queen controls much more of the board than a pawn, so it has more chances to capture that piece; if we look at the queen's moves first, we are more likely to skip more unneeded moves.
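To make the pruning effect concrete, here is a toy alpha-beta sketch with a move-ordering hook. The children and evaluate functions and the move fields (guess_score, result) are placeholders for illustration, not Faile's or TSCP's actual code; sorting so that the moves guessed strongest (queen captures before pawn shuffles) come first makes the cutoff fire earlier:

def alphabeta(pos, depth, alpha, beta, maximizing, children, evaluate):
    moves = children(pos)
    if depth == 0 or not moves:
        return evaluate(pos)
    # Move ordering: try the moves we guess are strongest (for the mover) first.
    moves.sort(key=lambda m: m.guess_score, reverse=True)
    if maximizing:
        value = float("-inf")
        for move in moves:
            value = max(value, alphabeta(move.result, depth - 1,
                                         alpha, beta, False, children, evaluate))
            alpha = max(alpha, value)
            if beta <= alpha:
                break   # cutoff: the opponent will never allow this line
        return value
    else:
        value = float("inf")
        for move in moves:
            value = min(value, alphabeta(move.result, depth - 1,
                                         alpha, beta, True, children, evaluate))
            beta = min(beta, value)
            if beta <= alpha:
                break   # symmetric cutoff for the minimizing side
        return value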
I didn't compare the other parts of these programs, but most likely Faile optimizes them better as well: the alpha-beta algorithm itself, variable search depth, and static position analysis can all be optimized.
TSCP has no hash tables (-75 ELO).
TSCP has no killer-move ordering (-50 ELO).
TSCP has no null-move pruning (-100 ELO).
TSCP has a poorly designed attack function (-25 ELO).
These four things alone account for roughly a 250-point ELO difference. They also change the nodes-per-second figure, but you cannot compare nodes per second across engines, since programmers use different interpretations of what counts as a node.
I'm looking for a good way to find a shortest path between two points in a network (directed, cyclic, weighted) of billions of nodes. Basically I want an algorithm that will typically get a solution very very quickly, even if its worst case is horrible.
I'm open to parallel or distributed algorithms, although they would have to make sense for the size of the data set (an algorithm that works with CUDA on a graphics card would have to be processable in chunks). I don't plan on using a farm of computers, but potentially a few machines at most.
A Google search gives you a lot of good links; the first result discusses parallel implementations of two shortest-path algorithms.
And talking about implementation on CUDA, remember that billions of nodes means gigabytes of memory: for example, a billion nodes with just 8 bytes of adjacency data each is already 8 GB. That limits the number of nodes you can keep on one card (for optimum performance) at a time. The maximum memory on a graphics card currently on the market is about 6 GB, which gives you an estimate of the number of cards you may need (not necessarily the number of machines).
Look at Dijkstra's algorithm. It is essentially a cost-ordered breadth-first search that continues until you are guaranteed to have found the shortest path. The first path found to the goal may turn out to be the shortest, but you cannot be sure until no other branch of the search can terminate with a shorter distance.
You could use a uniform-cost search. This search algorithm finds an optimal solution in a weighted graph. If I remember correctly, the time and space complexity is O(b^(1 + floor(C*/ε))), where b is the branching factor, C* the cost of the optimal path to the goal, and ε the minimum edge cost.
And there is also bidirectional search, where you search from the initial state and the goal state simultaneously, and hopefully the two frontiers meet somewhere in the middle of the graph :)
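A minimal sketch of the bidirectional idea, on an unweighted, undirected graph for simplicity (a directed version would search backward along incoming edges, and a weighted version would run two Dijkstra searches with a more careful stopping rule); each side explores roughly b^(d/2) states instead of b^d:

def bidirectional_bfs(start, goal, neighbors):
    if start == goal:
        return 0
    dist_f, dist_b = {start: 0}, {goal: 0}   # exact BFS distances from each end
    front, back = [start], [goal]
    while front and back:
        if len(front) > len(back):           # always expand the smaller frontier
            front, back = back, front
            dist_f, dist_b = dist_b, dist_f
        best, nxt = None, []
        for node in front:
            for succ in neighbors(node):
                if succ in dist_b:           # the two searches touched
                    cand = dist_f[node] + 1 + dist_b[succ]
                    best = cand if best is None else min(best, cand)
                elif succ not in dist_f:
                    dist_f[succ] = dist_f[node] + 1
                    nxt.append(succ)
        if best is not None:
            return best                      # take the best crossing in this level
        front = nxt
    return None                              # no path exists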
I am worried that unless your graph is somehow laid out nicely in memory, you won't get much benefit from CUDA compared to a well-tuned parallel algorithm on the CPU. The problem is that walking a "totally unordered" graph leads to a lot of random memory accesses.
When you have 32 CUDA threads of a warp working in parallel but their memory accesses are random, the fetches have to be serialized. Since the search algorithm does not perform much heavy arithmetic, fetching memory is where you are likely to lose most of your time.
I wonder why searching in a BST is faster than the binary search algorithm on an array.
I am talking about a tree that (almost) always has the same number of elements in each subtree (well balanced).
I have tested both of them and searching in the BST is always faster. Why?
It's impossible to know without looking at the implementation. At their core, they are the same thing.
The BST needs to follow pointers to traverse into the correct half, whereas binary search on an array does arithmetic (e.g., addition and division/shift). Usually the binary search on an array is a little faster because it traverses less memory overall (no pointers need to be stored) and is more cache-coherent in the final stages of the algorithm.
If the array variant is always slower for you, there's probably a glitch in the implementation, or (but this is very unlikely!) the arithmetic is a lot slower than all the memory overhead.
Both should be about the same in terms of speed. Both are O(log n). The binary search accesses a memory location and makes a comparison at every iteration. The BST follows a pointer (which is also a memory access) and makes a comparison. The difference in constants within their big-O complexity should be negligible.
One possible reason might be that you need to perform an extra calculation during every iteration of the binary search. Most implementations have a line like:
mid=(high+low)/2;
The division operation can be costly compared to integer addition and comparison operations, and this might contribute to the extra overhead. One way to reduce the impact would be using:
mid=(high+low)>>1;
But I think most compilers will optimize that for you anyway.
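As an aside, in languages with fixed-width integers (such as C or Java) the sum high+low itself can overflow on very large arrays; the usual safe form is:
mid = low + (high - low) / 2;  /* avoids overflow when high+low exceeds the integer range */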
The BST variant does not need to compute anything, it just compares and follows the appropriate pointer.
It might also be that you are doing your binary search recursively and your BST query non-recursively, making the BST faster. But it is really hard to come up with specific reasons without looking at your code.
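If you want to check this yourself, here is a minimal Python benchmark sketch (illustrative names, not your code) that times array binary search against lookups in a balanced pointer-based BST built from the same keys:

import bisect, random, time

class Node:
    __slots__ = ("key", "left", "right")
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def build_balanced(keys):
    # Build a perfectly balanced BST from sorted keys.
    if not keys:
        return None
    mid = len(keys) // 2
    root = Node(keys[mid])
    root.left = build_balanced(keys[:mid])
    root.right = build_balanced(keys[mid + 1:])
    return root

def bst_contains(node, key):
    while node is not None:
        if key == node.key:
            return True
        node = node.left if key < node.key else node.right
    return False

keys = sorted(random.sample(range(10_000_000), 100_000))
root = build_balanced(keys)
probes = [random.choice(keys) for _ in range(100_000)]

t0 = time.perf_counter()
hits_array = sum(keys[bisect.bisect_left(keys, k)] == k for k in probes)
t1 = time.perf_counter()
hits_tree = sum(bst_contains(root, k) for k in probes)
t2 = time.perf_counter()
print(f"array: {t1 - t0:.3f}s  bst: {t2 - t1:.3f}s  ({hits_array}, {hits_tree} hits)")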