Why use non-complex algorithms? - artificial-intelligence

In my Intro to AI class, we have been studying:
Uninformed Search (e.g. Depth-First Search)
Informed Search (e.g. A* Search)
Constraint Satisfaction Problems (e.g. Hill Climbing)
Adversarial Search (e.g. Minimax)
In general terms, why would we use, for example, Depth-First Search instead of a more complex algorithm such as A* Search? In other words, why choose simple, limited algorithms when we could choose complex ones?

The main reason is efficiency. Some algorithms take much more time or memory than others.
Some algorithms won't work well in certain situations. For example, Hill Climbing won't work very well if there are local maxima.
If you expect most paths to lead to the destination, you can use Depth-First Search, which could be much faster than A*.
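To make that trade-off concrete, here is a minimal, hedged sketch (the adjacency-dict graph is made up, not from the question): plain DFS only keeps the current path and a visited set in memory and returns the first path it finds, whereas A* must additionally maintain a priority queue and per-node cost estimates.

    # Minimal DFS sketch on a hypothetical adjacency-dict graph.
    # It returns the first path found, not necessarily the shortest one.
    def dfs_path(graph, start, goal, visited=None):
        visited = visited if visited is not None else set()
        if start == goal:
            return [start]
        visited.add(start)
        for neighbor in graph.get(start, []):
            if neighbor not in visited:
                rest = dfs_path(graph, neighbor, goal, visited)
                if rest:
                    return [start] + rest
        return None

    # Example: most branches reach the goal, so DFS finds a path quickly.
    graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
    print(dfs_path(graph, "A", "D"))  # ['A', 'B', 'D']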

Related

Artificial Intelligence: Time Complexity of NBA* Search

I am studying informed search algorithms, and for New Bidirectional A* Search (NBA*), I know that the space complexity is O(b^d), where d is the depth of the shallowest goal node and b is the branching factor. I have tried to find out its time complexity, but I haven't been able to find any exact information in online resources. Is the exact time complexity of NBA* Search unknown, and how does it differ from that of the original Bidirectional A*? Any insights are appreciated.
If you have a specific model of your problem (e.g. a graph growing uniformly in both directions with unit edge costs and the number of states growing exponentially), then most bidirectional search algorithms require O(b^(d/2)) node expansions and O(b^(d/2)) time. But this simple model doesn't actually describe most real-world problems.
Given this, I would not recommend putting significant effort into studying New Bidirectional A*.
The state of the art in bidirectional search has changed massively in the last few years. The current algorithm with the best theoretical guarantees is NBS - Near-Optimal Bidirectional Search. The algorithm finds optimal paths and is near-optimal in node expansions. That is, NBS is guaranteed to do no more than 2x the expansions of the best possible algorithm (given reasonable theoretical assumptions, such as using the same heuristic). All other algorithms (including A*) can do arbitrarily worse than NBS.
Other variants of NBS, such as DVCBS, have been proposed; they follow the same basic structure and lack the same guarantees, but perform well in practice.

Why are goal-directed reasoning and heuristic search hard to combine?

In Artificial Intelligence: A Modern Approach (3rd Edition), I came across an interesting quote:
"As yet there is no good understanding of how to combine the two kinds of algorithms [goal-directed reasoning / planning and heuristic search] into a robust and efficient system" (Russell, p. 189)
Why is this so? Why is it hard to combine goal-oriented planning with heuristic search? Wouldn't reinforcement learning solve this?
The term “goal-directed reasoning” was used in the 1980s for a backtracking search technique. Sometimes it was called backward reasoning or top-down search; the terms all mean the same thing. It describes how the algorithm traverses the state space, or more specifically, the order in which the states in the graph are visited. In newer literature this aspect of a planner is no longer explained in detail, because a graph search algorithm is nothing special: it simply means putting the nodes on a stack and traversing them.
In contrast, the term “heuristic search” means replacing a brute-force solver with a knowledge-based approach. Heuristic search means not traversing the whole graph, but finding a domain-specific strategy that leaves out most of it. And indeed, it is hard to combine backtracking with heuristics; one approach for doing so is called grounding. If a grounded problem is available, it is possible to use a backtracking solver on a knowledge-based problem. This is the strategy used in modern PDDL planners, which first describe the domain in a symbolic PDDL notation (which is knowledge-based) and then use a fast solver to search the state space.
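As a minimal, hedged sketch of that stack-based, goal-directed traversal (the predecessors function is a hypothetical, domain-specific helper, not something from the book or the answer):

    # Backward (goal-directed) search: start at the goal and expand
    # predecessor states until the initial state is reached.
    def backward_search(goal_state, initial_state, predecessors):
        stack = [goal_state]
        visited = {goal_state}
        while stack:
            state = stack.pop()
            if state == initial_state:
                return True  # a plan exists; a real planner would also record the path
            for prev in predecessors(state):
                if prev not in visited:
                    visited.add(prev)
                    stack.append(prev)
        return False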

How do I efficiently search for a specific sequence in an array?

I'm looking through large arrays for particular sequences and I feel like I'm approaching the problem using brute force rather than computer science.
Currently I'm looking sequentially down the large array for the first item in the search sequence, then checking each item after that until a failure or a complete match. This provides 100% accuracy but it's not very fast with large arrays.
I was never a computer science student so I missed out on many algorithm classes that plenty of people around here probably had. Is there a better way to search for sequences in arrays? I'm not necessarily interested in perfect accuracy if it makes a difference.
How about using the Boyer-Moore algorithm? It's fairly simple and straightforward, and can increase the practical speed quite a lot, especially if your target sequence is fairly long. It's meant for searching for strings, but that's just a particular type of array of course.
There is no better way to find candidate matches in the array itself. If the data has no order, you cannot discard candidates without examining them.
That being said, you can optimize how candidates are accepted or rejected using the method Janne suggested.
If you need to search for many patterns in the same sequence, you can use suffix arrays.
If you have to search for a single pattern, you can improve a little over brute force with Boyer-Moore or Knuth-Morris-Pratt.
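For illustration, here is a hedged sketch of Boyer-Moore-Horspool, a simplified Boyer-Moore variant; it works on any sequence of hashable items, not just strings (the example data is made up):

    # Boyer-Moore-Horspool sketch: build a "bad character" skip table from the
    # pattern so the search can jump ahead by more than one position on a mismatch.
    def horspool_search(haystack, needle):
        m, n = len(needle), len(haystack)
        if m == 0 or m > n:
            return -1
        skip = {needle[i]: m - 1 - i for i in range(m - 1)}
        i = m - 1
        while i < n:
            k = 0
            while k < m and needle[m - 1 - k] == haystack[i - k]:
                k += 1
            if k == m:
                return i - m + 1  # start index of the first match
            i += skip.get(haystack[i], m)
        return -1

    print(horspool_search([3, 1, 4, 1, 5, 9, 2, 6], [1, 5, 9]))  # 3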

Question about decision trees

After studying decision trees for a while, I noticed there is a technique called boosting, which in normal cases improves the accuracy of the decision tree.
So I am just wondering: why don't we simply incorporate boosting into every decision tree we build? Since we currently treat boosting as a separate technique, I wonder: are there any disadvantages to boosting compared with just using a single decision tree?
Thanks for helping me out here!
Boosting is a technique that can go on top of any learning algorithm. It is most effective when the original classifier you built performs just barely above random. If your decision tree is already pretty good, boosting may not make much difference, but it does carry a performance penalty -- if you run boosting for 100 iterations, you'll have to train and store 100 decision trees.
Usually people boost with decision stumps (decision trees with just one node) and get results as good as boosting with full decision trees.
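As a hedged illustration (assuming scikit-learn is available; it is not mentioned in the original answers), this compares a single full tree against AdaBoost, whose default weak learner is a depth-1 tree, i.e. a decision stump:

    # Compare one full decision tree with 100 boosted stumps on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

    single_tree = DecisionTreeClassifier(random_state=0)
    boosted_stumps = AdaBoostClassifier(n_estimators=100, random_state=0)  # 100 stumps to train and store

    print("single tree   :", cross_val_score(single_tree, X, y, cv=5).mean())
    print("boosted stumps:", cross_val_score(boosted_stumps, X, y, cv=5).mean())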
I've done some experiments with boosting and found it to be fairly robust and better than a single-tree classifier, but also slower (I used 10 iterations), and not as good as some of the simpler learners (to be fair, it was an extremely noisy dataset).
There are several disadvantages to boosting:
1. It is hard to implement.
2. It needs much more extensive training on the training sets than a single decision tree does.
3. Worst of all, boosting algorithms require a threshold value, which in most cases is not easy to figure out because it requires extensive trial-and-error testing, and the whole performance of the boosting algorithm depends on this threshold.

What are the differences between simulated annealing and genetic algorithms?

What are the relevant differences, in terms of performance and use cases, between simulated annealing (with beam search) and genetic algorithms?
I know that SA can be thought of as a GA where the population size is only one, but I don't know the key difference between the two.
Also, I am trying to think of a situation where SA will outperform GA or GA will outperform SA. Just one simple example that helps me understand will be enough.
Well, strictly speaking, these two things--simulated annealing (SA) and genetic algorithms (GA)--are neither algorithms nor is their purpose 'data mining'.
Both are meta-heuristics--a couple of levels above 'algorithm' on the abstraction scale. In other words, both terms refer to high-level metaphors--one borrowed from metallurgy and the other from evolutionary biology. In the meta-heuristic taxonomy, SA is a single-state method and GA is a population method (in a sub-class along with PSO, ACO, et al, usually referred to as biologically-inspired meta-heuristics).
These two meta-heuristics are used to solve optimization problems, particularly (though not exclusively) in combinatorial optimization (aka constraint-satisfaction programming). Combinatorial optimization refers to optimization by selecting from among a set of discrete items--in other words, there is no continuous function to minimize. The knapsack problem, the traveling salesman problem, and the cutting stock problem are all combinatorial optimization problems.
The connection to data mining is that the core of many (most?) supervised Machine Learning (ML) algorithms is the solution of an optimization problem--(Multi-Layer Perceptron and Support Vector Machines for instance).
Any technique for solving c/o problems, regardless of the algorithm, will consist essentially of these steps (which are typically coded as a single block within a recursive loop); a minimal code sketch follows the list:
(i) encode the domain-specific details in a cost function (it's the step-wise minimization of the value returned from this function that constitutes a 'solution' to the c/o problem);
(ii) evaluate the cost function, passing in an initial 'guess' (to begin iteration);
(iii) based on the value returned from the cost function, generate a subsequent candidate solution (or more than one, depending on the meta-heuristic) to pass to the cost function;
(iv) evaluate each candidate solution by passing it, in an argument set, to the cost function;
(v) repeat steps (iii) and (iv) until either some convergence criterion is satisfied or a maximum number of iterations is reached.
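Here is a hedged, generic skeleton of those steps (cost_fn, propose_fn and accept_fn are hypothetical placeholders; each meta-heuristic supplies its own versions, especially at step (iii)):

    # Generic candidate-generation / evaluation loop for a c/o problem.
    def optimize(cost_fn, propose_fn, accept_fn, initial_guess, max_iters=1000):
        current, current_cost = initial_guess, cost_fn(initial_guess)  # steps (i)-(ii)
        best, best_cost = current, current_cost
        for step in range(max_iters):
            candidate = propose_fn(current)          # step (iii): meta-heuristic-specific
            candidate_cost = cost_fn(candidate)      # step (iv)
            if accept_fn(candidate_cost, current_cost, step):
                current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        return best, best_cost                       # step (v): stop after max_iters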
Meta-heuristics are directed to step (iii) above; hence, SA and GA differ in how they generate candidate solutions for evaluation by the cost function. In other words, that's the place to look to understand how these two meta-heuristics differ.
Informally, the essence of an algorithm directed to solution of combinatorial optimization is how it handles a candidate solution whose value returned from the cost function is worse than the current best candidate solution (the one that returns the lowest value from the cost function). The simplest way for an optimization algorithm to handle such a candidate solution is to reject it outright--that's what the hill climbing algorithm does. But by doing this, simple hill climbing will always miss a better solution separated from the current solution by a hill. Put another way, a sophisticated optimization algorithm has to include a technique for (temporarily) accepting a candidate solution worse than (i.e., uphill from) the current best solution because an even better solution than the current one might lie along a path through that worse solution.
So how do SA and GA generate candidate solutions?
The essence of SA is usually expressed in terms of the probability that a higher-cost candidate solution will be accepted (the entire expression in parentheses after e is the exponent):
p = e^(-(highCost - lowCost) / temperature)
Or in Python (assuming math has been imported):
p = math.exp(-(hiCost - loCost) / T)
The 'temperature' term is a variable whose value decays during progress of the optimization--and therefore, the probability that SA will accept a worse solution decreases as iteration number increases.
Put another way, when the algorithm begins iterating, T is very large, which as you can see, causes the algorithm to move to every newly created candidate solution, whether better or worse than the current best solution--i.e., it is doing a random walk in the solution space. As iteration number increases (i.e., as the temperature cools) the algorithm's search of the solution space becomes less permissive, until at T = 0, the behavior is identical to a simple hill-climbing algorithm (i.e., only solutions better than the current best solution are accepted).
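Putting the acceptance rule and the cooling schedule together, a hedged SA sketch (neighbor_fn and cost_fn are hypothetical, problem-specific callables; the geometric cooling factor is an arbitrary illustrative choice):

    import math
    import random

    # Simulated annealing: always accept improvements; accept worse moves with
    # probability e^(-delta / T), where T decays geometrically each iteration.
    def simulated_annealing(cost_fn, neighbor_fn, initial, T=1.0, cooling=0.995, steps=10000):
        current, current_cost = initial, cost_fn(initial)
        best, best_cost = current, current_cost
        for _ in range(steps):
            candidate = neighbor_fn(current)
            delta = cost_fn(candidate) - current_cost
            if delta < 0 or random.random() < math.exp(-delta / T):
                current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
            T *= cooling  # as T approaches 0, this degenerates into plain hill climbing
        return best, best_cost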
Genetic algorithms are very different. For one thing--and this is a big thing--a GA generates not a single candidate solution but an entire population of them. It works like this: the GA calls the cost function on each member (candidate solution) of the population. It then ranks them from best to worst, ordered by the value returned from the cost function ('best' has the lowest value). From these ranked values (and their corresponding candidate solutions) the next population is created. New members of the population are created in essentially one of three ways. The first is usually referred to as 'elitism', and in practice it usually means just taking the highest-ranked candidate solutions and passing them straight through--unmodified--to the next generation. The other two ways new members are created are usually referred to as 'mutation' and 'crossover'. Mutation usually involves changing one element of a candidate solution vector from the current population to create a solution vector in the new population, e.g., [4, 5, 1, 0, 2] => [4, 5, 2, 0, 2]. The result of the crossover operation is like what would happen if vectors could have sex--i.e., a new child vector whose elements are composed of some from each of the two parents.
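A hedged sketch of one GA generation built from those three mechanisms, for candidates that are plain lists of numbers (the elite fraction and mutation rate are arbitrary illustrative values):

    import random

    # One GA generation: elitism, single-point crossover, and point mutation.
    # cost_fn is a hypothetical callable where lower values are better;
    # assumes a population of at least four candidates.
    def next_generation(population, cost_fn, elite_frac=0.1, mutation_rate=0.05):
        ranked = sorted(population, key=cost_fn)          # best (lowest cost) first
        n_elite = max(1, int(elite_frac * len(population)))
        new_pop = ranked[:n_elite]                        # elitism: pass through unchanged
        while len(new_pop) < len(population):
            mom, dad = random.sample(ranked[: len(ranked) // 2], 2)  # two good parents
            cut = random.randrange(1, len(mom))           # single-point crossover
            child = mom[:cut] + dad[cut:]
            if random.random() < mutation_rate:           # mutation: perturb one element
                i = random.randrange(len(child))
                child[i] += random.choice([-1, 1])
            new_pop.append(child)
        return new_pop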
So those are the algorithmic differences between GA and SA. What about the differences in performance?
In practice (my observations are limited to combinatorial optimization problems), GA nearly always beats SA (it returns a lower 'best' value from the cost function--i.e., a value closer to the solution space's global minimum), but at a higher computational cost. As far as I am aware, the textbooks and technical publications report the same conclusion regarding solution quality.
But here's the thing: GA is inherently parallelizable; what's more, it's trivial to parallelize because the individual "search agents" that make up each population do not need to exchange messages--i.e., they work independently of each other. Obviously that means GA computation can be distributed, which in practice means you can get much better results (closer to the global minimum) and better performance (execution speed).
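As a hedged sketch of that point, a population's fitness can be evaluated in parallel worker processes with no message passing between candidates (the names are illustrative; cost_fn must be a picklable, top-level function, and on some platforms this must run under an if __name__ == "__main__" guard):

    from concurrent.futures import ProcessPoolExecutor

    # Evaluate each candidate independently in a pool of worker processes.
    def evaluate_population(cost_fn, population, workers=4):
        with ProcessPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(cost_fn, population))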
In what circumstances might SA outperform GA? The general scenario, I think, would be optimization problems with a solution space small enough that the results from SA and GA are practically the same, while the execution context (e.g., hundreds of similar problems run in batch mode) favors the faster algorithm (which should always be SA).
It is really difficult to compare the two, since they were inspired by different domains.
A genetic algorithm maintains a population of possible solutions and, at each step, selects pairs of possible solutions, combines them (crossover), and applies some random changes (mutation). The algorithm is based on the idea of "survival of the fittest", where the selection process is done according to a fitness criterion (in optimization problems it is usually simply the value of the objective function evaluated at the current solution). The crossover is done in the hope that two good solutions, when combined, might give an even better solution.
On the other hand, simulated annealing only tracks one solution in the space of possible solutions, and at each iteration considers whether to move to a neighboring solution or stay in the current one, according to some probability (which decays over time). This is different from a heuristic search (say, greedy search) in that it doesn't suffer from the problem of local optima, since it can get unstuck from cases where all neighboring solutions are worse than the current one.
I'm far from an expert on these algorithms, but I'll try and help out.
I think the biggest difference between the two is the idea of crossover in GA and so any example of a learning task that is better suited to GA than SA is going to hinge on what crossover means in that situation and how it is implemented.
The idea of crossover is that you can meaningfully combine two solutions to produce a better one. I think this only makes sense if the solutions to a problem are structured in some way. I could imagine, for example, in multi-class classification taking two (or many) classifiers that are good at classifying a particular class and combining them by voting to make a much better classifier. Another example might be Genetic Programming, where the solution can be expressed as a tree, but I find it hard to come up with a good example where you could combine two programs to create a better one.
I think it's difficult to come up with a compelling case for one over the other because they really are quite similar algorithms, perhaps having been developed from very different starting points.

Resources