Should the amount of pheromone deposited on edges depend on the distance between the nodes in ACO? - artificial-intelligence

I was testing an ACO I built and I noticed that if I used the same graph but if I used meters instead of centimeters when calculating tour distance the amount of pheromone my AC deposits changes. This seems wrong, but after reading the literature I'm not sure what I'm missing.
For example, here is the pheromone update calculation in meters:
Tour distance = 5m
extraPheromone = 0.2 = 1 / 5m
Then in CM:
Tour distance = 500cm
extraPheromone = 0.002 = 1 / 500cm
It also applies when scaling the graph up. If all distances were doubled you would expect the ACO to perform the same, however again this would change the pheromone update calculation and affect the balance between the tour edge distance and pheromone count heuristic used when selecting a tour edge.
Any thoughts on how to solve this?

The pheromone levels are just a relative measure to describe how favourable any single edge is. In ACO, tour selection is not heuristic, but probabilistic.
Consider some ant, during the construction of any tour, say S, where the ant is just about to leave node i. As next node, the ant will choose node j (i.e., edge e_ij) with probability
p(e_ij|S) = (tau_ij^alpha * nu_ij^beta) /
sum(k in allNotYetVisitedNodes) {
(tau_ik^alpha * nu_ik^beta) }.
Naturally, this expression is unit-less, and any scaling of (all) node distances will not affect these probabilities. The result of the algorithm does not require any specific length unit in the graph defining the nodes; a graph defined in meters will produce a best tour in meters, and one in arbitrary length units le will produce a result in le:s.

Related

Efficient way of calculating minimum distance between point and multiple faces

I have multiple faces in 3D space creating cells. All these faces lie within a predefined cube (e.g. of size 100x100x100).
Every face is convex and defined by a set of corner points and a normal vector. Every cell is convex. The cells are result of 3d voronoi tessellation, and I know the initial seed points of the cells.
Now for every integer coordinate I want the smallest distance to any face.
My current solution uses this answer https://math.stackexchange.com/questions/544946/determine-if-projection-of-3d-point-onto-plane-is-within-a-triangle/544947 and calculates for every point for every face for every possible triple of this faces points the projection of the point to the triangle created by the triple, checks if the projection is inside the triangle. If this is the case I return the distance between projection and original point. If not I calculate the distance from the point to every possible line segment defined by two points of a face. Then I choose the smallest distance. I repeat this for every point.
This is quite slow and clumsy. I would much rather calculate all points that lie on (or almost lie on) a face and then with these calculate the smallest distance to all neighbour points and repeat this.
I have found this Get all points within a Triangle but am not sure how to apply it to 3D space.
Are there any techniques or algorithms to do this efficiently?
Since we're working with a Voronoi tessellation, we can simplify the current algorithm. Given a grid point p, it belongs to the cell of some site q. Take the minimum over each neighboring site r of the distance from p to the plane that is the perpendicular bisector of qr. We don't need to worry whether the closest point s on the plane belongs to the face between q and r; if not, the segment ps intersects some other face of the cell, which is necessarily closer.
Actually it doesn't even matter if we loop r over some sites that are not neighbors. So if you don't have access to a point location subroutine, or it's slow, we can use a fast nearest neighbors algorithm. Given the grid point p, we know that q is the closest site. Find the second closest site r and compute the distance d(p, bisector(qr)) as above. Now we can prune the sites that are too far away from q (for every other site s, we have d(p, bisector(qs)) ≥ d(q, s)/2 − d(p, q), so we can prune s unless d(q, s) ≤ 2 (d(p, bisector(qr)) + d(p, q))) and keep going until we have either considered or pruned every other site. To do pruning in the best possible way requires access to the guts of the nearest neighbor algorithm; I know that it slots right into the best-first depth-first search of a kd-tree or a cover tree.

Generating a set of N random convex disjoint 2D polygons with at most V vertices, and two additional points?

I want to create a set of N random convex disjoint polygons on a plane where each polygon must have at most V vertices, where N and V are parameters for my function, and I'd like to obtain a distribution as close as possible to uniform (every possible set being equally probable). Also I need to randomly select two points on the plane that either match with one of the vertices in the scene or are in empty space (not inside a polygon).
I already implemented for other reasons in the same programming language an AABB tree, Separating Axis Theorem-based collision detection between convex polygons and I can generate a random convex polygon with arbitrary amount of vertices inside a circle of given radius. My best bet thus far:
Generate a random polygon using the function I have available.
Query the AABB tree to test for interception with existing polygons.
If the AABB tree query returns empty set I push the generated polygon into it, otherwise I test with SAT against all the other polygons whose AABB overlaps with the generated one's. If SAT returns "no intersection" I push the polygon, otherwise I discard it.
Repeat from 1 until N polygons are generated.
Generate a random number in {0,1}
If the generated number is 1 I pick a random polygon and a random vertex on it as a point
If the generated number is 0 I generate a random position in (x,y) and test if it falls within some polygon (I might create a tiny AABB around it and exploit the AABB tree to reduce the required number of PiP tests). In case it's not inside any polygon I approve it as a valid point, otherwise I repeat from 5.
Repeat from 5 once more to get the second point.
I think the solution would possibly work, but unfortunately there's no way to guarantee that I can generate N such polygons for very large N, or find two good points in an acceptable time, and I'm programming in React, where long operations run on the main thread blocking the UI till they end. I could circumvent the issue by ejecting from create-react-app and learn Web Workers, which would require probably more time than it's worth for me.
This is definitely non-uniform distribution, but perhaps you could begin by generating N points in the plane and then computing the Voronoi diagram for those points. The Voronoi diagram can be computed in O(n log n) time with Fortune's algorithm. The cells of the Voronoi diagram are convex, so you can then construct a random polygon of the desired number of vertices that lies within each cell of the diagram.
By Balu Ertl - Own work, CC BY-SA 4.0, Link
Ok, here is another proposal. I have little knowledge of js, but could cook up something in Python.
Use Poisson disk sampling with distance parameter d to generate N samples of the centers
For a given center make a circle with R≤d.
Generate V angles using Dirichlet distribution such that sum of angles is equal to 2π. Sort them.
Place vertices on the circle using angles generate at step#3 and connect them. This would be be your polygon
UPDATE
Instead using Poisson disk sampling for step 1, one could use Sobol quasi-random sequences. For N points sampled in the 1x1 box (well, you have to scale it afterwards), least distance between points would be
d = 0.5 * sqrt(D) / N,
where D is dimension of the problem, 2 in your case. So radius of the circle for step 2 would be 0.25 * sqrt(2) / N. This ties nicely N and d.
https://www.sciencedirect.com/science/article/abs/pii/S0378475406002382

creating a cost function in jgrapht

jgrapht supports the idea of putting a wehight(a cost) on an edge/vertex between two nodes. This can be achieved using the class DefaultWeightedEdge.
In my graph I do have the requirement to not find the shortest path but the cheapest one. The cheapest path might be longer/have more hops nodes to travel then the shortest path.
Therefor, one can use the DijkstraShortestPath algorithm to achieve this.
However, my use case is a bit more complex: It needs to also evaluate costs on actions that need to be executed when arriving at a node.
Let's say, you have a graph like a chess board(8x8 fields, each field beeing a node). All the edges have a weight of 1. To move in a car from left bottom to the diagonal corner(right upper), there are many paths with the cost of 16. You can take a diagonal path in a zic zac style, or you can first travel all nodes to the right and then all nodes upwards.
The difference is: When taking a zic zac, you need to rotate yourself in the direction of moving. You rotate 16 times.
When moving first all to the right and then upwards, you need to rotate only once (maybe twice, depending on your start orientation).
So the zic zac path is, from a Djikstra point of view, perfect. From a logical point of view, it's the worst.
Long story short: How can I put some costs on a node or edge depending on the previous edge/node in that path? I did not find anything related in the source code of jgrapht.
Or is there a better algorithm to use?
This is not a JGraphT issue but a graph algorithm issue. You need to think about how to encode this problem and formalize that in more detail
Incorporating weights on vertices is in general easy. Say that every vertex represents visiting a customer, which takes a_i time. This can be encoded in the graph by adding a_i/2 to the cost of every incoming arc in node i, as well as a_i/2 to the cost of every outgoing arc.
A cost function where the cost of traveling from j to k dependents on the arc (i,j) you used to travel to j is more complicated.
Approach a.: Use a dynamic programming (labeling) algorithm. This is perhaps the easiest. You can define your cost function as a recursive function, where the cost of traversing an arc depends on the cost of the previous arc.
Approach b.: With some tricks you may be able to encode the costs in the graph by adding extra nodes to it. Here's an example:
Given a graph with vertices {a,b,c,d,e}, with arcs: (a,e), (e,b), (c,e), (e,d). This graph represents a crossroad with vertex e being in the middle. Going from a->e->b (straight) is free, however, a turn from a->e->d takes additional time. Similar for c->e->d (straight) is free and c->e->b (turning) should be penalized.
Decouple vertex e in 4 new vertices: e1,e2,e3,e4.
Add the following arcs:
(a,e1), (e3,b), (c,e2), (e4,d), (e2, e3), (e1, e3), (e1, e4), (e2, e4).
(e1,e4) and (e2,e3) can have a positive weight to penalize turning.

Monte Carlo Tree Search for card games like Belot and Bridge, and so on

I've been trying to apply MCTS in card games. Basically, I need a formula or modify the UCB formula, so it is best when selecting which node to proceed.
The problem is, the card games are no win/loss games, they have score distribution in each node, like 158:102 for example. We have 2 teams, so basically it is 2-player game. The games I'm testing are constant sum games (number of tricks, or some score from the taken tricks and so on).
Let's say the maximum sum of teamA and teamB score is 260 at each leaf. Then I search the best move from the root, and the first I try gives me average 250 after 10 tries. I have 3 more possible moves, that had never been tested. Because 250 is too close to the maximum score, the regret factor to test another move is very high, but, what should be mathematically proven to be the optimal formula that gives you which move to chose when you have:
Xm - average score for move m
Nm - number of tries for move m
MAX - maximum score that can be made
MIN - minimum score that can be made
Obviously the more you try the same move, the more you want to try the other moves, but the more close you are to the maximum score, the less you want to try others. What is the best math way to choose a move based ot these factors Xm, Nm, MAX, MIN?
Your problem obviously is an exploration problem, and the problem is that with Upper Confidence Bound (UCB), the exploration cannot be tuned directly. This can be solved by adding an exploration constant.
The Upper Confidence Bound (UCB) is calculated as follows:
with V being the value function (expected score) which you are trying to optimize, s the state you are in (the cards in the hands), and a the action (putting a card for example). And n(s) is the number of times a state s has been used in the Monte Carlo simulations, and n(s,a) the same for the combination of s and action a.
The left part (V(s,a)) is used to exploit knowledge of the previously obtained scores, and the right part is the adds a value to increase exploration. However there is not way to increase/decrease this exploration value, and this is done in the Upper Confidence Bounds for Trees (UCT):
Here Cp > 0 is the exploration constant, which can be used to tune the exploration. It was shown that:
holds the Hoeffding's inequality if the rewards (scores) are between 0 and 1 (in [0,1]).
Silver & Veness propose: Cp = Rhi - Rlo, with Rhi being the highest value returned using Cp=0, and Rlo the lowest value during the roll outs (i.e. when you randomly choose actions when no value function is calculated yet).
Reference:
Cameron Browne, Edward J. Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton.
A Survey of Monte Carlo Tree Search Methods.
IEEE Trans. Comp. Intell. AI Games, 4(1):1–43, 2012.
Silver, D., & Veness, J. (2010). Monte-Carlo planning in large POMDPs. Advances in Neural Information Processing Systems, 1–9.

How come random weight initiation is better then just using 0 as weights in ANN?

In a trained neural net the weight distribution will fall close around zero. So it makes sense for me to initiate all weights to zero. However there are methods such as random assignment for -1 to 1 and Nguyen-Widrow that outperformes zero initiation. How come these random methods are better then just using zero?
Activation & learning:
Additionally to the things cr0ss said, in a normal MLP (for example) the activation of layer n+1 is the dot product of the output of layer n and the weights between layer n and n + 1...so basically you get this equation for the activation a of neuron i in layer n:
Where w is the weight of the connection between neuron j (parent layer n-1) to current neuron i (current layer n), o is the output of neuron j (parent layer) and b is the bias of current neuron i in the current layer.
It is easy to see initializing weights with zero would practically "deactivate" the weights because weights by output of parent layer would equal zero, therefore (in the first learning steps) your input data would not be recognized, the data would be negclected totally.
So the learning would only have the data supplied by the bias in the first epochs.
This would obviously render the learning more challenging for the network and enlarge the needed epochs to learn heavily.
Initialization should be optimized for your problem:
Initializing your weights with a distribution of random floats with -1 <= w <= 1 is the most typical initialization, because overall (if you do not analyze your problem / domain you are working on) this guarantees some weights to be relatively good right from the start. Besides, other neurons co-adapting to each other happens faster with fixed initialization and random initialization ensures better learning.
However -1 <= w <= 1 for initialization is not optimal for every problem. For example: biological neural networks do not have negative outputs, so weights should be positive when you try to imitate biological networks. Furthermore, e.g. in image processing, most neurons have either a fairly high output or send nearly nothing. Considering this, it is often a good idea to initialize weights between something like 0.2 <= w <= 1, sometimes even 0.5 <= w <= 2 showed good results (e.g. in dark images).
So the needed epochs to learn a problem properly is not only dependent on the layers, their connectivity, the transfer functions and learning rules and so on but also to the initialization of your weights.
You should try several configurations. In most situations you can figure out what solutions are adequate (like higher, positive weights for processing dark images).
Reading the Nguyen article, I'd say it is because when you assign the weight from -1 to 1, you are already defining a "direction" for the weight, and it will learn if the direction is correct and it's magnitude to go or not the other way.
If you assign all the weights to zero (in a MLP neural network), you don't know which direction it might go to. Zero is a neutral number.
Therefore, if you assign a small value to the node's weight, the network will learn faster.
Read Picking initial weights to speed training section of the article. It states:
First, the elements of Wi are assigned values from a uniform random distributation between -1 and 1 so that its direction is random. Next, we adjust the magnitude of the weight vectors Wi, so that each hidden node is linear over only a small interval.
Hope it helps.

Resources