Informedness of distance metrics - artificial-intelligence

How would you consider Manhattan distance and Chebyshev in terms of which is more informed and admissable in searching the shortest path in a grid from a to b, where only horizontal and vertical movements are allowed ?

The Manhattan distance is the sum of the distances on the two separate axes, Manhattan = |x_a - x_b| + |y_a - y_b|, whereas the Chebyshev distance is only the maximum of those two: Chebyshev = max(|x_a - x_b|, |y_a - y_b|). So, the Manhattan distance is always at least as big as the Chebyshev distance, typically bigger.
In the case where diagonal movement on a grid is not allowed, both of the distances are admissible (neither of them ever overestimates the true distance).
Given that both distance metrics are always equal to or smaller than the true distance, and the Manhattan distance is always equal to or greater than the Chebyshev distance, the Manhattan distance will always be at least as ''close to the truth''. In other words, the Manhattan distance will be more informative in this specific case.
Note that if diagonal movement is allowed, or if you're not talking about a grid, then the situation can be different.

Related

How can I get the leftover room space?

So this is the task I need help with:
The blueprint of a house is made on a unit square grid sheet. All rooms must be rectangular. So far, N rooms have been drawn on the blueprint. Each room is defined by the upper left and lower right corners. One field of the square grid is given by the x and y coordinates, the coordinates of the upper left field (0,0). The x-coordinates increase horizontally and the y-coordinates increase vertically. The designer wants to calculate how many new rectangular rooms can be added if the two sides of any two new rooms cannot have a common part, and all four sides are adjacent or existing, or the side of the house. The rooms planned so far are such that every free space is rectangular. Make a program that tells you what the largest possible new room area can be in your plan.
Input:
The first line of the standard input has the number of rooms in the design (1≤N≤10,000) and the coordinates of the upper left (FX, FY) and lower right (AX, AY) corners of the house (0 ≤ FX < AX ≤ 1000, 0 ≤ FY < AY ≤ 1000), separated by a space. Each of the following N lines has coordinates (FX ≤ LFXi ≤ RAXi ≤ AX, FY ≤ LFYi ≤ RAYi ≤ AY) of the upper left (LFXi, LFYi) and lower right (RAXi, RAYi) of each room, separated by a space.
Output:
The first line of the standard output should be the area of ​​the largest new room!
Theres an example in the link and its the basic input and output that you can test the program with.
I translated this task from hungarian so the Példa=example , Bemenet=input , Kimenet=output. I would be really happy to get some help with this taks because for me it is a little bit too hard.
Basic input example

Why allowing diagonal movement would make the A* and Manhattan Distance inadmissible?

I'm slightly confused about diagonal movement in a grid using A* and the Manhattan distance metric. Can someone explain why using diagonal movement makes it inadmissible? Wouldn't going in diagonal movement find a better optimal solution as in take less steps to get to goal state than up down left right or am I missing something?
Much as beaker's comment denotes, Manhattan Distance will over estimate the distance between a state and the states diagonally accessible to it. By definition, a heuristic that over estimates distances is not admissible.
Now, why exactly is this so?
Lets assume your Manhattan Distance procedure looks something like this:
function manhattan_dist(state):
y_dist = abs(state.y - goal.y)
x_dist = abs(state.x - goal.x)
return (y_dist + x_dist)
Now, consider the case of applying that procedure to the state of (1,1), and assume the goal is at (3,3). This will return the value of 4, which over estimates the actual distance which is 2. Therefore, Manhattan Distance in this situation will not work as an admissible heuristic.
On game boards that allow for diagonal movement Chebyshev Distance is typically used instead. Why?
Consider this new procedure:
function chebyshev dist(state):
y_dist = abs(state.y - goal.y)
x_dist = abs(state.x - goal.x)
return max(y_dist, x_dist)
Returning to the previous example of (1,1) and (3,3), this procedure will return the value of 2, which is indeed not an overestimation of the actual distance.
While this topic is older I would like to add a different solution that uses the actual fastest free path to the goal if diagonal movement is allowed.
function heuristic(state):
delta_x = abs(state.x - goal.x)
delta_y = abs(state.y - goal.y)
return min(delta_x, delta_y) * sqrt(2) + abs(delta_x - delta_y)
This method returns a heuristic that moves the maximum amount diagonally and the remainder in a straight way to the goal and presents the largest possible heuristic that does not over-estimate the movement costs to the goal.

Simple explanation of PCA to reduce dimensionality of dataset

I know that PCA does not tell you which features of a dataset are the most significant, but which combinations of features keep the most variance.
How could you use the fact that PCA rotates the dataset in such a way that it has the most variance along the first dimension, second most along second, and so on to reduce the dimensionality of the dataset?
I mean, more in depth, How are the first N eigenvectors used to transform the feature vectors into a lower-dimensional representation that keeps most of the variance?
Let X be an N x d matrix where each row X_{n,:} is a vector from the dataset.
Then X'X is the covariance matrix and an eigen decomposition gives X'X=UDU' where U is a d x d matrix of eigenvectors with U'U=I and D is a d x d diagonal matrix of eigenvalues.
The form of the eigendecomposition means that U'X'XU=U'UDU'U=D which means that if you transform your dataset by U then the new dataset, XU, will have a diagonal covariance matrix.
If the eigenvalues are ordered from largest to smallest, this also means that the average squared value of the first transformed feature (given by the expression U_1'X'XU_1=\sum_n (\sum_d U_{1,d} X_{n,d})^2) will be larger that the second, the second larger than the third, etc.
If we order the features of a dataset from largest to smallest average value, then if we just get rid of the features with small average values (and the relative sizes of the large average values are much larger than the small ones), then we haven't lost much information. That is the concept.

Implementing Geometric Median

When I google for Geometric median, I got this link Geometric median
but I have no clue how to implement it in C . I am not very good at understanding this Mathematical Explanation. Lets Say I have 11 pair of co-ordinates how will I calculate the geometric median for the same.
I am trying to solve this problem Grid CIty. I was given a Hint that geometric median will help me achieve it. I am not looking for a final solution. If someone can guide me to a right path that would help.
Thanks is Advance
Below is the list of co-ordinates a (test case). result : 3 4
1 2
1 7
2 2
2 3
2 5
3 4
4 2
4 5
4 6
5 3
6 5
I don't think this is solvable without an iterative algorithm.
Here is a pseudocode solution similar to the hill-climbing version, except that it works to arbitrary accuracy, and in higher dimensions.
CurrentPoint = Mean(Points)
While (CurrentPoint - PreviousPoint) Length > 0.01 Do
For Each Point in Points Do
Vector = CurrentPoint - Point
Vector Length = Vector Length - 1.0
Point2 = Point + Vector
Add Point2 To Points2
Loop
PreviousPoint = CurrentPoint
CurrentPoint = Mean(Points2)
Loop
Notes:
The constant 0.01 does not guarantee the result to be within 0.01 of the true value. Use smaller values for better precision.
The constant 1.0 should be adjusted to (I'm guessing) about 1/5 the distance between the furthest points. Too small values will slow down the algorithm, but too large values will cause inaccuracies probably leading an to infinite loop.
To resolve this problem, you just have to compute the mean for each coordinate and round up the result.
It should resolve your problem.
You are not obliged to use the concept of Geometric median; so seeing that it is not easy to calculate, you better solve your problem without calculating it!
Here is an idea for an algorithm/implementation.
Start at any point (e.g. the first point in the given data).
Calculate the sum of distances for current point and the 8 neighboring points (+/-1 in each direction, x and y)
If one of the neighbors is better than current point, update the current point and start from 1
(Found the optimal distance; now choose the best point among those with equal distance)
Calculate the sum of distances for current point and the 3 neighboring points (-1 in each direction, x and y)
If one of the neighbors is the same as current point, update the current point and continue from 5
The answer is (xi, yj) where xi
is the median of all the x's and yj is the median of all the y's.
As I comment the solution to your problem is not the geometric mean, but the arithmetic mean.
If you have to calculate the arithmetic mean, you need to sum all the values of the column and divide the answer by the number of elements.

Dividing circle into pieces by choosing points on circumference?

On a circle, N arbitrary points are chosen on its circumference. The complete graph formed with those N points would divide the area of the circle into many pieces.
What is the maximum number of pieces of area that the circle will get divided into when the points are chosen along its circumference?
Examples:
2 points => 2 pieces
4 points => 8 pieces
Any ideas how to go about this?
This is known as Moser's circle problem.
The solution is:
i.e.
The proof is quite simple:
Consider each intersection inside the circle. It must be defined by the intersection of two lines, and each line has two points, so every intersection inside the circle defines 4 unique sets of points on the circumference. Therefore, there are at most n choose 4 inner vertices, and obviously there are n vertices on the circumference.
Now, how many edges does each vertex touch? Well, it's a complete graph, so each vertex on the outside touches n - 1 edges, and of course each vertex on the inside touches 4 edges. So the number of edges is given by (n(n - 1) + 4(n choose 4))/2 (we divide by two because otherwise each edge would be counted twice by its two vertices).
The final step is to use Euler's formula for the number of faces in a graph, i.e.: v - e + f = 1 (the Euler characteristic is 1 in our case).
Solving for f gives the formulae above :-)

Resources