is this classification result acceptable? - artificial-intelligence

I have a very simple linear classification problem, which is to work out a linear classifier for the following three classes of points in the plane:
Class 1: points (0,1), (1,0)
Class 2: points (-1,0), (1,0)
Class 3: points (0,-1), (1,-1)
I manually used an initial weight matrix [1 0; 0 1] (2×2) and an initial bias [1, 1], applying the updates to the six samples at each iteration. I finally get decision boundaries at x = -1 and y = -1, so when x and y are both > -1, it is class 1;
if x <= -1 and y > -1, it is class 2;
if x > -1 and y <= -1, it is class 3.
After plotting this on a graph, I think it has some problems, since the decision boundaries cross samples in class 2 and class 3, and I wonder if that is acceptable. By looking at the graph, I would say the ideal boundaries would be x = -1/2 and y = 1/2, but I really cannot get that result after calculation.
Please kindly share your thoughts with me; thanks in advance.

I'd say the results are acceptable. All the points are correctly classified except for the point at (1,0) that is labelled as class 2 and classified as class 1. The problem is that there is also a point at (1,0) labelled as class 1, so it's impossible to separate classes 1 and 2.
Of course, the model is quite probably awful when evaluated on a test set. If you want the decision boundaries to be placed equidistant between points, you need to look at max margin classifiers.
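For concreteness, the stated decision rule can be checked against the six samples directly (a minimal Python sketch of the rule itself, not of the training updates):

# Minimal check of the reported decision rule (boundaries x = -1 and y = -1)
# against the six labelled samples from the question.
samples = [
    ((0, 1), 1), ((1, 0), 1),     # class 1
    ((-1, 0), 2), ((1, 0), 2),    # class 2
    ((0, -1), 3), ((1, -1), 3),   # class 3
]

def predict(x, y):
    if x > -1 and y > -1:
        return 1
    if x <= -1 and y > -1:
        return 2
    if x > -1 and y <= -1:
        return 3
    return None  # the region x <= -1, y <= -1 is not assigned by the stated rule

for (x, y), label in samples:
    print((x, y), "true:", label, "predicted:", predict(x, y))
# Five of the six points get their own label; the class-2 copy of (1, 0)
# is predicted as class 1, which no linear classifier can avoid here.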

The results are not acceptable. Classes 2 and 3 are linearly separable, so you shouldn't accept any classifier that doesn't classify them perfectly.
As far as I know, with these samples and a feed-forward network trained with backpropagation, you are unlikely to get your desired x = -1/2 and y = 1/2. You need a maximum-margin classifier for that.
I recommend checking a linear SVM classifier; SVMlight, for example, handles multiclass problems.
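To illustrate, here is a hedged sketch using scikit-learn's LinearSVC (my choice of tool for this example, not something mentioned above; SVMlight would do the same job):

# Sketch: fit a maximum-margin linear classifier to the six labelled points.
import numpy as np
from sklearn.svm import LinearSVC

X = np.array([[0, 1], [1, 0],      # class 1
              [-1, 0], [1, 0],     # class 2
              [0, -1], [1, -1]])   # class 3
y = np.array([1, 1, 2, 2, 3, 3])

clf = LinearSVC(C=1.0)             # one-vs-rest linear SVM
clf.fit(X, y)
print(clf.predict(X))              # the duplicated point (1, 0) can only get one label
print(clf.coef_, clf.intercept_)   # one (w, b) pair per class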

Related

How to obtain the derivative of Rodrigues vector and perform update in nonlinear least square?

I am now interested in bundle adjustment in SLAM, where Rodrigues vectors $R$ of dimension 3 are used as part of the variables. Assume, without loss of generality, that we use the Gauss-Newton method to solve it; then in each step we need to solve the following linear least-squares problem:
$$J(x_k)\Delta x = -F(x_k),$$
where $J$ is the Jacobian of $F$.
Here I am wondering how to calculate the derivative $\frac{\partial F}{\partial R}$. Is it just the ordinary Jacobian from mathematical analysis? I ask because when I look through papers, I find many other concepts like the exponential map, quaternions, Lie groups and Lie algebras, so I suspect I may be misunderstanding something.
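For context, a single Gauss-Newton step is just a linear least-squares solve of the system above; here is a minimal NumPy sketch with a toy residual (the F and J below are illustrative placeholders, not the bundle-adjustment ones):

# Sketch: generic Gauss-Newton iteration for min ||F(x)||^2, solving
# J(x_k) dx = -F(x_k) in the least-squares sense at every step.
import numpy as np

def F(x):                                  # toy residual, purely illustrative
    return np.array([x[0]**2 + x[1] - 1.0, x[0] - x[1]**2])

def J(x):                                  # its analytic Jacobian
    return np.array([[2 * x[0], 1.0],
                     [1.0, -2 * x[1]]])

x = np.array([2.0, 2.0])
for _ in range(10):
    dx, *_ = np.linalg.lstsq(J(x), -F(x), rcond=None)
    x = x + dx
print(x, F(x))   # x converges to a root of F, so the residual is ~0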
This is not an answer, but is too long for a comment.
I think you need to give more information about how the Rodrigues vector appears in your F.
First off, is the vector assumed to be of unit length? If so, that presents some difficulties, as it then doesn't have 3 independent components. If you know that the vector will lie in some region (e.g. that its z component will always be positive), you can work around this.
If instead the vector is normalised before use, you could still compute the derivatives, but the resulting Jacobian will be singular.
Another approach is to use the length of the vector as the angle through which you rotate. However, this means you need a special case to get a rotation through 0, and the resulting function is not differentiable at 0. Of course, if this can never occur, you may be OK.
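If you adopt that length-as-angle convention, one pragmatic sanity check is a finite-difference Jacobian; here is a hedged Python sketch (SciPy's rotation-vector convention is assumed, which matches that choice):

# Sketch: finite-difference Jacobian of a rotated point with respect to a
# Rodrigues / rotation vector r (length of r = rotation angle).
import numpy as np
from scipy.spatial.transform import Rotation

def rotate(r, p):
    """Apply the rotation encoded by rotation vector r to point p."""
    return Rotation.from_rotvec(r).apply(p)

def numerical_jacobian(r, p, eps=1e-6):
    """d(rotate(r, p)) / dr, a 3x3 matrix, by central differences."""
    J = np.zeros((3, 3))
    for i in range(3):
        dr = np.zeros(3)
        dr[i] = eps
        J[:, i] = (rotate(r + dr, p) - rotate(r - dr, p)) / (2 * eps)
    return J

r = np.array([0.1, -0.2, 0.3])
p = np.array([1.0, 2.0, 3.0])
print(numerical_jacobian(r, p))
# Near r = 0 an analytic derivative needs care (see the special case above);
# analytic expressions are usually derived via the exponential map on SO(3).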

Do variables in Bayesian Networks have to be Boolean?

I can't believe I can't find any information on this, but do variables in Bayesian Networks have to be boolean? Every example I've found in my textbook or online uses T/F variables, but how do I represent a variable that has more than two possible values in a Bayesian network?
For example, I was given the following problem:
We have a bag of three biased coins a, b, and c with probabilities of coming up heads of 20%, 60%, and 80%, respectively. One coin is drawn randomly from the bag (with equal likelihood of drawing each of the three coins), and then the coin is flipped three times to generate the outcomes X1, X2, and X3.
Draw the Bayesian network corresponding to this setup and define the necessary CPTs (Conditional Probability Tables).
Can anyone help point me in a direction to get started with this?
Bayesian networks support variables that have more than two possible values. Koller and Friedman's "Probabilistic Graphical Models" has examples with larger variable domains.
Usually BNs have discrete random variables (with a finite number of different values). But it's also possible to define them with either countably infinite, or continuous variables. In the latter case, the inference algorithms change considerably, though.
Now that I have tried finding some examples online, I have to admit you're correct: they're hard to find. Here is an example taken from the above book, where the variable Grade can take on three different values.
Excellent question. Someone already pointed you in the right direction in terms of the specific homework problem, so I won't repeat that; I'll try to add some intuition that might be helpful.
The intuition you need here is that a Bayesian network is nothing more than a visual (graphical) way of representing a set of conditional independence assumptions.
So, for example, if X and Z are conditionally independent variables given Y, then you could draw the Bayesian network X → Y → Z. And conversely, the one and only thing that the Bayes net X → Y → Z tells you is that there are three variables (X, Y, Z) and that X and Z are conditionally independent given Y.
Once you understand this, you realize that anything you could write a conditional independence assumption for, you can draw a Bayes net for, and vice versa. In particular, the variables need not be Boolean at all.
Usually Bayesian networks are modeled with discrete values for each node, and once these values are fixed (or set by the modeler), people say that a probability distribution factorizes over the network.
I think theoretical frameworks for Bayesian networks with continuous values also exist, but they are mathematically more difficult than the discrete case (maybe only suited for PhDs?).
Furthermore, I cannot solve your problem off the top of my head, but maybe try this in R:
library(dplyr)  # provides mutate() and the %>% pipe operator
Model <- c("Coin a", "Coin b", "Coin c")
Prior <- c(1/3, 1/3, 1/3)        # each coin is drawn with equal probability
Likelihood <- c(0.2, 0.6, 0.8)   # probability of heads for coins a, b, c
bayes_df <- data.frame(Model = Model, Prior = Prior, Likelihood = Likelihood)
# posterior probability of each coin after observing a single head
bayes_df %>%
  mutate(Product = Likelihood * Prior, Posterior = Product / sum(Product))
Result
   Model  Prior Likelihood Product Posterior
1 Coin a 0.3333        0.2 0.06667     0.125
2 Coin b 0.3333        0.6 0.20000     0.375
3 Coin c 0.3333        0.8 0.26667     0.500
I think the "network" is just two bubbles connected with an arrow, coin -> flip, and the CPT is given by the numbers above, but I'm not sure.
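For what it's worth, here is a hedged Python sketch of the full computation (my own illustration, not part of the answer above): the network is one three-valued Coin node feeding the flips X1, X2, X3, and the posterior over coins given all three observed flips follows directly from the two CPTs:

# Sketch: Coin (3 values) -> X1, X2, X3 (each heads/tails given the coin).
# CPTs: P(Coin) is uniform; P(Xi = heads | Coin) is 0.2 / 0.6 / 0.8.
prior = {"a": 1/3, "b": 1/3, "c": 1/3}
p_heads = {"a": 0.2, "b": 0.6, "c": 0.8}

def posterior(flips):
    """P(Coin | X1, X2, X3), where flips is e.g. ['H', 'H', 'T']."""
    unnorm = {}
    for coin in prior:
        like = 1.0
        for f in flips:
            like *= p_heads[coin] if f == "H" else 1 - p_heads[coin]
        unnorm[coin] = prior[coin] * like
    z = sum(unnorm.values())
    return {coin: v / z for coin, v in unnorm.items()}

print(posterior(["H", "H", "T"]))  # coin b comes out most likely for this evidence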

Optimal values of x to get smoothest curve f(x) given N points

Suppose I have a function, y = x^2, and I'm allowed to plot 10 points between -1 and 1. Which values of x should I choose to have the smoothest curve?
Is there a standard way to do this? Clearly you'll have more points near x = 0. I'm guessing I need to consider the second derivative here.
More precisely, you need to consider the curvature of the curve. Since the second derivative is required to compute the curvature, what you said about 'considering the second derivative' is in the right direction.
For your curve $y = x^2$, the curvature is
$$\kappa(x) = \frac{2}{(1+4x^2)^{3/2}}.$$
This means that the curvature attains its maximum value 2.0 at x = 0 and becomes smaller as |x| gets bigger, so you do need more points around x = 0. From my experience, if you sample the points along the curve so that the arc length between consecutive points is equal, the resulting polyline will be a good approximation to the original curve. However, I am not sure whether it will be the 'smoothest' or not.
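Here is a hedged sketch of that equal-arc-length sampling for y = x^2 on [-1, 1] (SciPy's quad and brentq are my choice of tools, not something prescribed by the answer):

# Sketch: place 10 points on y = x^2 over [-1, 1] so that the arc length between
# consecutive points is equal.
import numpy as np
from scipy.integrate import quad
from scipy.optimize import brentq

def speed(x):
    return np.sqrt(1.0 + 4.0 * x**2)      # arc-length element |(1, 2x)|

def arclen(a, b):
    return quad(speed, a, b)[0]

n = 10
total = arclen(-1.0, 1.0)
targets = np.linspace(0.0, total, n)      # equally spaced arc lengths along the curve
xs = [-1.0]
for t in targets[1:-1]:
    xs.append(brentq(lambda x: arclen(-1.0, x) - t, -1.0, 1.0))
xs.append(1.0)
print(np.round(xs, 3))
# Note: equal arc length puts the x-samples closer together where the curve is
# steeper (large |x|); whether that is "smoothest" is debatable, as noted above.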

Proper Heuristic Mechanism For Hill Climbing

The following problem is an exam exercise I found from an Artificial Intelligence course.
"Suggest a heuristic mechanism that allows this problem to be solved, using the Hill-Climbing algorithm. (S=Start point, F=Final point/goal). No diagonal movement is allowed."
Since it's obvious that Manhattan distance or Euclidean distance will send the robot to (3,4), and no backtracking is allowed, what is a possible solution (heuristic mechanism) to this problem?
EDIT: To make the problem clearer, I've marked some of the Manhattan distances on the board:
It should be obvious that, using Manhattan distance, the robot's next move would be to (3,4), since it has a heuristic value of 2; hill climbing will choose that and get stuck forever. The aim is to avoid ever going down that path by finding a proper heuristic.
I thought of the obstructions as being hot, and that heat rises. I make the net cost of a cell the sum of the Manhattan metric distance to F plus a heat-penalty. Thus there is an attractive force drawing the robot towards F as well as a repelling force which forces it away from the obstructions.
There are two types of heat penalties:
1) It is very bad to touch an obstruction. Look at the 2 or 3 neighbouring cells in the row immediately below a given cell: add 15 if the cell directly below is an obstruction, and 10 for every diagonal neighbour below that is an obstruction.
2) For cells not in direct contact with the obstructions, the heat is more diffuse. I calculate it as 6 times the average number of obstruction blocks below the cell, in its own column and in its neighbouring columns.
The following shows the result of combining this all, as well as the path taken from S to F:
A crucial point is the way that the averaging causes the robot to turn left rather than right when it hits the top row: the unheated columns towards the left make that the cooler direction. It is interesting to note how all cells (with the possible exception of the two at the upper-right corner) are drawn to F by this heuristic.
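Since the actual board is only in the (missing) figure, here is a hedged Python sketch of the scoring idea on a made-up grid; the layout, goal position and weights below are illustrative only, not taken from the exam problem:

# Hypothetical sketch of the heuristic above: score(cell) = Manhattan distance
# to F + heat penalty from obstructions "below" the cell. 1 marks an obstruction.
GRID = [
    [0, 0, 0, 0, 0],   # row 0 (top)
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],   # a wall of obstructions
    [0, 0, 0, 0, 0],   # row 3 (bottom)
]
ROWS, COLS = len(GRID), len(GRID[0])
F = (0, 2)                                   # goal, purely illustrative

def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def heat_penalty(cell):
    r, c = cell
    if r + 1 >= ROWS:                        # bottom row: nothing below it
        return 0.0
    penalty = 15 * GRID[r + 1][c]            # obstruction directly below: +15
    penalty += 10 * sum(GRID[r + 1][c + dc]  # each diagonal neighbour below: +10
                        for dc in (-1, 1) if 0 <= c + dc < COLS)
    if penalty == 0:                         # not touching: diffuse heat, 6 x average
        below = [GRID[rr][cc] for rr in range(r + 1, ROWS)
                 for cc in (c - 1, c, c + 1) if 0 <= cc < COLS]
        penalty = 6 * sum(below) / len(below)
    return penalty

def score(cell):                             # the value hill climbing tries to minimise
    return manhattan(cell, F) + heat_penalty(cell)

print([round(heat_penalty((1, c)), 2) for c in range(COLS)])  # hottest just above the wall
print([round(heat_penalty((0, c)), 2) for c in range(COLS)])  # diffuse heat higher up; edge columns stay coolest
print([round(score((0, c)), 2) for c in range(COLS)])         # distance to F plus the diffuse heat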

Compact representation of OpenGL modelview matrix 4x4

What is the most user friendly way to store only the rotation part of an OGL modelview (4x4) matrix?
For example, in a level editor, to set the rotation for an object it would be easy to use XYZ Euler angles. However, this seems a very tricky system to use with matrices.
I need to be able to get AND set the rotation from this new representation.
(The alternative is to store the rotation part (4*3 numbers) but it is hard for a user to manipulate these)
I found some code here http://www.google.com/codesearch/p?hl=en#HQY9Wd_snmY/mesh/matrix3.h&q=matrix3&sa=N&cd=1&ct=rc that allows me to set and get rotation from angles (3 floats). This is ideal.
Although they're used regularly, I advise against using Euler angles. They're problematic, as they only preserve the pointing direction of the object, but not the bitangent to that direction. More importantly, they're prone to gimbal lock: http://en.wikipedia.org/wiki/Gimbal_lock
A far superior method for storing rotations is quaternions. In layman's terms, a quaternion consists of the rotation axis and the angle of rotation around this axis. It is thus a tuple of 4 scalars a, b, c, d. The quaternion is then Q = a + i*b + j*c + k*d with |Q| = 1, where i, j, k have the special properties i² = j² = k² = i·j·k = -1 and i·j = k, j·k = i, k·i = j, which implies j·i = -k, k·j = -i, i·k = -j.
Quaternions are thus extensions of the complex numbers. If you recall complex number theory, you'll remember that multiplying by a complex number of unit modulus is a rotation in the complex plane. It is thus natural to expect that rotations in 3D can be described by an extension of the complex numbers, and this is what quaternions are.
See this article for the details:
http://en.wikipedia.org/wiki/Quaternions_and_spatial_rotation
In a standard 3D transformation matrix, you only need the top-left 3x3 values to give the rotation. To apply it as a 4x4 matrix later on, set the remaining values to 0, except for the bottom-right diagonal element, which is 1.
Here's a rotation only matrix where the values vXY give the rotations.
[v00 v01 v02 0]
[v10 v11 v12 0]
[v20 v21 v22 0]
[ 0 0 0 1]
Interestingly, the values form the basis vectors of the coordinate system you have rotated the object into, so in the new system the x-axis is along [v00 v01 v02], the y-axis along [v10 v11 v12], and the z-axis along [v20 v21 v22].
You could show these axes beside the object and let the user drag them around to change the rotation, perhaps.
I would say this depends on the user, but to me the most "user friendly" way is to store "roll", "pitch" and "yaw". These are very non-technical terms that an average user can understand and adjust, and it should be easy for you to take these values and compute the matrix.
IMO, the most 'user friendly' format for rotation is storing Euler XYZ angles; this is generally how rotations are exposed in any 3D content creation software.
Euler angles are easy to transform to matrices; see here for the matrix product involved.
But you should not confuse the format given to the GUI/user with the storage format of the data: Euler XYZ angles have problems of their own when doing animation, since gimbal lock can introduce unwanted behaviour.
Another candidate for storing/computing rotations is quaternions. They offer mathematical advantages over XYZ angles, essentially when interpolating between two rotations. Of course, you don't want to expose the quaternion values directly to any human user; you'll need to convert them to XYZ angles. You'll find plenty of code for that on the Web.
I would not recommend storing the rotation directly in matrix format: extracting user-friendly values from it is difficult, it does not offer any interesting behaviour for animation/interpolation, and it takes more space to store. IMO, matrices are to be created when needed to transform the geometry.
To conclude, there are a few options; you should select what suits you best. Do you plan to have animation or not? etc.
EDIT
Also, you should not conflate the model and view matrices. They are semantically very different, and are combined in OpenGL only for performance reasons. What I had in mind above is the 'model matrix'. The view matrix is generally given by your view system/camera manager and is combined with your model matrix.
Although the math is "obscure and unintelligible", a quaternion is surprisingly user friendly, as it represents a rotation around an axis by a given angle.
The axis of rotation is simply a unit vector pointing in that direction, multiplied by the sine of half the rotation angle, and the "obscure" fourth component equals the cosine of half the rotation angle.
It feels kind of "unnatural" at first sight, but once you grasp it... can it be any easier?
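To make that concrete, here is a small hedged sketch (it assumes NumPy and SciPy, which none of the answers mention) that builds the quaternion from an axis and angle exactly as described, and converts it to the 3x3 rotation block and to Euler XYZ angles for a GUI:

# Sketch: axis/angle -> quaternion -> rotation matrix and Euler angles.
import numpy as np
from scipy.spatial.transform import Rotation

axis = np.array([0.0, 0.0, 1.0])            # unit rotation axis
angle = np.radians(90.0)                    # rotation angle

# (x, y, z) = axis * sin(angle/2), w = cos(angle/2); scalar-last convention
quat = np.append(axis * np.sin(angle / 2), np.cos(angle / 2))

rot = Rotation.from_quat(quat)              # SciPy also uses the (x, y, z, w) order
print(np.round(rot.as_matrix(), 3))         # 3x3 block to embed in the 4x4 matrix
print(np.round(rot.as_euler("xyz", degrees=True), 3))  # Euler XYZ for the GUI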

Resources