I have a question regarding the Naïve Bayes classification method.
I ran through what I thought was an easy example but ran into a snag.
Basically here is the classification I would like to do:
I want to be able to take some training data:
input1 | input2 | input3 | class
1 | 3 | 3 | 1
2 | 1 | 1 | 2
1 | 1 | 1 | 3
3 | 3 | 3 | 1
and classify them into a class 1-3.
As I understand it, you first compute the prior probability of each class,
so in this case that would be:
class 1 = P(c_1) = 0.50
class 2 = P(c_2) = 0.25
class 3 = P(c_3) = 0.25
which thus far makes perfect sense. They all add up to 1, and it's
very easy to see where those numbers come from.
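As a quick check of the priors above, here is a minimal Python sketch that counts the class labels in the four training rows. The function name `class_priors` is illustrative and not from the question.

```python
from collections import Counter
from fractions import Fraction

# Training rows from the question: (input1, input2, input3, class)
TRAINING = [
    (1, 3, 3, 1),
    (2, 1, 1, 2),
    (1, 1, 1, 3),
    (3, 3, 3, 1),
]


def class_priors(rows):
    """P(c) = (number of rows labelled c) / (total number of rows)."""
    counts = Counter(row[-1] for row in rows)
    return {label: Fraction(n, len(rows)) for label, n in counts.items()}


priors = class_priors(TRAINING)
print({label: float(p) for label, p in sorted(priors.items())})
# {1: 0.5, 2: 0.25, 3: 0.25}
```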
Because of the numerical nature of those values, I wanted to simplify
them into ranges, so I reconstructed my data into this:
So anyway, that's how I got to that table. Continuing with the Bayes part:
P(Class 1 | avg_speed_1): 0.5
P(Class 1 | avg_speed_2): 0
P(Class 1 | avg_speed_3): 0
P(Class 2 | avg_speed_1): 0
P(Class 2 | avg_speed_2): 0.25
P(Class 2 | avg_speed_3): 0
P(Class 3 | avg_speed_1): 0
P(Class 3 | avg_speed_2): 0
P(Class 3 | avg_speed_3): 0.25
P(Class 1 | avg_distance_1): 0.5
P(Class 1 | avg_distance_2): 0
P(Class 1 | avg_distance_3): 0
P(Class 2 | avg_distance_1): 0
P(Class 2 | avg_distance_2): 0.25
P(Class 2 | avg_distance_3): 0
P(Class 3 | avg_distance_1): 0
P(Class 3 | avg_distance_2): 0
P(Class 3 | avg_distance_3): 0.25
P(Class 1 | avg_elev_gain_1): 0.5
P(Class 1 | avg_elev_gain_2): 0
P(Class 1 | avg_elev_gain_3): 0
P(Class 2 | avg_elev_gain_1): 0
P(Class 2 | avg_elev_gain_2): 0
P(Class 2 | avg_elev_gain_3): 0
P(Class 3 | avg_elev_gain_1): 0
P(Class 3 | avg_elev_gain_2): 0
P(Class 3 | avg_elev_gain_3): 0.5
Now, this all still makes sense to me; each class still adds up to 1. However,
when I go to compute the probability for each class, the 0's screw up the calculation.
Take the first class, for example:
P(Class 1 | avg_speed_1) *
P(Class 1 | avg_speed_2) *
P(Class 1 | avg_speed_3) *
P(Class 1 | avg_distance_1) *
P(Class 1 | avg_distance_2) *
P(Class 1 | avg_distance_3) *
P(Class 1 | avg_elev_gain_1) *
P(Class 1 | avg_elev_gain_2) *
P(Class 1 | avg_elev_gain_3) *
P(Class 1) = 0
I've found that this always equals zero because a number of the
input elements are still zero! Where did I go wrong? Does this mean that I have insufficient training data?
That being said, is the Naïve Bayes approach even the right way to approach this classification?
Any thoughts would be greatly appreciated.
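The excerpt here contains no answer to this question, so the following is only a sketch of one common remedy, not the asker's method. In categorical Naïve Bayes, the score for a class is the prior multiplied by one likelihood per input, P(input_i = observed value | class), rather than by every table entry. Laplace (add-alpha) smoothing then keeps a value never seen with a class from forcing the whole product to zero. The sketch assumes each input is a category taken from {1, 2, 3} (the raw training values); `fit`, `posterior` and `alpha` are hypothetical names.

```python
from collections import Counter

# Training rows from the question: (input1, input2, input3, class)
TRAINING = [
    (1, 3, 3, 1),
    (2, 1, 1, 2),
    (1, 1, 1, 3),
    (3, 3, 3, 1),
]
# Assumption: every input is categorical and takes one of these values.
VALUES = (1, 2, 3)


def fit(rows):
    """Count classes and (feature index, value, class) co-occurrences."""
    class_counts = Counter(row[-1] for row in rows)
    feature_counts = Counter()
    for *features, label in rows:
        for i, value in enumerate(features):
            feature_counts[(i, value, label)] += 1
    return class_counts, feature_counts


def posterior(x, rows, alpha=1.0):
    """Return normalised P(class | x) with Laplace (add-alpha) smoothing.

    Only the likelihood of the value actually observed for each feature
    enters the product; alpha > 0 keeps unseen values from zeroing it out.
    """
    class_counts, feature_counts = fit(rows)
    n = len(rows)
    scores = {}
    for label, count in class_counts.items():
        score = count / n  # prior P(c)
        for i, value in enumerate(x):
            seen = feature_counts[(i, value, label)]
            score *= (seen + alpha) / (count + alpha * len(VALUES))
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


probs = posterior((1, 3, 3), TRAINING)
print(max(probs, key=probs.get))  # 1
```

With alpha = 1, every class gets a non-zero score for the input (1, 3, 3), and class 1 comes out most probable (about 0.86).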
For each row in a table, I want to find the minimum value across a couple of numeric columns, then take the name of that column (which holds the desired value) and populate a new column with the name (or custom string).
A few rules first, specific to my scenario: the value to be found across the columns must also be > 0. Also, if no value in the row is > 0, then a custom string should be placed instead (i.e. 'none').
For example, take this table below with columns alpha to delta storing the values:
id | alpha | bravo | charlie | delta
------+--------+--------+---------+--------
1 | 5 | 2.3 | -1 | -5
2 | 9 | 8 | 3 | 1
3 | -1 | -4 | -7 | -9
4 | 6.1 | 4 | 3.9 | 0
for each row, I want to find out which column holds the lowest positive value. My expected output is something like this:
id | alpha | bravo | charlie | delta | lowest_positive
------+--------+--------+---------+--------+---------------
1 | 5 | 2.3 | -1 | -5 | 'col: bravo'
2 | 9 | 8 | 3 | 1 | 'col: delta'
3 | -1 | -4 | -7 | -9 | 'col: none'
4 | 6.1 | 4 | 3.9 | 0 | 'col: charlie'
Should I use a CASE ... WHEN ... THEN ...? Or should I convert the row into an array first, then assign a column name to each position in the array?
You can do:
select *,
       case when mp = alpha   then 'col: alpha'
            when mp = bravo   then 'col: bravo'
            when mp = charlie then 'col: charlie'
            when mp = delta   then 'col: delta'
            else 'col: none'  -- mp is NULL when no column is > 0
       end as lowest_positive
from (
  select *,
         least(  -- in PostgreSQL, LEAST ignores NULL arguments
           case when alpha > 0 then alpha end,
           case when bravo > 0 then bravo end,
           case when charlie > 0 then charlie end,
           case when delta > 0 then delta end
         ) as mp
  from t
) x
However, this solution doesn't account for ties: when several columns share the minimum, the first one (from left to right) wins.
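The same rule can be prototyped outside the database. This Python sketch (not part of the original answer) stores the example table as a list of dicts, ignores values that are not > 0, returns 'col: none' when nothing qualifies and, like the SQL, lets the leftmost column win a tie, because `min()` returns the first minimal item it encounters. `lowest_positive` and `COLUMNS` are illustrative names.

```python
COLUMNS = ("alpha", "bravo", "charlie", "delta")

ROWS = [
    {"id": 1, "alpha": 5, "bravo": 2.3, "charlie": -1, "delta": -5},
    {"id": 2, "alpha": 9, "bravo": 8, "charlie": 3, "delta": 1},
    {"id": 3, "alpha": -1, "bravo": -4, "charlie": -7, "delta": -9},
    {"id": 4, "alpha": 6.1, "bravo": 4, "charlie": 3.9, "delta": 0},
]


def lowest_positive(row, columns=COLUMNS):
    """Name the column holding the smallest value > 0, or 'none'."""
    positives = [(row[c], c) for c in columns if row[c] > 0]
    if not positives:
        return "col: none"
    # Compare on the value only; min() keeps the first (leftmost) of equal values.
    _, name = min(positives, key=lambda pair: pair[0])
    return f"col: {name}"


for row in ROWS:
    print(row["id"], lowest_positive(row))
```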
I would like to know if !(A xor B) is equal to (!A xor !B)?
I am struggling to understand the logic behind this problem.
They are not equal. You could check the following table for further explanation.
+---+---+-------+--------+----+----+-------+
| A | B | (A^B) | !(A^B) | !A | !B | !A^!B |
+---+---+-------+--------+----+----+-------+
| 0 | 0 | 0 | 1 | 1 | 1 | 0 |
| 0 | 1 | 1 | 0 | 1 | 0 | 1 |
| 1 | 0 | 1 | 0 | 0 | 1 | 1 |
| 1 | 1 | 0 | 1 | 0 | 0 | 0 |
+---+---+-------+--------+----+----+-------+
Edit: computing !(A^B) using only AND and OR over the literals A, B, A' and B' (no NOT applied to a compound expression):
XOR(A, B) = OR(AND(A, B'), AND(A', B))
Applying De Morgan's laws to the equation above:
NOT XOR(A,B) = AND(OR(A', B), OR(A, B'))
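A brute-force check over all four inputs is a cheap way to confirm both identities above. In this sketch, `NOT`, `AND` and `OR` are small helper functions on 0/1 integers, defined here for illustration.

```python
from itertools import product


def NOT(x):
    return 1 - x


def AND(x, y):
    return x & y


def OR(x, y):
    return x | y


def xor_sop(a, b):
    # XOR(A, B) = OR(AND(A, B'), AND(A', B))
    return OR(AND(a, NOT(b)), AND(NOT(a), b))


def xnor_demorgan(a, b):
    # NOT XOR(A, B) = AND(OR(A', B), OR(A, B'))
    return AND(OR(NOT(a), b), OR(a, NOT(b)))


for a, b in product((0, 1), repeat=2):
    print(a, b, xor_sop(a, b), xnor_demorgan(a, b))
```

The last two printed columns reproduce the (A^B) and !(A^B) columns of the table above.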
If in doubt, use truth tables. A and B can each be 1 or 0, so (rows: A = 0, 1; columns: B = 0, 1):
A xor B:
0 1
1 0
! (A xor B)
1 0
0 1
! A xor ! B:
0 1
1 0
So the answer is no: !A xor !B gives exactly the same table as the original A xor B.
Going step by step and comparing the final columns, we see that the two expressions do not produce the same output for the same input.
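A | B | A XOR B | not(A XOR B)
===============================
0 | 0 | 0 | 1
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1

A | B | !A | !B | (!A XOR !B)
=============================
0 | 0 | 1 | 1 | 0
0 | 1 | 1 | 0 | 1
1 | 0 | 0 | 1 | 1
1 | 1 | 0 | 0 | 0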
No, they're not.
A xor B is 1 if and only if exactly one of A and B is 1. Therefore !(A xor B) is 1 if and only if A and B are equal.
With (!A xor !B), on the other hand, you first flip both bits and then XOR them. Flipping both inputs does not change whether they differ, so (!A xor !B) = (A xor B).
Here is the truth table for the first one:
A | B | A xor B | !(A xor B)
============================
0 | 0 | 0 | 1
0 | 1 | 1 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1
and for the second one:
A | B | !A | !B | (!A xor !B)
=============================
0 | 0 | 1 | 1 | 0
0 | 1 | 1 | 0 | 1
1 | 0 | 0 | 1 | 1
1 | 1 | 0 | 0 | 0
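The same comparison can be automated. This sketch enumerates every (A, B) pair, prints both expressions, and asserts that !A ^ !B always equals A ^ B and therefore always differs from !(A ^ B). `not_` is a hypothetical helper for NOT on a single 0/1 bit.

```python
from itertools import product


def not_(x):
    """Logical NOT on a 0/1 bit."""
    return 1 - x


print("A B !(A^B) !A^!B")
for a, b in product((0, 1), repeat=2):
    print(a, b, not_(a ^ b), not_(a) ^ not_(b))

pairs = list(product((0, 1), repeat=2))
# !A ^ !B always matches A ^ B ...
assert all(not_(a) ^ not_(b) == a ^ b for a, b in pairs)
# ... and therefore never matches !(A ^ B).
assert all(not_(a ^ b) != not_(a) ^ not_(b) for a, b in pairs)
```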
Closed 5 years ago. This question is not reproducible or was caused by typos. It is not currently accepting answers.
I have this:
char board_game [3][3] = {0}; // The Board Game
And after that I do this:
scanf("%d%d", &row, &col); // Get The Input And Put It In Row And Column
printf("%d",board_game[row][col]);
Some of the output I don't understand. Also, what does board_game[row][col] mean?
row | col | output
====================
0 | 0 | 0
0 | 1 | 0
0 | 2 | 0
0 | 3 | 0
1 | 0 | 0
1 | 1 | 0
1 | 2 | 0
1 | 3 | 0
2 | 0 | 0
2 | 1 | 0
2 | 2 | 0
2 | 3 | 1   <- WHY 1?
3 | 0 | 1   <- WHY 1?
3 | 1 | 0
3 | 2 | 0
3 | 3 | 0
Can you please explain to me what is going on?
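It's out of bounds and therefore undefined behavior: each index can only be between 0 and 2. In practice, with C's row-major layout, board_game[2][3] and board_game[3][0] both work out to element offset 9 (2*3+3 and 3*3+0), the first byte past the end of the 9-byte array, which would explain why they print the same stray value.
And you initialized all the array elements to 0 at the start with = {0}.
board_game[row][col] means the value of the array element at those indexes.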
You are trying to access a memory location that does not belong to your array. When you define an array dimension of size 3, the valid indexes are 0, 1 and 2. The accesses board_game[0][3], board_game[1][3], board_game[2][3], board_game[3][0], board_game[3][1], board_game[3][2] and board_game[3][3] are out of bounds.
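To see why two different out-of-bounds pairs can print the same value, this Python sketch models the row-major address arithmetic C uses for a 3x3 array: board_game[row][col] refers to element row*3 + col counted from the start. It only models the arithmetic; the C behavior itself stays undefined, and whatever sits past the end depends on the memory layout of the actual program. `flat_offset` is an illustrative name.

```python
ROWS, COLS = 3, 3  # char board_game[3][3]
SIZE = ROWS * COLS  # 9 elements: valid flat offsets are 0..8


def flat_offset(row, col):
    """Element offset that board_game[row][col] computes in C's row-major layout."""
    return row * COLS + col


for row in range(4):
    for col in range(4):
        off = flat_offset(row, col)
        status = "valid" if row < ROWS and col < COLS else "out of bounds"
        place = "inside the array" if off < SIZE else "past the end"
        print(f"[{row}][{col}] -> offset {off:2d}: {status}, {place}")
```

In this model [2][3] and [3][0] both land on offset 9, while [0][3] and [1][3] alias [1][0] and [2][0], which hold 0 in the question; that would be consistent with the zeros printed for those inputs.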
I've been trying to learn about minimum spanning trees and the algorithms associated with them, namely Prim's, Kruskal's and Dijkstra's algorithms.
I understand how these algorithms work and have seen them in action, but there is one thing I don't understand about Prim's algorithm: an array whose purpose and workings are unclear to me.
So here is the situation:
I have to do an exercise in which I am given an adjacency table and have to run Prim's algorithm to create a minimum spanning tree.
The table looks like this:
  |  0   1   2   3   4   5
--+------------------------
0 |  0  73   4  64  40  74
1 | 73   0  46  26  30  70
2 |  4  46   0  77  86  14
3 | 64  26  77   0  20  85
4 | 40  30  86  20   0  22
5 | 74  70  14  85  22   0
The row and column labels are the vertices, and the numbers in the table are the edge weights. Simple: I run the algorithm (on this website, for example: http://www.jakebakermaths.org.uk/maths/primsalgorithmsolverv10.html) or just work it out on paper and draw the minimum spanning tree, and I get a tree with the minimal cost of 86, using the edges with weights 4, 26, 20, 22 and 14.
Now here comes the problem: apparently just solving it wasn't enough. I need to find the values of an array called closest[0,...,5]. I know it is used in the algorithm, but I don't know its purpose, what I should do with it, or how to get its values.
I have searched the internet for it and found this link about Prim's algorithm:
http://lcm.csa.iisc.ernet.in/dsa/node183.html
Which defines the array "closest" as "For i in V - U, closest[i] gives the vertex in U that is closest to i".
I still don't understand what it is, what it is used for and what the values inside of them are.
All I know is that the answer to my exercise is:
closest[1] = 3
closest[2] = 0
closest[3] = 4
closest[4] = 5
closest[5] = 2
Thank you in advance.
When doing a MST with Prim's algorithm, it is important to keep track of four things: the vertex, has it been visited, minimal distance to vertex, and what precedes this vertex (this is what you are looking for).
You start at vertex 0 and record its distance to every other vertex. The closest vertex to 0 is 2; the other vertices are also reachable from 0, but at bigger distances. So 2 becomes visited, and its parent is set to vertex 0. All the other vertices are not visited yet, but for now their parent is 0, with their respective distances. At each step you mark the unvisited vertex with the smallest distance as visited and then treat it as the vertex to expand next.
Vertex | Visited | Distance | Parent
0 | T | - | -
1 | F | 73 | 0
2 | T | 4 | 0
3 | F | 64 | 0
4 | F | 40 | 0
5 | F | 74 | 0
We then check the distances from 2 to all the other nodes. For each unvisited node, we compare the new distance from 2 with its previously recorded distance and, if the new one is smaller, update both the distance and the parent. The distance from 2 to 1 (46) is shorter than from 0 to 1 (73), and the distance from 2 to 5 (14) is shorter than from 0 to 5 (74), so both now have vertex 2 as their parent. Vertex 5 is now the closest unvisited vertex, so it becomes visited.
Vertex | Visited | Distance | Parent
0 | T | - | -
1 | F | 46 | 2
2 | T | 4 | 0
3 | F | 64 | 0
4 | F | 40 | 0
5 | T | 14 | 2
Now we visit 5. One thing to note is that if a node is visited, we do not consider it in our distance calculations. I have simulated the rest, and hopefully you can see how you get the answer you're looking for.
Vertex | Visited | Distance | Parent
0 | T | - | -
1 | F | 46 | 2
2 | T | 4 | 0
3 | F | 64 | 0
4 | T | 22 | 5
5 | T | 14 | 2
Now visit 4
Vertex | Visited | Distance | Parent
0 | T | - | -
1 | F | 30 | 4
2 | T | 4 | 0
3 | T | 20 | 4
4 | T | 22 | 5
5 | T | 14 | 2
And now visit 3
Vertex | Visited | Distance | Parent
0 | T | - | -
1 | T | 26 | 3
2 | T | 4 | 0
3 | T | 20 | 4
4 | T | 22 | 5
5 | T | 14 | 2
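Here is a minimal Python sketch of the dense-matrix version of Prim's algorithm. It keeps the `closest` array as defined in the quoted notes, plus a `lowcost` array (the name is an assumption here) holding the weight of the edge from each non-tree vertex to its closest tree vertex. Once a vertex joins the tree, its `closest` entry is no longer updated, so it ends up as the vertex's parent, i.e. the Parent column in the tables above. The function name `prim` is illustrative.

```python
# Adjacency matrix from the question (edge weights, 0 on the diagonal).
W = [
    [0, 73, 4, 64, 40, 74],
    [73, 0, 46, 26, 30, 70],
    [4, 46, 0, 77, 86, 14],
    [64, 26, 77, 0, 20, 85],
    [40, 30, 86, 20, 0, 22],
    [74, 70, 14, 85, 22, 0],
]


def prim(weights, start=0):
    """Dense-matrix Prim's algorithm returning (closest, total MST weight)."""
    n = len(weights)
    in_tree = [False] * n
    in_tree[start] = True
    closest = [start] * n          # nearest tree vertex for each non-tree vertex
    closest[start] = None          # the start vertex has no parent
    lowcost = list(weights[start])  # weight of the edge to that nearest vertex
    total = 0
    for _ in range(n - 1):
        # Pick the non-tree vertex with the cheapest connection to the tree.
        k = min((v for v in range(n) if not in_tree[v]), key=lambda v: lowcost[v])
        in_tree[k] = True
        total += lowcost[k]
        # k joined the tree, so it may now be the closest tree vertex for others.
        for v in range(n):
            if not in_tree[v] and weights[k][v] < lowcost[v]:
                lowcost[v] = weights[k][v]
                closest[v] = k
    return closest, total


closest, total = prim(W)
print(closest[1:], total)  # [3, 0, 4, 5, 2] 86
```

For the matrix in the question this gives closest[1..5] = 3, 0, 4, 5, 2 and a total weight of 86, matching both the expected answer and the hand-computed tree.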
How can I visualize a 2D array as a surface (mesh, surf) when the dataset is incomplete?
'Incomplete' means (v - known values, 0 - unknown):
1 | 2 | 3 | 4 | 5
1 | v | 0 | v | 0 | v
2 | 0 | 0 | 0 | 0 | 0
3 | v | 0 | v | 0 | v
4 | v | 0 | v | 0 | v
5 | 0 | 0 | 0 | 0 | 0
Such data indexing is handy for analyzing non-linear relation between variables.
What I want is something that works the way the plot function does. Say x = [1,2,4,5]: plot will still show a continuous figure.
Is it possible to do the same for 2D arrays without manual interpolation? I don't care about smoothness; a linear connection of the known points is fine.
So you have non-uniform sampling (x = [1 3 5], y = [1 3 4]), and you don't want to interpolate? I don't think surf etc. will handle it. Sounds like a job for plot3.
This is mildly ugly, but I'm presuming you just want to visualise it to get a feel for the data. First make up your x and y with repmat, if you don't already have them, like this:
x =
1 3 5
1 3 5
1 3 5
y =
1 1 1
3 3 3
4 4 4
Then you'll need your values, without all the zeros, in a matrix of the same shape:
z =
6 8 10
6 5 4
4 2 1
This can be plotted with markers (might be the simplest if you have lots of points). Or you can use this trick to make a "mesh" out of two sets of lines:
plot3(x,y,z)
hold on
plot3(x',y',z')
xlabel('x');
ylabel('y');
Exactly as with your plot example, this simply linearly connects between the existing points.
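If the data lives in a full matrix with 0 marking unknown entries, as in the question, the x, y and z matrices above can be built automatically by keeping only the rows and columns that hold known values. This is a plain-Python sketch of that bookkeeping (a MATLAB version would use logical indexing plus repmat or meshgrid). It assumes known values are never exactly 0 and that the known entries form a complete sub-grid; the example matrix is hypothetical, filled with the z values shown above, and `compress_grid` is an illustrative name.

```python
# Hypothetical 5x5 data in the question's layout: 0 = unknown.
FULL = [
    [6, 0, 8, 0, 10],
    [0, 0, 0, 0, 0],
    [6, 0, 5, 0, 4],
    [4, 0, 2, 0, 1],
    [0, 0, 0, 0, 0],
]


def compress_grid(full):
    """Keep only rows/columns containing known values; return x, y, z grids."""
    n_rows, n_cols = len(full), len(full[0])
    keep_rows = [i for i in range(n_rows) if any(v != 0 for v in full[i])]
    keep_cols = [j for j in range(n_cols) if any(full[i][j] != 0 for i in range(n_rows))]
    xs = [j + 1 for j in keep_cols]  # 1-based, matching the question's indices
    ys = [i + 1 for i in keep_rows]
    x = [list(xs) for _ in ys]       # same x row repeated, like repmat
    y = [[yv] * len(xs) for yv in ys]
    z = [[full[i][j] for j in keep_cols] for i in keep_rows]
    return x, y, z


x, y, z = compress_grid(FULL)
print(x)
print(y)
print(z)
```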
You could replace the 0 values with NaN values.
Both surf and mesh accept NaN: they leave gaps wherever a value is NaN.