Clustering: elbow method vs silhouette score

I tried to find the optimal number of clusters using the elbow method and the silhouette score, but the results I got contradict each other. The elbow plot suggests around 7-8 clusters, while the silhouette score is maximized at 3. I don't know the actual number of clusters and I am using k-means. I wanted to know whether this is normal, and how to proceed further?
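One practical way to proceed is to compute both diagnostics over the same range of k and look at them side by side before deciding between the candidate solutions (3 vs. 7-8), for example by comparing the resulting clusterings on a downstream task. A minimal scikit-learn sketch, assuming the (scaled) data is in a NumPy array X:

# Sketch: compare inertia (elbow) and silhouette score over a range of k.
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

def compare_k(X, k_range=range(2, 11), random_state=0):
    results = []
    for k in k_range:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X)
        results.append((k, km.inertia_, silhouette_score(X, labels)))
    return results

# for k, inertia, sil in compare_k(X):
#     print("k=%d inertia=%.1f silhouette=%.3f" % (k, inertia, sil))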

Related

Solr Distance Filter from a Radius Field

Hi, I am very new to Solr queries (only a few hours in), so please excuse me if this is a naive question, but is there a way to set the radius of the geo filter from a field?
{!geofilt pt=35.3459327,-97.4705935 sfield=locs_field_location$latlon d=fs_radius}
Or could I do a subquery to return the value of that field, fs_field_job_search_radius, and place it in there? I can return the value in the field list, so I was hoping it could go in there in some way.
This is similar to Filtering by distance vs. field value in Solr, but I do not know if he got it working or where I would need to start to write a function as was suggested. Also, this is on a Solr server I do not control; it is managed by my hosting company, so I do not know if I can even create functions. Thanks.
I took a workaround, but I believe I got what I was trying to accomplish.
fq={!frange l=0 h=12742}sub(radius_field,geodist(field,point))
The 12742 is the diameter of the earth in km, since I still needed a hard upper bound, but I doubt most users are searching in space. So basically we subtract the distance from radius_field to find out whether the point is in range:
radius_field - distance
If the result is a positive number then it is within range; if it is negative then it is not. Please let me know if I screwed up my logic. Thanks.
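For anyone who wants to sanity-check that logic outside Solr, here is a small Python sketch mirroring what the frange filter computes: a great-circle distance (what geodist() returns, in km) subtracted from the stored radius, with non-negative results counted as in range. The function and field names here are illustrative only.

# Sketch of the same check the frange filter performs:
# keep a document when radius_field - distance >= 0.
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    # great-circle distance in km between two lat/lon points
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def in_range(radius_km, doc_lat, doc_lon, query_lat, query_lon):
    return radius_km - haversine_km(doc_lat, doc_lon, query_lat, query_lon) >= 0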

How can I tell if a particular heuristic is admissible, and why mine is not?

The definition of an admissible heuristic is one that "does not overestimate the cost of reaching the goal".
I am attempting to write a Pac-Man heuristic for finding the fastest way to eat the dots, some of which are randomly scattered across the grid. However, it is failing my admissibility test.
Here are the steps of my algorithm:
sum = 0, list = grid.getListofDots()
1. Find the nearest dot from the starting position (or from the dot removed in the previous step) using Manhattan distance
2. Add that distance to sum
3. Remove that dot from the list of remaining dots
4. Repeat steps 1-3 until the list is empty
5. Return the sum
Since I'm using Manhattan distance, shouldn't this be admissible? If not, are there any suggestions or other approaches to make this heuristic admissible?
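In code, the procedure above looks roughly like this (a sketch; the (x, y) position tuples and the dot list returned by grid.getListofDots() are assumptions about the surrounding framework):

# Sketch of the greedy nearest-dot heuristic described in the steps above.
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def greedy_nearest_dot_heuristic(position, dots):
    total = 0
    remaining = list(dots)
    while remaining:
        nearest = min(remaining, key=lambda d: manhattan(position, d))
        total += manhattan(position, nearest)   # step 2: add to sum
        remaining.remove(nearest)               # step 3: remove the dot
        position = nearest                      # continue from that dot
    return total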
As said, your heuristic isn't admissible. Another counterexample: your heuristic returns a cost of 9, but the best path has cost 6.
A very, very simple admissible heuristic is:
number_of_remaining_dots
but it isn't very tight. A small improvement is:
manhattan_distance_to_nearest_dot + dots_left_out
Other possibilities are:
distance_to_nearest_dot // Found via Breadth-first search
or
manhattan_distance_to_farthest_dot
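As a sketch, the manhattan_distance_to_nearest_dot + dots_left_out variant could look like this in Python, interpreting dots_left_out as the number of remaining dots other than the nearest one (each of those costs at least one extra move, so the estimate cannot overestimate):

# Sketch of an admissible heuristic: Manhattan distance to the nearest
# remaining dot, plus one step for every other dot still left.
def manhattan(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1])

def admissible_dots_heuristic(position, dots):
    if not dots:
        return 0
    nearest = min(manhattan(position, d) for d in dots)
    return nearest + (len(dots) - 1)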

How to determine the parameters of DBSCAN?

I tried the k-dist plot as introduced in a paper I read and used the knee distance to set epsilon. However, the results were not satisfying.
I use WEKA to run DBSCAN, but it always returns only one cluster.
Can anyone please give me some advice?
This can happen if the k-dist plot has more than one knee, which occurs when the dataset contains clusters of different densities; the outcome you obtained arises when the high-density clusters are nested inside the low-density ones.
The solution is to search for the next knee and re-apply the algorithm on the core points you have already found.
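A sketch of that workflow with scikit-learn (assuming the data is in a NumPy array X; the epsilon and minPts values you settle on can then be plugged into WEKA's DBSCAN just as well):

# Sketch: k-dist values to pick epsilon, then DBSCAN; if dense clusters stay
# merged, look for the next knee and re-run on the core points only.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN

def k_distances(X, k=4):
    # sorted distance of every point to its k-th nearest neighbour
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dists, _ = nn.kneighbors(X)
    return np.sort(dists[:, k])

def run_dbscan(X, eps, min_samples=4):
    model = DBSCAN(eps=eps, min_samples=min_samples).fit(X)
    core_points = X[model.core_sample_indices_]
    return model.labels_, core_points  # re-apply with a smaller eps on core_points if needed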

Monte Carlo Tree Search for card games like Belot and Bridge, and so on

I've been trying to apply MCTS to card games. Basically, I need a formula, or a modification of the UCB formula, that works best for selecting which node to proceed with.
The problem is that these card games are not win/loss games; each node carries a score distribution, like 158:102 for example. We have two teams, so basically it is a 2-player game. The games I'm testing are constant-sum games (number of tricks, or some score derived from the taken tricks, and so on).
Let's say the maximum sum of team A's and team B's scores at each leaf is 260. I search for the best move from the root, and the first one I try gives an average of 250 after 10 tries. There are 3 more possible moves that have never been tested. Because 250 is so close to the maximum score, the regret of testing another move is very high. So what formula can be shown mathematically to be the optimal way to choose a move when you have:
Xm - average score for move m
Nm - number of tries for move m
MAX - maximum score that can be made
MIN - minimum score that can be made
Obviously, the more you try the same move, the more you want to try the other moves; but the closer you are to the maximum score, the less you want to try the others. What is the best mathematical way to choose a move based on these factors Xm, Nm, MAX, MIN?
Yours is clearly an exploration problem, and the issue is that with the plain Upper Confidence Bound (UCB) the amount of exploration cannot be tuned directly. This can be solved by adding an exploration constant.
The Upper Confidence Bound (UCB) is calculated as follows:
UCB(s, a) = V(s, a) + sqrt( (2 * ln n(s)) / n(s, a) )
with V being the value function (the expected score) which you are trying to optimize, s the state you are in (the cards in the hands), and a the action (playing a card, for example). n(s) is the number of times state s has been visited in the Monte Carlo simulations, and n(s, a) the same for the combination of s and action a.
The left part, V(s, a), exploits the knowledge of previously obtained scores, while the right part adds a bonus that encourages exploration. However, there is no way to increase or decrease this exploration term directly; that is what the Upper Confidence Bounds for Trees (UCT) formula adds:
UCT(s, a) = V(s, a) + 2 * Cp * sqrt( (2 * ln n(s)) / n(s, a) )
Here Cp > 0 is the exploration constant, which can be used to tune the exploration. It was shown that Cp = 1/sqrt(2) satisfies Hoeffding's inequality when the rewards (scores) lie between 0 and 1 (in [0, 1]).
Silver & Veness propose: Cp = Rhi - Rlo, with Rhi being the highest value returned using Cp=0, and Rlo the lowest value during the roll outs (i.e. when you randomly choose actions when no value function is calculated yet).
References:
Cameron Browne, Edward J. Powley, Daniel Whitehouse, Simon M. Lucas, Peter I. Cowling, Philipp Rohlfshagen, Stephen Tavener, Diego Perez, Spyridon Samothrakis and Simon Colton. A Survey of Monte Carlo Tree Search Methods. IEEE Transactions on Computational Intelligence and AI in Games, 4(1):1-43, 2012.
Silver, D., & Veness, J. (2010). Monte-Carlo Planning in Large POMDPs. Advances in Neural Information Processing Systems, 1-9.

Google App Engine: query geohash

I have a db.StringProperty() holding a geohash. Given a hash code, how do I find the 10 closest results?
I tried the query below, but it doesn't seem to be right:
pois = POI.all().filter('geohash <', h_latlng).order('-geohash').fetch(10)
A geohash alone cannot accomplish the task of finding the n nearest results. You can find the contents of any square region by prefix, but to get a reliable result containing the n nearest points you need to fetch at least 9 prefixes (the cell containing the point plus its 8 neighbours), which makes it quite an expensive query. Complicating the matter, the prefixes of those 9 squares have to be calculated first.
IMO this is currently a hard problem to solve efficiently on App Engine. I have been working on it for a year and have not found a sophisticated, fast solution. A relational DB with a geo index, or a datastore that allows inequality filters on two different properties, will perform such tasks better and faster. But I am interested in good solutions, too. :-)
Quoting David Troy:
Geohash also has the property that as the number of digits decreases (from the right), accuracy degrades. This property can be used to do bounding box searches, as points near to one another will share similar Geohash prefixes.
However, because a given point may appear at the edge of a given Geohash bounding box, it is necessary to generate a list of Geohash values in order to perform a true proximity search around a point. Because the Geohash algorithm uses a base-32 numbering system, it is possible to derive the Geohash values surrounding any other given Geohash value using a simple lookup table.
See: https://github.com/davetroy/geohash-js
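A rough sketch of that approach on App Engine: fetch candidates from the cell containing the query point plus its 8 neighbours by prefix, then re-rank them by real distance in Python. The neighbors() helper is assumed to come from a geohash library (for example python-geohash), and the final distance re-ranking is only indicated by a comment:

# Sketch: candidate lookup over 9 geohash prefixes using the usual
# two-inequality prefix-range trick on the datastore.
def prefix_query(prefix, limit=50):
    return (POI.all()
               .filter('geohash >=', prefix)
               .filter('geohash <', prefix + u'\ufffd')
               .fetch(limit))

def candidate_pois(h_latlng, precision=6, n=10):
    prefix = h_latlng[:precision]
    candidates = []
    for p in [prefix] + neighbors(prefix):   # neighbors() from a geohash library
        candidates.extend(prefix_query(p))
    # decode each candidate's geohash and sort by true distance to the
    # query point before taking the first n results
    return candidates[:n]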

Resources