how to solve this with kmeans clustering and use cosine similiraty - database

can anyone tell to me how k-means clustering working on textmining..
and i using cosine similarity as the distance metric.
nim 310910022 320910044 310910043 310910021
access 0 2 3 5
abdi 1 0 0 0
actual 5 0 0 1
arrow 0 1 1 2
this data is on listview
How can I do that in VB.net? to get any cluster and trending topic of the term ?
Thank in advance

First I would separate the problem into two parts:
Computing the k-means clustering assignments
Getting the data from the GUI (you mention the data is on a listview)
I assume 2 is straight forward and you just need help on 1.
I would start by re-writing the code to just read a TSV text file of the data as you specified. This will make things a lot easier to debug.
Then ask if you want to implement the kmeans algorithm yourself or use a library.
If you want to implement it, here is a link to the pseudo code
http://www.scribd.com/doc/89373376/K-Means-Pseudocode
You can also search up other kmeans pseudocode.
If you want to use a library to just "run" your data against a kmeans algorithm, here is one example in python/scipy.
http://glowingpython.blogspot.com/2012/04/k-means-clustering-with-scipy.html
Regardless of which approach you use, realize that kmeans is non-deterministic and you may get different answers everytime you run it. I would recommend computing against a known validation set to see if the data is roughly what you think it should be.

Related

Predict probabilities of a recommender system?

I have a dataset where I have a sparse utility matrix (user-product) with binary input: 1 if the user 𝑖 bought the product 𝑗, and 0 if it hasn't.
However it has a different meaning on the test set, 0 means that we don't know if the user bought this product, and 1 means that we're sure that the user bought that given product.
I need to get for each user in each product the probability that a user 𝑖 bought a product 𝑗 in the test set. For this I wish to use different matrix factorization techniques like FunkSVD, NMF or SVD++ , but I'm quite confused:
These techniques would only allow me to get the label (1 or 0) on test set, but I need to compute a probability of getting 1 and not the label in it self.
How can I approach this problem ? Or do I treat it as a classification problem and then use all common classification techniques ?
Okay, One approach that might help based on Recurrent-Knowledge-Graph-Embedding.
Try converting user-item interactions into a knowledge graph and mine the top-n paths in the graph between user_i and item_j. Build an RNN model as described in the paper and its code, to get probabilities.

How is the range of the last layer of a Neural Network determined when using ReLU

I'm relatively new to Neural Networks.
Atm I am trying to program a Neural Network for simple image recognition of numbers between 0 and 10.
The activation function I'm aiming for is ReLU (rectified linear unit).
With the sigmoid-function it is pretty clear how you can determine a probability for a certain case in the end (because its between 0 and 1).
But as far as I understand it, with the ReLU we don't have these limitations, but can get any value as a sum of previous "neurons" in the end.
So how is this commonly solved?
Do I just take the biggest of all values and say thats probability 100%?
Do I sum up all values and say thats the 100%?
Or is there another aproach I can't see atm?
I hope my question is understandable.
Thanks in advance for taking the time, looking at my question.
You can't use ReLU function as the output function for classification tasks because, as you mentioned, its range can't represent probability 0 to 1. That's why it is used only for regression tasks and hidden layers.
For binary classification, you have to use output function with range between 0 to 1 such as sigmoid. In your case, you would need a multidimensional extension such as softmax function.

Viola - jones adaboost method

I have read the paper about Viola-Jones method for object detection and am confused by a few things.
1 - For Adaboost does each round mean that we calculate all the 160k features across all images then find the one with the least error (which as i understand is a 'weak classifier? please correct me if I am wrong). If yes then won't this take extremely long to train on a large set of images which can take months ? And also how many rounds will you run it for if this is correct.
2- If the above point is wrong then does it mean that for each feature, we evaluate all the non-face and face images with one feature and compare it to a certain acceptable threshold of error, and if it lies below this acceptable threshold then we take this feature as a weak classifier and then update the weights before using the next feature from the 160k features.
I have tried understanding the MATLAB code in this link http://www.ece301.com/ml-doc/54-face-detect-matlab-1.html but not sure if the way he implemented adaboost was correct.
It would also be a great help if there was a link that would explain adaboost used by Viola-Jones in a simple and clear way.
AdaBoost tries out multiple weak classifier over several rounds , Selecting the best weak classifier in each round and combining the best classifier to create a strong classifier.
Example for adaboost :
Data point Classifier 1 Classifier 2 Classifier 3
x1 Fail pass fail
x2 Pass fail pass
x3 fail pass pass
x4 pass fail pass
AdaBoost can use classifier that are consistently wrong by reversing their decision .
1) Yes. If your parameters are, say, 5000 positive windows and 5000 negative ones (extracted from the set of negative background images you provided initially), then at each stage every feature is evaluated in turn for all 10000 windows and the best one is added to the feature set.
And yes in the original paper of Viola-Jones each feature is a weak classifier.
The process of checking all features will take a long time, but not weeks. As a matter of fact the bottle neck is the gathering of negative widows over the last stages than could require days or also weeks The number of rounds depends on the reaching of the given conditions. Each stage requires a number of round (usually more in final stages). So for example: stage 1 requires 3 features (3 rounds), stage 2 requires 5 features (5 rounds), stage 3 requires 6 features (6 rounds), etc.
2) The above point is true so it doesn’t apply.

Taguchi Method Programming Example

I've been asked to research some programming related to the "Taguchi Method", especially as it relates to Multi-variant testing. This is one of the first subjects I've tried to research that I've found zero, nada, zilch, code examples for, especially considering its mathematical basis.
I've found some books describing the math involved but it looks like I'm going to be doing some math brush up unless I can find some code examples I can relate to.
Is this one of those rare things that once you work out the programming, it's so valuable that no one shares? Or do I just fail at Taguchi + google?
Taguchi designs are the same thing as covering arrays. The basic idea is that if you have F data "fields" and every one can have N different values, it is possible to construct NF different test cases. A covering array is basically a set of test cases that together cover all possible pairwise combinations of two field values, and the idea is to generate as small one as possible. E.g. if F=3 and N=3, you have 27 possible test cases, but it is enough to have nine test cases if you aim for pairwise coverage:
Field A | Field B | Field C
---------------------------
1 1 1
1 2 2
1 3 3
2 1 2
2 2 3
2 3 1
3 1 3
3 2 1
3 3 2
In this table, you can choose any two fields and any two values and you can always find a row that contains the chosen values for the chosen fields.
Generating Taguchi designs in general is a difficult combinatorial problem.
You can generate Taguchi designs by various methods:
Branch and bound
Stochastic search (e.g. tabu search or simulated annealing)
Greedy search
Specific mathematical constructions for some specific structures

Determining which inputs to weigh in an evolutionary algorithm

I once wrote a Tetris AI that played Tetris quite well. The algorithm I used (described in this paper) is a two-step process.
In the first step, the programmer decides to track inputs that are "interesting" to the problem. In Tetris we might be interested in tracking how many gaps there are in a row because minimizing gaps could help place future pieces more easily. Another might be the average column height because it may be a bad idea to take risks if you're about to lose.
The second step is determining weights associated with each input. This is the part where I used a genetic algorithm. Any learning algorithm will do here, as long as the weights are adjusted over time based on the results. The idea is to let the computer decide how the input relates to the solution.
Using these inputs and their weights we can determine the value of taking any action. For example, if putting the straight line shape all the way in the right column will eliminate the gaps of 4 different rows, then this action could get a very high score if its weight is high. Likewise, laying it flat on top might actually cause gaps and so that action gets a low score.
I've always wondered if there's a way to apply a learning algorithm to the first step, where we find "interesting" potential inputs. It seems possible to write an algorithm where the computer first learns what inputs might be useful, then applies learning to weigh those inputs. Has anything been done like this before? Is it already being used in any AI applications?
In neural networks, you can select 'interesting' potential inputs by finding the ones that have the strongest correlation, positive or negative, with the classifications you're training for. I imagine you can do similarly in other contexts.
I think I might approach the problem you're describing by feeding more primitive data to a learning algorithm. For instance, a tetris game state may be described by the list of occupied cells. A string of bits describing this information would be a suitable input to that stage of the learning algorithm. actually training on that is still challenging; how do you know whether those are useful results. I suppose you could roll the whole algorithm into a single blob, where the algorithm is fed with the successive states of play and the output would just be the block placements, with higher scoring algorithms selected for future generations.
Another choice might be to use a large corpus of plays from other sources; such as recorded plays from human players or a hand-crafted ai, and select the algorithms who's outputs bear a strong correlation to some interesting fact or another from the future play, such as the score earned over the next 10 moves.
Yes, there is a way.
If you choose M selected features there are 2^M subsets, so there is a lot to look at.
I would to the following:
For each subset S
run your code to optimize the weights W
save S and the corresponding W
Then for each pair S-W, you can run G games for each pair and save the score L for each one. Now you have a table like this:
feature1 feature2 feature3 featureM subset_code game_number scoreL
1 0 1 1 S1 1 10500
1 0 1 1 S1 2 6230
...
0 1 1 0 S2 G + 1 30120
0 1 1 0 S2 G + 2 25900
Now you can run some component selection algorithm (PCA for example) and decide which features are worth to explain scoreL.
A tip: When running the code to optimize W, seed the random number generator, so that each different 'evolving brain' is tested against the same piece sequence.
I hope it helps in something!

Resources