A classifier trained using the technique of word embedding(doc2vec) and logisitic regression misclassify the data - logistic-regression

I have a text classification problem in which the data set consist of 16 million records. The data is highly imbalanced and consist of almost 500 classes. I took the approach for word embedding and then used Logistic Regression to build a model in which the input is doc2vec matrix, the accuracy that i achieved was 88% but and recall score and F1-score is closet to 84% but when working on testing data the classifier doesn't perform well for example,
if a text details are like i love traveling it is classified under Travel Tag, but if it encounters the text again the model classify it into some different category like unknown. This an unusual behavior that i have encountered in the model.
Code:
Y = " Love travelling"
tokens = cleanText(Y)
vector = model.infer_vector(tokens.split())
predict = logreg.predict(vector.reshape(1,-1))
print(predict)
Output Expected:
Travel
Actual Output:
Unmapped
Regressor code:
class_weight = class_weight.compute_class_weight('balanced', np.unique(y_train), y_train)
logreg = LogisticRegression(multi_class='multinomial', solver='lbfgs', class_weight="balanced")
logreg.fit(train_vectors_dbow, y_train)
Expected result:
Text Label
1. I like to travel Travel
2. I love to travel Travel
Actual
Text Label
1. I like to travel Travel
2. I love to travel Unmapped

Related

How can I predict a user generated distribution by learning from previous distributions

I am trying to program a prediction algorithmus which predicts the distribution of marbles among 4 cups based on the prevision user input. But I have no idea where to start or which techniques can be used to solve this.
Example:
There are 4 cups numbered from 0 to 3 and the user will receive x marbles which he distributes among those cups. Each round the user receives another amount of marbles (or the same amount) and before
the user distributes them, the algorithm tries to predict the distribution based on the users previous inputs. After that the user “corrects” it. The goal is that the user does not have to correct anything hence the
algorithm predicts the correct distribution. However the pattern which the user distributes the marbles can change, and the algorithm has to adapt.
This is the simplest design of the problem which already is not trivial to solve. However it get exponential more complex when the marbles have additional properties which can be used for distribution.
For example, they could have a color and a weight.
So, for example, how does the algorithm learn that the user (most of the time) put the marbles with the same color in one cup, but cup 2 is (most of the time empty) and the rest is equally distributed?
So in my head the algorithm has to do something like this:
search for a pattern after the users distribution is done. Those patterns can be the amount of marbles per cup / weight per cup or anything else.
if a pattern is found a predefined value (weight) is added to the pattern
if a previous pattern was not found a predefined value has to be subtracted from the pattern
when the algorithm has to predict, all patterns with a predefined weight have to be applied.
I am not sure if I am missing something and how I would implement something like this or in which area I have to look for answers.
First of all, bear in mind that human behavior does not always follow a pattern. If the user distributes these marbles randomly, it will be hard to predict the next move!
But if there IS a pattern in the distributions, you might create a prediction using an algorithm such as a neural network or a decision tree.
For example:
// dataset1
// the weights and colors of 10 marbles
let dataset1 = [4,3,1,1,2,3,4,5,3,4,7,6,4,4,2,4,1,6,1,2]
// cups1
// the cups distribution of the above marbles
let labels1 = [2,1,0,2,1,1,3,1,0,1]
Now you can train an algorithm, for example a neural network or a decision tree.
This isn't real code, just an example of how it could work.
let net = new NeuralNet()
net.train(dataset1, labels1)
After training with lots of data (at least hundreds of these datasets), you can give the network a new dataset and it will give you a prediction of the cups distribution
let newMarbleSet = [...]
let prediction = net.predict(newMarbleSet)
It's up to you what you want to do with this prediction.

Natural language classifier returns classifications for untrained items

I am confused as to how NLC works. My expectation is that when it is asked to classify text that it should have no relation or training data to learn from it should return no results or results with very low confidence scores.
I have trained a model with a set of training data and when I attempt to classify text that is outside of the training data I am getting results with high confidence values (~60%).
Here's an example of my training data:
foo,1,2,3,4
bar,1,2,3,4
baz,1,2,3,4
When I try to classify the text "This should not exist" I receive a high confidence that this text is "1".
Is my assumption correct in that I should be returned values in this case? Am I training the data to classify foo, bar, and baz incorrectly? If not what should I expect from the NLC service?
Imagine that you have 3 buckets and you have to throw a coin in one of them. Each bucket has 33.3 % changes to get the coin. The same happens with Natural Language Classifier service. It is trained to classify input text into predefined classes.
If you create a classifier with 3 classes and you try to classify text that wasn't in the training data, NLC will still classify your sentence to one of the three classes you defined. If your output is 60% then the other two buckets will get the remaining 40%.
Sometimes you could get a high score and that's normal when you have classes that are very different.

House pricing using neural network

I wrote multilayer perceptron implementation (on Python) which is able to classify Iris dataset. It was trained by backpropagation algorithm and uses sigmoid actiovation functions on a hidden and output layers.
But now I want to change it to be able to approximate house price.
(I have dataset of ~300 estates with prices and input parameters like rooms, location etc.)
Now output of my perceptron is in range [0;1]. But as far as I understand if I want to get resulting house price on the output neuron I need to change that activation function somehow right?
Can somebody help me?
I'm new to neural networks
Thanks in advance.
Assuming, for instance, that houses price between $1 and $1,000,000, then you can just map the 0...1 range to the final price range both for the training and for the testing. Just note that 300 estates is a fairly small data set.
To be precise, if a house is $500k, then the target training output becomes 0.5. You can basically divide by your maximum possible home value to get the target training amount. When you get the output value you multiple by the maximum home value to get the predicted price.
So, view the output of the neural network as the percentage of the total cost.

Backpropagation overall error chart with very small slope... Is this normal?

I'm training a neural network with backpropagation algorithm and this is the chart of Overall Errors:
( I'm calculating Overall error by this formula : http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html Part 6.3 : Overall training error)
I have used Power Trendline and after calculations, I saw that if epoches = 13000 => overall error = 0.2
Isn't this too high?
Is this chart normal? Seems that the training process will take too long... Right? What should I do? Isn't there any faster way?
EDIT : My neural network has a hidden layer with 200 neurons. and my input and output layers have 10-12 neurons. My problem is clustering characters. (it clusters Persian characters into some clusters with supervised training)
So you are using a ANN with 200 input nodes with 10-12 hidden nodes in the hidden layer, what activation function are you using if any for your hidden layer and output layer?
Is this a standard back propagation training algorithm and what training function are you using?
Each type of training function will affect the speed of training and in some cases its ability to generalise, you don't want to train against your data such that your neural network is only good for your training data.
So ideally you want decent training data that could be a sub sample of your real data say 15%.
You could training your data using conjugate gradient based algorithm:
http://www.mathworks.co.uk/help/toolbox/nnet/ug/bss331l-1.html#bss331l-2
this will train your network quickly.
10-12 nodes may not be ideal for your data, you can try changing the number in blocks of 5 or add another layer, in general more layers will improve the ability of your network to classify your problem but will increase the computational complexity and hence slow down the training.
Presumably these 10-12 nodes are 'features' you are trying to classify?
If so, you may wish to normalise them, so rescale each to between 0 and 1 or -1 to 1 depending on your activation function (e.g. tan sigmoidal will produce values in range -1 to +1):
http://www.heatonresearch.com/node/706
You may also train a neural network to identify the ideal number of nodes you should have in your hidden layer.

Neural network weights adjustment by the user ratings

im really new to NN, and im trying to implement it in my recommendation system that gives users recommendations on user similarities.
The thing is that im having 4 different similarities of users by different parameters, and im using weights to make the importance of each similarity in total similarity.
region similarity = 0.5, weightRegion=0.6
interests similarity = 0.3, weightInterest=0.8
education similarity = 0.75, weightEducation=1.1
positions similarity = 0.6, weightPositions=1.5
so calculating total similarity will be multiplied sum divided by sum of the weights: (0.5*0.6+0.3*0.8+0.75*1.1+0.6*1.5)/4
//im dividing by sum of weights to put parameter in {0..1}
So the thing is i need to control those weights by the user rating (user clicks rating from 1 to 10 and weights r corrected)
I've built such NN:
So what im doing is:
n=0.25 (learning k);
rating=0.7 (that is my 7 rating);
net5=x1*w15+x2*w25+x3*w35+x4*w45;
out5=1/(1-pow(e,-net5));
real=out5*(1+1-rating);
err=out5*(1-out5)*(real-out5);
w15n=w15+errnx1;
w25n=w25+errnx2;
w35n=w35+errnx3;
w45n=w45+errnx4;
(im sry for code formatting, it kept saying its not properly formatted)
What am I doing wrong? cause results of such correcting arent good at all.
Thanks
I think you are going the wrong way. Backpropagation isn't a good choice for this type of learning (somehow incremental learning).
To use backpropagation you need some data, say 1000 data where different types of similaritiy (input) and the True Similarity (output) are given. Then weights will update and update until error rate comes down. And besides you need test data set too that will make you sure the result network will do good even for similarity values it didn't see during training.

Resources