Batch size for Feed-Forward Neural Network - artificial-intelligence

I have a 100k-sample dataset for ML. How large should I set the batch size for training?
FYI, I am using the train_test_split function (from scikit-learn) to split the data into train and test sets.
Thank you!
## Fit network
history = model.fit(X_train, Y_train, epochs=1500, validation_split=0.2, verbose=1, shuffle=False, batch_size=60)

Wannees, you need to give more details so that others can help you, such as the type of data (numeric or image), the application, the number of features/targets, the network architecture, etc.
You may learn something from this existing question on how to choose a batch size: https://stats.stackexchange.com/questions/153531/what-is-batch-size-in-neural-network
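There is no single right answer for a 100k-row dataset; a common approach is to treat the batch size as a hyperparameter and compare a few values empirically. Below is a minimal sketch of that idea in Keras, using a placeholder model and random stand-in data (none of it comes from the original question):

import numpy as np
from tensorflow import keras

# random stand-ins for the 100k-row dataset: 20 features, binary target
X = np.random.rand(100_000, 20).astype('float32')
Y = np.random.randint(0, 2, size=(100_000, 1))

for batch_size in (32, 64, 128, 256):
    model = keras.Sequential([
        keras.Input(shape=(20,)),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(1, activation='sigmoid'),
    ])
    model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
    hist = model.fit(X, Y, epochs=5, validation_split=0.2,
                     batch_size=batch_size, verbose=0)
    print(batch_size, hist.history['val_accuracy'][-1])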

Related

Inconsistent # of Samples NLP IMDB Sentiment Classification

I'm trying to build a sentiment classifier on the IMDB dataset. I am fairly new to NLP and data science, and I keep getting this error while trying to fit my model.
ValueError: Found input variables with inconsistent numbers of samples: [24745, 40000]
I've looked at many other threads, and they all say to reshape your data, which I did; the size of my X variable is (24745, 100) and my y variable is (40000, 1).
I am currently trying to use any of these models:
MultinomialNB
BernoulliNB
GaussianNB
I've also tried to create a TensorFlow Sequential model with a bidirectional LSTM, but that produced terrible accuracy; it sat at 50% for a long time, essentially random guessing.
Any help is appreciated.
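For what it's worth, that ValueError means X and y have different numbers of rows (24745 vs 40000), so they no longer line up one-to-one; both must be the same length before fitting or splitting. Here is a minimal sketch of keeping them aligned, assuming the mismatch came from vectorizing or filtering only part of the reviews (the data below is made up):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB

reviews = ['great movie', 'terrible film', 'loved it', 'awful']  # placeholder data
labels = [1, 0, 1, 0]

X = CountVectorizer().fit_transform(reviews)  # shape: (n_reviews, n_features)
assert X.shape[0] == len(labels)              # rows must match before any split

X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.25)
MultinomialNB().fit(X_train, y_train)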

Import a CSV file into Python, turn it into a NumPy array, then feed it to an sklearn algorithm

Sklearn algorithms require features and a label in order to learn.
I have a CSV file containing data from a challenge on the HackerEarth website. Participants need to create a learning algorithm that learns from data on a massive number of individuals in an affiliate network and their ad-click performance, and then predicts the future performance of other individuals in the network, allowing the company to optimize its ad performance.
The features in the data include id, date, siteid, offerid, category, merchant, countrycode, type of browser, type of device, and the number of clicks their ads have gotten.
https://www.hackerearth.com/practice/algorithms/string-algorithm/string-searching/practice-problems/machine-learning/predict-ad-clicks/
So my plan is to use the first 7 fields as my features and ad clicks as the label. Unfortunately, the countrycode, browser, and device information is text (e.g. Google Chrome, Desktop), not integers that can be turned into an array.
Q1: Is there a way for sklearn to accept not just numpy arrays but also words as features? Am I supposed to use a vectorizer for this? If so, how would I do it? If not, can I just replace the text data with numbers (Google Chrome replaced by 1, Firefox replaced by 2) and still have it work? (I am using a Naive Bayes algorithm.)
Q2: Would a Naive Bayes algorithm be suitable for this task? Since this competition requires participants to create a program that predicts the probability of individuals in the affiliate network having their ads clicked, I assume Naive Bayes would be best suited.
Training data : https://drive.google.com/open?id=1vWdzm0uadoro3WcpWmJ0SVEebeaSsHvr
Testing data : https://drive.google.com/open?id=1M8gR1ZSpNEyVi5W19y0d_qR6EGUeGBQl
My messy code and horrible attempt at this challenge, which I don't think will be much help:
from sklearn.naive_bayes import GaussianNB
from numpy import genfromtxt
import pandas as pd

# genfromtxt turns the text columns into NaN; pandas keeps them as strings
data = genfromtxt('smaller.csv', delimiter=',')
dat = pd.read_csv('smaller.csv', delimiter=',')
print(dat['siteid'])

feature = []
label = []
# rows 1..16: skip the header row; columns 2..7 are features, column 9 is the label
i = 1
while i < 17:
    feature.append(data[i][2:8])
    label.append(data[i][9])  # was indexed with the wrong counter in the original
    i += 1

clf = GaussianNB()
clf.fit(feature, label)
print(clf.predict([data[18][2:8]]))
print(data[18])
Answer to Question 1: No. Sklearn only works with numerical data, so you need to convert your text to numbers.
There are multiple approaches for converting text to numbers. The first is, as you said, to simply assign a number to each category. But you need to take into account whether the text data actually has an order corresponding to the numbers you assign. When it does not, one-hot encoding is most often used instead. Please see the scikit-learn documentation below:
- http://scikit-learn.org/stable/modules/preprocessing.html#encoding-categorical-features
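As a minimal sketch of one-hot encoding with pandas (the column names below match the challenge's description of the data, but are assumptions about the actual CSV layout):

import pandas as pd

# load the challenge data and one-hot encode the text columns
dat = pd.read_csv('smaller.csv')
encoded = pd.get_dummies(dat, columns=['countrycode', 'browserid', 'devid'])
print(encoded.head())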
Answer to Question 2: It depends on the data and task at hand.
No single algorithm is capable of handling every type of data optimally.
Most of the time we need to compare multiple algorithms and see which gives the best result for our data. See this example:
http://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html#sphx-glr-auto-examples-classification-plot-classifier-comparison-py
Even within a single algorithm, we need to check various parameter values and tune them for the maximum score. This is called grid search. See this example:
http://scikit-learn.org/stable/auto_examples/model_selection/plot_grid_search_digits.html#sphx-glr-auto-examples-model-selection-plot-grid-search-digits-py
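Here is a minimal grid-search sketch with scikit-learn's GridSearchCV (the estimator and parameter grid are illustrative choices, not taken from the linked example):

from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)
param_grid = {'C': [0.1, 1, 10], 'gamma': [0.001, 0.01]}  # values to try
search = GridSearchCV(SVC(), param_grid, cv=5)  # 5-fold cross-validation per combination
search.fit(X, y)
print(search.best_params_, search.best_score_)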
Hope this clears your doubts. Make sure to go through the scikit-learn documentation and examples:
http://scikit-learn.org/stable/user_guide.html
http://scikit-learn.org/stable/auto_examples/index.html
They are among the best out there.

I want to use deep learning to classify features to scores

I have a problem and was wondering if I can use deep learning to solve it.
I have lists of 7 features, and for each list I have 7 scores.
For example, for the features:
[0.2,0.6,0.2,0.6,0.1,0.3,0.1]
I have the following scores:
[100,0,123,2,14,15,2]
and for the features:
[0.1,0.2,0.3,0.6,0.5,0.1,0.2]
I have the following scores:
[10,10,13,22,4,135,22]
etc..
Any ideas on how to utilize deep learning to train a network that, given a list of features, will give me back the correct scores?
Thanks
You have the basic setup here for a regression problem. You could try solving this problem using a neural network toolkit. I wrote a toolkit called theanets that might help, so I'll give a simple example of how you might use it:
import numpy as np
import theanets

# set up data arrays: X is input, Y is target output
X = np.array([
    [0.2, 0.6, 0.2, 0.6, 0.1, 0.3, 0.1],
    [0.1, 0.2, 0.3, 0.6, 0.5, 0.1, 0.2],
], 'f')
Y = np.array([
    [100, 0, 123, 2, 14, 15, 2],
    [10, 10, 13, 22, 4, 135, 22],
], 'f')

# set up a regression model:
# map from X to Y using one hidden layer.
exp = theanets.Experiment(
    theanets.Regressor,
    (X.shape[1], 100, Y.shape[1]))

# train the model using rmsprop.
exp.train([X, Y], algorithm='rmsprop')

# predict outputs for some inputs.
Yhat = exp.network.predict(X)
There are several options for configuring and training your model; have a look at the documentation for more info.
There are also many, many other neural network toolkits out there; here are just a few popular ones that I'm familiar with:
Lasagne
Keras
Caffe
You might want to give these a try to see whether they fit better with your mental model of the problem you're trying to solve.
1. You generate a big number of neural networks.
2. You give a fitness score to each neural net based on the results (the higher the fitness score, the better).
3. You sort the neural nets by their fitness score.
4. You take the first x%.
5. You apply small mutations to each selected neural net.
6. Repeat steps 2-5 until the results are satisfactory.

That big number mentioned in the first step should be roughly equal to:
(100/x)^generationCount
where the x here is the same one as in step 4 and generationCount is the number of generations until the final result.
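As a minimal sketch of the loop above (here the 'networks' are bare weight vectors and the fitness function is a placeholder, so this only illustrates the select-and-mutate mechanics):

import random

def random_weights(n=10):
    return [random.uniform(-1, 1) for _ in range(n)]

def mutate(weights, rate=0.1):
    # apply small Gaussian perturbations to each weight
    return [w + random.gauss(0, rate) for w in weights]

def fitness(weights):
    # placeholder score: reward weight vectors close to zero
    return -sum(w * w for w in weights)

population = [random_weights() for _ in range(100)]              # step 1
for generation in range(20):
    population.sort(key=fitness, reverse=True)                   # steps 2-3
    survivors = population[:10]                                  # step 4: keep top 10%
    population = survivors + [mutate(random.choice(survivors))   # step 5
                              for _ in range(90)]
print(max(fitness(w) for w in population))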

How to determine the threshold for neuron firings in neural networks?

I have a simple task: classify people into either the MAN or WOMAN category by their height and hair length using a neural network, teach it the pattern with some examples, and then use it to classify on its own.
I have a basic understanding of neural networks but would really need some help here.
I know that each neuron divides the plane into two subareas; basically, that is why P = w0 + w1*x1 + w2*x2 + ... + wn*xn is being used here (the weights just move the line, if we consider the geometric representation).
I do understand that each epoch should modify the weights to get closer to the correct result, yet I have never programmed it and I am at a loss for how to start.
How should I proceed, meaning: how can I determine the threshold, and how should I deal with the inputs?
It is not homework, but rather a task for those who are interested. I am, and I would like to understand it.
Looks like you are dealing with a simple Perceptron with a threshold activation function. Have a look at this question. Since you ARE using a bias neuron (w0), you would set the threshold to 0.
You then simply take the output of your network and compare it to 0, so you would e.g. output class 1 if x < 0 and class 2 if x > 0. You could model the case x=0 as "indistinct".
For learning the weights you need to apply the Delta Learning Rule which can be implemented very easily. But be careful: a perceptron with a simple threshold activation function can only be correct if your data are linearly separable. If you have more complex data you will need a Multilayer Perceptron and a nonlinear activation function like the Logistic Sigmoid Function.
Have a look at Geoffrey Hinton's Coursera course, Lecture 2, for details.
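To make this concrete, here is a minimal perceptron sketch with a threshold at 0 and the delta rule; the (height, hair length) samples and the +1/-1 labels below are made up for illustration:

import numpy as np

# made-up (height in cm, hair length in cm) samples; +1 = MAN, -1 = WOMAN
X = np.array([[180, 5], [165, 30], [175, 10], [160, 40]], dtype=float)
y = np.array([1, -1, 1, -1])

# normalise the inputs and prepend a constant 1 for the bias weight w0
X = (X - X.mean(axis=0)) / X.std(axis=0)
X = np.hstack([np.ones((len(X), 1)), X])

w = np.zeros(X.shape[1])
lr = 0.1
for epoch in range(20):
    for xi, target in zip(X, y):
        pred = 1 if w @ xi > 0 else -1   # threshold activation at 0
        w += lr * (target - pred) * xi   # delta learning rule

print([1 if w @ xi > 0 else -1 for xi in X])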
I've been working with machine learning lately (but I'm not an expert), and you should look at the Accord.NET framework. It contains all the common machine learning algorithms out of the box, so it's easy to take an existing sample and modify it instead of starting from scratch. Also, the developer of the framework is very helpful in the forum available on the same page.
With the available samples, you may also discover something better than a neural network, like the kernel support vector machine. If you stick with the neural network, have fun modifying all the different variables; by trial and error you will understand how it works.
Have fun!
Since you said:
I know that each neuron divides the area to two subareas
&
weights are just moving the line if we consider geometric representation
I think you want to use perceptron or ADALINE neural networks. These networks can only classify linearly separable patterns. Since your input data is more complicated, it's better to use a multilayer, nonlinear neural network (my suggestion is a two-layer network with a tanh activation function). To train such a network you should use the backpropagation algorithm.
To answer
how should I deal with the inputs?
I need to know more details about the inputs (e.g., are they just height and hair length or is there more, what is their range, what is your resolution, etc.).
If you're dealing with just height and hair length, I suggest dividing the heights and lengths into classes (for example 160cm-165cm, 165cm-170cm, etc.) and setting an on/off input neuron for each class. Then put one hidden layer after all the height classes and another hidden layer after all the hair-length classes (tanh activation function). The number of neurons in these two hidden layers is determined by the number of training cases.
Then take the outputs of these two hidden layers and send them to an aggregation layer with one output neuron.
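A minimal sketch of that architecture with the Keras functional API, assuming 10 on/off bins per input and illustrative layer sizes (the original answer does not name a library):

from tensorflow import keras

height_bins = keras.Input(shape=(10,))  # on/off neurons for the height classes
hair_bins = keras.Input(shape=(10,))    # on/off neurons for the hair-length classes

h1 = keras.layers.Dense(8, activation='tanh')(height_bins)  # hidden layer for height
h2 = keras.layers.Dense(8, activation='tanh')(hair_bins)    # hidden layer for hair length
merged = keras.layers.concatenate([h1, h2])                  # aggregation
out = keras.layers.Dense(1, activation='sigmoid')(merged)    # single output neuron

model = keras.Model([height_bins, hair_bins], out)
model.compile(optimizer='adam', loss='binary_crossentropy')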

Backpropagation overall error chart with very small slope... Is this normal?

I'm training a neural network with the backpropagation algorithm, and this is the chart of overall errors:
(I'm calculating the overall error with this formula: http://www.colinfahey.com/neural_network_with_back_propagation_learning/neural_network_with_back_propagation_learning_en.html, Part 6.3: Overall training error)
I fitted a power trendline, and the calculations show that at epochs = 13000 the overall error would still be 0.2.
Isn't this too high?
Is this chart normal? It seems the training process will take too long... right? What should I do? Isn't there any faster way?
EDIT: My neural network has a hidden layer with 200 neurons, and my input and output layers have 10-12 neurons. My problem is clustering characters (it clusters Persian characters into some clusters with supervised training).
So you are using an ANN with 200 input nodes and 10-12 hidden nodes in the hidden layer; what activation function, if any, are you using for your hidden layer and output layer?
Is this a standard backpropagation training algorithm, and what training function are you using?
Each type of training function will affect the speed of training and, in some cases, the network's ability to generalise; you don't want to train against your data such that your neural network is only good for your training data.
So ideally you want decent training data, which could be a subsample of your real data, say 15%.
You could train your network using a conjugate gradient based algorithm:
http://www.mathworks.co.uk/help/toolbox/nnet/ug/bss331l-1.html#bss331l-2
This will train your network quickly.
10-12 nodes may not be ideal for your data; you can try changing the number in blocks of 5, or add another layer. In general, more layers will improve your network's ability to classify the problem, but will increase the computational complexity and hence slow down training.
Presumably these 10-12 nodes are 'features' you are trying to classify?
If so, you may wish to normalise them: rescale each to between 0 and 1, or -1 to 1, depending on your activation function (e.g. tan-sigmoid will produce values in the range -1 to +1):
http://www.heatonresearch.com/node/706
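A minimal rescaling sketch for that suggestion, mapping each feature to [-1, 1] (the data here is a random placeholder):

import numpy as np

X = np.random.rand(1000, 12) * 50          # stand-in for the 10-12 input features
lo, hi = X.min(axis=0), X.max(axis=0)
X_scaled = 2 * (X - lo) / (hi - lo) - 1    # per-feature rescale to [-1, 1]
print(X_scaled.min(), X_scaled.max())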
You may also train a neural network to identify the ideal number of nodes you should have in your hidden layer.
