I am new to machine learning. I am familiar with SVMs, neural networks and GAs. I'd like to know the best technique to learn for classifying pictures and audio. SVM does a decent job but takes a lot of time to train. Does anyone know a faster and better one? Also, I'd like to know the fastest library for SVM.
Your question is a good one, and has to do with the state of the art of classification algorithms. As you say, the choice of classifier depends on your data. In the case of images, I can tell you that there is a method called AdaBoost; read this and this to learn more about it. On the other hand, you can find lots of people doing research on this; for example, in Gender Classification of Faces Using Adaboost [Rodrigo Verschae, Javier Ruiz-del-Solar and Mauricio Correa] they say:
"Adaboost-mLBP outperforms all other Adaboost-based methods, as well as baseline methods (SVM, PCA and PCA+SVM)"
Take a look at it.
If your main concern is speed, you should probably take a look at VW and generally at stochastic gradient descent based algorithms for training SVMs.
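To make the speed point concrete, here is a minimal sketch (assuming scikit-learn is available; the feature matrix `X` and labels `y` are random placeholders) of training a linear SVM with stochastic gradient descent via `SGDClassifier`, which scales much better to large datasets than a kernel SVM:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Placeholder data: replace with your own feature matrix and labels.
X = np.random.rand(10_000, 100)
y = np.random.randint(0, 2, size=10_000)

# loss="hinge" gives a linear SVM objective, trained by SGD.
clf = make_pipeline(StandardScaler(),
                    SGDClassifier(loss="hinge", max_iter=1000, tol=1e-3))
clf.fit(X, y)
print(clf.predict(X[:5]))
```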
If the number of features is large in comparison to the number of training examples,
then you should go for logistic regression or an SVM without a kernel (a linear SVM).
If the number of features is small and the number of training examples is intermediate,
then you should use an SVM with a Gaussian (RBF) kernel.
If the number of features is small and the number of training examples is large,
use logistic regression or an SVM without a kernel.
That's according to the Stanford ML class.
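As a rough translation of those rules of thumb into code (a sketch only; the thresholds below are arbitrary assumptions, not part of the class material):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC, SVC

def pick_classifier(n_samples, n_features):
    """Heuristic model choice loosely following the ML-class advice."""
    if n_features >= n_samples:
        # Many features, few examples: a linear model is usually enough.
        return LogisticRegression(max_iter=1000)  # or LinearSVC()
    if n_samples <= 10_000:
        # Few features, intermediate number of examples: RBF-kernel SVM.
        return SVC(kernel="rbf")
    # Few features, many examples: kernels become too slow; stay linear.
    return LinearSVC()
```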
For such a task you may need to extract features first. Only after that is classification feasible.
I think feature extraction and selection are important.
For image classification, there are a lot of features such as raw pixels, SIFT features, color, texture, etc. It would be better to choose the ones suitable for your task.
I'm not familiar with audio classification, but there may be some spectrum features, like the Fourier transform of the signal, or MFCCs.
The method used to classify is also important. Besides the methods in the question, KNN is a reasonable choice, too.
Actually, which features and method to use is closely related to the task.
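For the audio side, a minimal sketch of the idea (assuming librosa and scikit-learn are installed; the file paths and labels are placeholders): summarize each clip by its mean MFCC vector and feed that to KNN.

```python
import numpy as np
import librosa
from sklearn.neighbors import KNeighborsClassifier

def mfcc_features(path, n_mfcc=13):
    """Load an audio file and summarize it as the mean MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return mfcc.mean(axis=1)                                # fixed-length vector

# Placeholder file lists and labels: replace with your own dataset.
train_files, train_labels = ["a.wav", "b.wav"], [0, 1]
X_train = np.array([mfcc_features(f) for f in train_files])

# With a real dataset you would use more neighbors and more samples.
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, train_labels)
```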
The method mostly depends on the problem at hand. There is no method that is always the fastest for every problem. Having said that, you should also keep in mind that once you choose an algorithm for speed, you may start compromising on accuracy.
For example, since you're trying to classify images, there might be a lot of features compared to the number of training samples at hand. In such cases, if you go for an SVM with a kernel, you could end up overfitting, with the variance being too high.
So you would want to choose a method with high bias and low variance. Using logistic regression or a linear SVM are some ways to do it.
You could also use different types of regularization, or techniques such as SVD, to remove the features that do not contribute much to your output prediction and keep only the most important ones. In other words, choose the features that have little or no correlation between them. Once you do this, you will be able to speed up your SVM without sacrificing accuracy.
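A rough sketch of that idea with scikit-learn (dimensions and parameters are illustrative assumptions): reduce the feature space with truncated SVD, then train a fast linear SVM on the reduced features.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline

# Placeholder data: many features, comparatively few samples.
X = np.random.rand(500, 5000)
y = np.random.randint(0, 2, size=500)

model = make_pipeline(
    TruncatedSVD(n_components=100),  # keep only the strongest directions
    LinearSVC(C=1.0),
)
model.fit(X, y)
```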
Hope it helps.
There are some good techniques in machine learning, such as boosting and AdaBoost.
One classification approach is the boosting method. This method iteratively reweights the data, which is then classified by a base classifier on each iteration; the base classifiers are in turn combined into a classification model. Boosting weights each data point in each iteration, and the weight changes according to how difficult that point is to classify.
AdaBoost in particular is an ensemble technique that uses an exponential loss function to improve the accuracy of the predictions made.
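A minimal AdaBoost sketch with scikit-learn (the synthetic data and parameters are illustrative assumptions, not tuned values; the default base learner is a decision stump):

```python
from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_classification

# Placeholder data: replace with your own features and labels.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Each round reweights the samples the previous stumps got wrong.
ada = AdaBoostClassifier(n_estimators=200)
ada.fit(X, y)
print(ada.score(X, y))
```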
I think your question is very open ended, and the "best classifier for images" will largely depend on the type of image you want to classify. But in general, I suggest you study convolutional neural networks (CNNs) and transfer learning; currently these are the state-of-the-art techniques for this problem.
Check out pre-trained CNN models from PyTorch or TensorFlow.
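For example, a minimal transfer-learning sketch with PyTorch/torchvision (assuming torchvision >= 0.13 for the `weights` argument; `num_classes` is a placeholder for your own dataset):

```python
import torch.nn as nn
from torchvision import models

num_classes = 10  # placeholder: number of classes in your own dataset

# Load a ResNet-18 pre-trained on ImageNet and replace its final layer.
model = models.resnet18(weights="IMAGENET1K_V1")
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Fine-tune only the new head (or the whole network, with a small learning rate).
for param in model.parameters():
    param.requires_grad = False
for param in model.fc.parameters():
    param.requires_grad = True
```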
Related to images, I suggest you also study image pre-processing; pre-processing techniques are very important to highlight certain features of the image and improve the generalization of the classifier.
I built a convolutional neural network for image classification that works successfully with a large amount of data for each class, but I want to apply it to a specific database with a limited amount of data available for each class (e.g. maybe 1, 2, or 3 samples). The accuracy of the same model is very low even though I used data augmentation, batch normalization, and dropout. How can I raise the system's accuracy with a low amount of data available? Is there some model specialized for this case, or any other addition to my system, or editing of my images, that would give a highly accurate system? Can anyone please help me, I'm confused. Thanks...
If you didn't test with a small amount of data you should try it; a conv net can work well even with a limited amount of data, it depends on how "hard" the classification task is.
A few options I see with a small amount of data:
Transfer learning (from your network trained on a big database, or, for more realistic conditions, from a deep CNN trained by Google or another big player, since if you take weights from your own CNN you'll never know whether you could have achieved those performances with just a small database).
If there is some research about your classification task, find out which feature engineering people do and apply it. Then try different classifiers on the extracted features, like SVM, random forest... Look at ensemble learning and stacking models, which are currently used a lot.
PS: as far as I know there are 2 options to classify images: automatic feature extraction, which is done by a neural network, and "manual" feature extraction, where features are identified by having deep knowledge of the field, as a data scientist AND as a professional of the field.
When you have extracted your features you can use different classifiers; most people who extract features with a conv net use their neural network as the classifier.
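A rough sketch of the other route (a pre-trained conv net as a frozen feature extractor, with an SVM on top), assuming torchvision >= 0.13 and scikit-learn; the input tensors here are random placeholders and data loading/normalization is omitted:

```python
import torch
import torch.nn as nn
from torchvision import models
from sklearn.svm import SVC

# Pre-trained backbone with its classification head removed.
backbone = models.resnet18(weights="IMAGENET1K_V1")
backbone.fc = nn.Identity()
backbone.eval()

@torch.no_grad()
def extract_features(images):
    """images: a float tensor of shape (N, 3, 224, 224), already normalized."""
    return backbone(images).numpy()   # shape (N, 512)

# Placeholder tensors; in practice these come from your (small) labelled dataset.
train_images = torch.randn(8, 3, 224, 224)
train_labels = [0, 1, 0, 1, 0, 1, 0, 1]

svm = SVC(kernel="rbf")
svm.fit(extract_features(train_images), train_labels)
```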
Business case:
Forecasting fuel consumption at site.
Say fuel consumption C is dependent on various factors x1, x2, ..., xn. So mathematically speaking, C = F(x1, x2, ..., xn). I do not have any equation for this.
I do have a historical dataset from which I can get a correlation of C to x1, x2, etc. C, x1, x2, ... are all quantitative. Finding out the correlation seems tough for a person like me with limited statistical knowledge, for an n-variable equation.
So, I was thinking of employing some supervised machine learning techniques for the same. I will train a classifier with the historic data to get a prediction for the next consumption.
Question: Am I thinking in the right way?
Question: If this is correct, my system should be an evolving one. So the more real data I feed into the system, the more my model would evolve to make a better prediction the next time. Is this a correct understanding?
If the above statements are true, will the AdaptiveLogisticRegression algorithm, as present in Mahout, be of help to me?
Requesting advice from the experts here!
Thanks in advance.
OK, correlation is not a forecasting model. Correlation simply describes a relationship between the datasets based on covariance.
In order to develop a forecasting model, what you need to perform is regression.
The simplest form of regression is linear univariate, where C = F (x1). This can easily be done in Excel. However, you state that C is a function of several variables. For this, you can employ linear multivariate regression. There are standard packages that can perform this (within Excel for example), or you can use Matlab, etc.
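If you would rather stay in Python than Excel/Matlab, here is a minimal multivariate linear regression sketch with scikit-learn (the numbers are made-up placeholders for your historical consumption records):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Placeholder history: each row is one observation of x1..xn, C_hist is consumption.
X_hist = np.array([[10, 3.2, 0.5],
                   [12, 2.9, 0.7],
                   [ 9, 3.5, 0.4],
                   [14, 2.7, 0.9]])
C_hist = np.array([105.0, 118.0, 98.0, 131.0])

reg = LinearRegression().fit(X_hist, C_hist)
print(reg.coef_, reg.intercept_)          # the fitted linear relationship
print(reg.predict([[11, 3.0, 0.6]]))      # forecast for new factor values
```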
Now, we are assuming that there is a "linear" relationship between C and the components of X (the input vector). If the relationship were not linear, then you would need more sophisticated methods (nonlinear regression), which may very well employ machine learning methods.
Finally, some series exhibit auto-correlation. If this is the case, then it may be possible for you to ignore the C = F(x1, x2, x3...xn) relationships, and instead directly model the C function itself using time-series techniques such as ARMA and more complex variants.
I hope this helps,
Srikant Krishna
What is difference between SVM and Neural Network?
Is it true that a linear SVM is the same as a NN, and that for non-linearly separable problems a NN adds hidden layers while an SVM changes space dimensions?
There are two parts to this question. The first part is "what is the form of function learned by these methods?" For NN and SVM this is typically the same. For example, a single hidden layer neural network uses exactly the same form of model as an SVM. That is:
Given an input vector x, the output is:
output(x) = sum_over_all_i weight_i * nonlinear_function_i(x)
Generally the nonlinear functions will also have some parameters. So these methods need to learn how many nonlinear functions should be used, what their parameters are, and what the value of all the weight_i weights should be.
Therefore, the difference between an SVM and a NN is in how they decide what these parameters should be set to. Usually when someone says they are using a neural network they mean they are trying to find the parameters which minimize the mean squared prediction error with respect to a set of training examples. They will also almost always be using the stochastic gradient descent optimization algorithm to do this. SVMs, on the other hand, try to minimize both training error and some measure of "hypothesis complexity". So they will find a set of parameters that fits the data but is also "simple" in some sense. You can think of it like Occam's razor for machine learning. The most common optimization algorithm used with SVMs is sequential minimal optimization.
Another big difference between the two methods is that stochastic gradient descent isn't guaranteed to find the optimal set of parameters when used the way NN implementations employ it. However, any decent SVM implementation is going to find the optimal set of parameters. People like to say that neural networks get stuck in a local minimum while SVMs don't.
NNs are heuristic, while SVMs are theoretically founded. An SVM is guaranteed to converge towards the best solution in the PAC (probably approximately correct) sense. For example, for two linearly separable classes an SVM will draw the separating hyperplane directly halfway between the nearest points of the two classes (these become support vectors). A neural network would draw any line which separates the samples, which is correct for the training set but might not have the best generalization properties.
So no, even for linearly separable problems, NNs and SVMs are not the same.
In case of linearly non-separable classes, both SVMs and NNs apply non-linear projection into higher-dimensional space. In the case of NNs this is achieved by introducing additional neurons in the hidden layer(s). For SVMs, a kernel function is used to the same effect. A neat property of the kernel function is that the computational complexity doesn't rise with the number of dimensions, while for NNs it obviously rises with the number of neurons.
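A small sketch of that difference (scikit-learn assumed; the toy data is made up): both models separate the training set perfectly, but only the SVM places the boundary at maximum margin between the two blobs.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.linear_model import Perceptron

# Two linearly separable blobs.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + [2, 2], rng.randn(20, 2) - [2, 2]])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="linear").fit(X, y)   # max-margin hyperplane
per = Perceptron().fit(X, y)           # any separating hyperplane it happens to find

print("SVM boundary:        w =", svm.coef_[0], "b =", svm.intercept_[0])
print("Perceptron boundary: w =", per.coef_[0], "b =", per.intercept_[0])
print("Support vectors:", svm.support_vectors_)
```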
Running a simple out-of-the-box comparison between support vector machines and neural networks (WITHOUT any parameter-selection) on several popular regression and classification datasets demonstrates the practical differences: an SVM becomes a very slow predictor if many support vectors are being created while a neural network's prediction speed is much higher and model-size much smaller. On the other hand, the training time is much shorter for SVMs. Concerning the accuracy/loss - despite the aforementioned theoretical drawbacks of neural networks - both methods are on par - especially for regression problems, neural networks often outperform support vector machines. Depending on your specific problem, this might help to choose the right model.
Both Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs) are supervised machine learning classifiers. An ANN is a parametric classifier whose hyper-parameters are tuned during the training phase. An SVM is a non-parametric classifier that finds a separating hyperplane (a linear one, if a linear kernel is used) between the classes. Actually, in terms of model performance, SVMs are sometimes equivalent to a shallow neural network architecture. Generally, an ANN will outperform an SVM when there is a large number of training instances; however, neither outperforms the other over the full range of problems.
We can summarize the advantages of the ANN over the SVM as follows:
ANNs can handle multi-class problems by producing probabilities for each class. In contrast, SVMs handle these problems using independent one-versus-all classifiers where each produces a single binary output. For example, a single ANN can be trained to solve the hand-written digits problem while 10 SVMs (one for each digit) are required; see the sketch after these two lists.
Another advantage of ANNs, from the perspective of model size, is that the model is fixed in terms of its input nodes, hidden layers, and output nodes; in an SVM, however, the number of support vectors can grow to the number of training instances in the worst case.
The SVM does not perform well when the number of features is greater than the number of samples. More work in feature engineering is required for an SVM than that needed for a multi-layer Neural Network.
On the other hand, SVMs are better than ANNs in certain respects:
In comparison to SVMs, ANNs are more prone to becoming trapped in local minima, meaning that they sometimes miss the global picture.
While most machine learning algorithms can overfit if they don't have enough training samples, ANNs can also overfit if training goes on for too long - a problem that SVMs do not have.
SVM models are easier to understand. There are different kernels that provide different levels of flexibility beyond the classical linear kernel, such as the Radial Basis Function (RBF) kernel. Unlike the linear kernel, the RBF can handle the case where the relation between class labels and attributes is nonlinear.
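The digits example from the first list above, as a sketch (scikit-learn assumed; the model settings are illustrative): a single multi-class MLP versus ten one-vs-rest binary SVMs.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# One network with 10 output units handles all digits at once.
ann = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500).fit(X_tr, y_tr)

# Ten binary SVMs, one per digit, wrapped in a one-vs-rest scheme.
svms = OneVsRestClassifier(LinearSVC()).fit(X_tr, y_tr)

print("ANN accuracy:    ", ann.score(X_te, y_te))
print("OvR SVM accuracy:", svms.score(X_te, y_te))
```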
SVMs and NNs share the same building block, the perceptron, but SVMs also use the kernel trick to raise the dimension, say from 2D to 3D, via a mapping such as (x, y) → (x², y², √2·xy), which can separate linearly inseparable point sets with a straight line (a hyperplane in the higher-dimensional space). Want a demo like it? Ask me :)
Actually, they are exactly equivalent to each other. The only difference is in their standard implementations, with different choices of activation function, regularization, etc., which obviously differ from each other. Also, I have not yet seen a dual formulation for neural networks, but SVMs are moving toward the primal anyway.
Practically, most of your assumptions are often quite true. I'll elaborate: for linearly separable classes a linear SVM works quite well and is much faster to train. For non-linear classes there is the kernel trick, which sends your data to a higher-dimensional space. This trick, however, has two disadvantages compared to a NN. First, you have to search for the right parameters, because the classifier will only work if, in the higher dimension, the two sets are linearly separable. Now, testing parameters is often done by grid search, which is CPU-time consuming. The other problem is that this whole technique is not as general as a NN (for example, in NLP it often results in a poor classifier).
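To illustrate the grid-search cost mentioned above, a minimal sketch with scikit-learn (the data and parameter grid values are illustrative assumptions): every (C, gamma) pair is trained and cross-validated, so the cost grows with the grid size.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Placeholder data: replace with your own features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1, 1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)
```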
We need to decide between Support Vector Machines and Fast Artificial Neural Network for some text processing project.
It includes Contextual Spelling Correction and then tagging the text to certain phrases and their synonyms.
Which will be the right approach? Or is there an alternative to both of these... something more appropriate than FANN as well as SVM?
I think you'll get competitive results from both of the algorithms, so you should aggregate the results... think about ensemble learning.
Update:
I don't know if this is specific enough: use a Bayes Optimal Classifier to combine the predictions from each algorithm. You have to train both of your algorithms, then you have to train the Bayes Optimal Classifier to take your algorithms' outputs as input and make optimal predictions based on them.
Separate your training data into 3 sets:
1st data set will be used to train the (Artificial) Neural Network and the Support Vector Machines.
2nd data set will be used to train the Bayes Optimal Classifier by taking the raw predictions from the ANN and SVM.
3rd data set will be your qualification data set where you will test your trained Bayes Optimal Classifier.
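A sketch of that three-way split in scikit-learn terms (I'm using a logistic regression as a simple stand-in for the combining "Bayes Optimal Classifier"; the data and model choices are assumptions):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=3000, n_features=30, random_state=0)
# 1st set: base models, 2nd set: combiner, 3rd set: final qualification.
X1, X_rest, y1, y_rest = train_test_split(X, y, test_size=0.5, random_state=0)
X2, X3, y2, y3 = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

ann = MLPClassifier(max_iter=500).fit(X1, y1)
svm = SVC(probability=True).fit(X1, y1)

def base_predictions(X_part):
    """Stack the two models' class probabilities as features for the combiner."""
    return np.hstack([ann.predict_proba(X_part), svm.predict_proba(X_part)])

combiner = LogisticRegression().fit(base_predictions(X2), y2)
print("Ensemble accuracy on the 3rd set:", combiner.score(base_predictions(X3), y3))
```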
Update 2.0:
Another way to create an ensemble of the algorithms is to use 10-fold (or more generally, k-fold) cross-validation:
Break data into 10 sets of size n/10.
Train on 9 datasets and test on 1.
Repeat 10 times and take a mean accuracy.
Remember that you can generally combine many classifiers and validation methods in order to produce better results. It's just a matter of finding what works best for your domain.
You might want to also take a look at maxent classifiers (/log linear models).
They're really popular for NLP problems. Modern implementations, which use quasi-Newton methods for optimization rather than the slower iterative scaling algorithms, train more quickly than SVMs. They also seem to be less sensitive to the exact value of the regularization hyperparameter. You should probably only prefer SVMs over maxent if you'd like to use a kernel to get feature conjunctions for free.
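For reference, a maxent/log-linear classifier is essentially what scikit-learn calls logistic regression; here is a minimal sketch trained with a quasi-Newton solver (the tiny corpus and labels are made-up placeholders):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = ["good movie", "bad movie", "great film", "terrible film"]  # placeholder corpus
labels = [1, 0, 1, 0]

# lbfgs is a quasi-Newton optimizer; C controls the regularization strength.
maxent = make_pipeline(CountVectorizer(),
                       LogisticRegression(solver="lbfgs", C=1.0))
maxent.fit(docs, labels)
print(maxent.predict(["good film"]))
```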
As for SVMs vs. neural networks, using SVMs would probably be better than using ANNs. Like maxent models, training SVMs is a convex optimization problem. This means, given a data set and a particular classifier configuration, SVMs will consistently find the same solution. When training multilayer neural networks, the system can converge to various local minima. So, you'll get better or worse solutions depending on what weights you use to initialize the model. With ANNs, you'll need to perform multiple training runs in order to evaluate how good or bad a given model configuration is.
This question is very old. A lot of developments have happened in the NLP area in the last 7 years.
Convolutional neural networks (CNNs) and recurrent neural networks (RNNs) evolved during this time.
Word embeddings: words appearing within similar contexts tend to have similar meanings. Word embeddings are pre-trained on a task where the objective is to predict a word based on its context.
CNN for NLP:
Sentences are first tokenized into words, which are then transformed into a word embedding matrix (i.e., the input embedding layer) of dimension d.
Convolutional filters are applied to this input embedding layer to produce a feature map.
A max-pooling operation on each filter obtains a fixed-length output and reduces the dimensionality of the output.
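A minimal PyTorch sketch of that pipeline (the vocabulary size, embedding dimension and filter settings are illustrative assumptions; tokenization into word indices is omitted):

```python
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=100, n_filters=64, n_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, n_filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(n_filters, n_classes)

    def forward(self, token_ids):          # token_ids: (batch, seq_len) word indices
        x = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        x = x.transpose(1, 2)              # Conv1d expects (batch, channels, seq_len)
        x = torch.relu(self.conv(x))       # feature map: (batch, n_filters, seq_len)
        x = x.max(dim=2).values            # max-pooling over time -> fixed length
        return self.fc(x)                  # class scores

model = TextCNN()
fake_batch = torch.randint(0, 10_000, (8, 50))   # 8 sentences of 50 token ids
print(model(fake_batch).shape)                   # torch.Size([8, 2])
```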
Since CNNs have the shortcoming of not preserving long-distance contextual information, RNNs were introduced.
RNNs are specialized neural-based approaches that are effective at processing sequential information.
An RNN memorizes the results of previous computations and uses them in the current computation.
There are a few variants of RNNs - Long Short-Term Memory units (LSTMs) and Gated Recurrent Units (GRUs).
Have a look at the resources below:
deep-learning-for-nlp
Recent trends in deep learning paper
You can use a Convolutional Neural Network (CNN) or a Recurrent Neural Network (RNN) for NLP tasks. I think CNNs have achieved state-of-the-art results now.
I've got a classification problem in my hands which I'd like to address with a machine learning algorithm (Bayes, or Markovian probably; the question is independent of the classifier to be used). Given a number of training instances, I'm looking for a way to measure the performance of an implemented classifier, taking the data overfitting problem into account.
That is: given N[1..100] training samples, if I run the training algorithm on every one of the samples, and use these very same samples to measure fitness, I might run into a data overfitting problem - the classifier will know the exact answers for the training instances, without having much predictive power, rendering the fitness results useless.
An obvious solution would be separating the hand-tagged samples into training and test samples; and I'd like to learn about methods for selecting a statistically significant set of samples for training.
White papers, book pointers, and PDFs much appreciated!
You could use 10-fold cross-validation for this. I believe it's a pretty standard approach for evaluating the performance of classification algorithms.
The basic idea is to divide your learning samples into 10 subsets. Then use one subset as test data and the others as training data. Repeat this for each subset and calculate the average performance at the end.
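In scikit-learn terms this is nearly a one-liner; a minimal sketch (the Naive Bayes classifier and the iris dataset here are just example stand-ins):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)

# 10-fold cross-validation: train on 9 folds, test on the held-out fold, repeat.
scores = cross_val_score(GaussianNB(), X, y, cv=10)
print(scores.mean(), scores.std())
```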
As Mr. Brownstone said, 10-fold cross-validation is probably the best way to go. I recently had to evaluate the performance of a number of different classifiers, and for this I used Weka, which has an API and a load of tools that allow you to easily test the performance of lots of different classifiers.