How do I modify the coefficients of a logistic regression model in Weka? - logistic-regression

I have previously trained a logistic regression classifier on the Iris data set, and saved the resulting model to a file named iris.model.
I now load the model into the Weka Explorer.
How do I edit the coefficients of this model? For example, I want to change Iris-setosa's sepallength coefficient from 21.8065 to 19.

You can't. Weka's classifiers are data-driven: the coefficients are learned from the training data, and Weka doesn't offer post-build fine-tuning or manual modification of a built model.

Related

Binomial logistic regression or binomial GAM?

I have presence/absence data of a species as my dependent variable and climatic data for my independent variables.
The presence samples are found within a particular range, with the absence samples generally on either side.
To run a regression, is it better to use a binomial logistic regression or a binomial generalized additive model?

multiple exponential and logarithmic regression (PYTHON)

I want to do some data analysis, so I searched for and chose an automotive dataset. The dataset has 15 columns and 7500 rows. I used a multiple linear regression model, but now I want to try other regression models, such as exponential and logarithmic. However, I don't know how to apply them to 15 columns.
Can you give me some guidance? Does anyone have any ideas? First of all, I'll focus on exponential regression.
Maybe you can suggest a link, book, or article about nonlinear regression models. (I searched but did not find exactly what I wanted.)
Thank you for your interest.

Use case for incremental supervised learning using Apache Mahout

Business case:
Forecasting fuel consumption at site.
Say fuel consumption C depends on various factors x1, x2, ..., xn. Mathematically speaking, C = F(x1, x2, ..., xn), but I do not have an explicit equation for F.
I do have a historical dataset from which I can get a correlation of C with x1, x2, etc. C, x1, x2, ... are all quantitative. Finding the correlation for an n-variable equation seems tough for a person like me with limited statistical knowledge.
So, I was thinking of employing some supervised machine learning techniques for this. I will train a classifier with the historic data to get a prediction for the next consumption.
Question: Am I thinking in the right way?
Question: If this is correct, my system should be an evolving one: the more real data I feed to the system, the more my model should evolve to make a better prediction the next time. Is this a correct understanding?
If the above statements are true, will the AdaptiveLogisticRegression algorithm, as present in Mahout, be of help to me?
Requesting advice from the experts here!
Thanks in advance.
OK, correlation is not a forecasting model. Correlation simply describes some relationship between the datasets based on covariance.
In order to develop a forecasting model, what you need to perform is regression.
The simplest form of regression is linear univariate, where C = F(x1). This can easily be done in Excel. However, you state that C is a function of several variables. For this, you can employ linear regression with multiple input variables (multiple linear regression). There are standard packages that can perform this (within Excel, for example), or you can use Matlab, etc.
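As a rough illustration of fitting C = F(x1, x2, ..., xn) under the linear assumption, here is a minimal pure-Python sketch of ordinary least squares solved through the normal equations. The function names `fit_linear` and `predict` and the sample data are hypothetical, made up for this example; a spreadsheet or a statistics package does the same job.

```python
def fit_linear(X, y):
    """Ordinary least squares via the normal equations (X^T X) b = X^T y.

    Returns [intercept, b1, ..., bn]. Assumes the inputs are not collinear.
    """
    # Prepend a constant 1.0 to each row so that b[0] is the intercept.
    rows = [[1.0] + list(r) for r in X]
    k = len(rows[0])
    # Build the augmented matrix [X^T X | X^T y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coeffs, x):
    """Evaluate the fitted linear model for one input vector x."""
    return coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], x))


# Hypothetical data generated from C = 2 + 3*x1 - 1*x2 (no noise).
X = [(1, 2), (2, 1), (3, 5), (4, 3), (5, 8)]
y = [2 + 3 * x1 - x2 for x1, x2 in X]
coeffs = fit_linear(X, y)
```

With noise-free data like this, `coeffs` recovers the intercept and the two slopes up to floating-point error; with real consumption data the fit is only as good as the linear assumption.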
Now, we are assuming that there is a "linear" relationship between C and the components of X (the input vector). If the relationship were not linear, then you would need more sophisticated methods (nonlinear regression), which may very well employ machine learning methods.
Finally, some series exhibit auto-correlation. If this is the case, then it may be possible for you to ignore the C = F(x1, x2, x3...xn) relationships, and instead directly model the C function itself using time-series techniques such as ARMA and more complex variants.
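To make the time-series idea concrete, here is a minimal sketch of the simplest such model, a first-order autoregression AR(1), which is a special case of ARMA. It estimates the coefficient from the lag-1 autocorrelation and produces a one-step forecast. The function names and the sample readings are hypothetical, and a real analysis would normally use a statistics package and check the model order.

```python
def lag1_autocorrelation(series):
    """Sample autocorrelation of the series at lag 1."""
    n = len(series)
    mean = sum(series) / n
    den = sum((v - mean) ** 2 for v in series)
    if den == 0:
        return 0.0  # A constant series carries no autocorrelation signal.
    num = sum((series[t] - mean) * (series[t - 1] - mean) for t in range(1, n))
    return num / den


def ar1_forecast(series):
    """One-step forecast from AR(1): C_next - mean = phi * (C_last - mean)."""
    mean = sum(series) / len(series)
    phi = lag1_autocorrelation(series)
    return mean + phi * (series[-1] - mean)


# Hypothetical consumption readings, oldest first.
consumption = [102.0, 105.0, 104.0, 108.0, 110.0, 109.0, 113.0, 115.0]
phi = lag1_autocorrelation(consumption)
next_value = ar1_forecast(consumption)
```

A value of `phi` close to 0 suggests there is little auto-correlation to exploit, in which case the regression on x1, x2, ..., xn is the more promising route.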
I hope this helps,
Srikant Krishna

How to use Weka for predicting results

I'm new to Weka and I'm confused by the tool. I have a data set of fruit prices and related attributes, and I'm trying to predict the price of a specific fruit from it. Since I'm new to Weka, I couldn't figure out how to do this. Please help me or point me to a tutorial on how to make predictions, and tell me which method or algorithm is best for this task.
If you want to know more about saving a trained classifier and loading it later to predict, please refer to the following.
With the assumption that you want to use the Weka GUI, you have to go through these two steps:
First, use some pre-labelled data to train a classifier (use your fruit prices data). Make sure the data is in ARFF format. After training, save the model to your disk.
More on this can be found here: https://waikato.github.io/weka-wiki/saving_and_loading_models/
In the second step, you use the already trained model (from step 1). Specifically, you have to load the model file (saved in step 1) and then use the "Supplied test set" option on the "Classify" tab. In the "Supplied test set" option, select the un-labelled data.
More on this can be found here: https://waikato.github.io/weka-wiki/making_predictions/
I would suggest first playing around with the ARFF data files that come with your Weka install. These ARFF files sit under your Weka install directory (in my case: C:\Program Files\Weka-3-7\data).
Some more useful URLs:
https://developer.ibm.com/technologies/analytics/articles/os-weka1/
http://ortho.clmed.ncku.edu.tw/~emba/2006EMBA_MIS/3_16_2006/WekaIntro.pdf
Hope that helps.
I think the best step by step method for getting predictions from an existing trained model can be found here - https://machinelearningmastery.com/save-machine-learning-model-make-predictions-weka/

Regression Model for categorical data

I have a very large dataset in a CSV file (1,700,000 rows and 300 sparse features).
- It has a lot of missing values.
- The data is a mix of numeric and categorical values.
- The dependent variable (the class) is binary (either 1 or 0).
- The data is highly skewed: the number of positive responses is low.
What I am required to do is apply a regression model and any other machine learning algorithm to this data.
I'm new to this and I need help.
- How do I deal with categorical data in a regression model? And do the missing values affect it much?
- What is the best prediction model I can try for large, sparse, skewed data like this?
- What program do you advise me to work with? I tried Weka, but it can't even open that much data (memory failure). I know that Matlab can open either a numeric CSV or a categorical CSV, but not a mixed one; besides, the missing values have to be imputed before it can open the file. I know a little bit of R.
I'm trying to manipulate the data using Excel, Access and Perl scripts, and that's really hard with this amount of data. Excel can't open more than about 1M records and Access can't open more than 255 columns. Any suggestions?
Thank you in advance for your help.
First of all, you are talking about classification, not regression: classification predicts a value from a fixed set (e.g. 0 or 1), while regression produces a real-valued numeric output (e.g. 0, 0.5, 10.1543, etc.). Also, don't be confused by so-called logistic regression: it is a classifier too, and its name just shows that it is based on linear regression.
To process such a large amount of data you need an incremental (updatable) model. In Weka, there are a number of such algorithms in the classification section (e.g. Naive Bayes Updatable, Neural Networks Updatable and others). With an incremental model you can load the data portion by portion and update the model accordingly (for Weka, see the Knowledge Flow interface, which makes this easier).
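To show what "updatable" means in practice, here is a minimal sketch of a categorical Naive Bayes classifier that keeps only running counts, so it can be trained one instance (or one chunk) at a time without holding the whole file in memory. This is an illustrative toy, not Weka's implementation; the class name, the Laplace smoothing choice, and the sample chunks are all assumptions made for the example.

```python
import math
from collections import defaultdict


class UpdatableNaiveBayes:
    """Categorical Naive Bayes that learns one instance at a time."""

    def __init__(self):
        self.class_counts = defaultdict(int)
        # feature_counts[label][(feature_index, value)] -> count
        self.feature_counts = defaultdict(lambda: defaultdict(int))
        # Distinct values seen per feature, used for Laplace smoothing.
        self.values = defaultdict(set)

    def update(self, features, label):
        """Fold a single labelled instance into the running counts."""
        self.class_counts[label] += 1
        for i, v in enumerate(features):
            self.feature_counts[label][(i, v)] += 1
            self.values[i].add(v)

    def predict(self, features):
        """Return the label with the highest smoothed log-probability."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)
            for i, v in enumerate(features):
                count = self.feature_counts[label][(i, v)]
                score += math.log((count + 1) / (n + len(self.values[i])))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical data arriving in two chunks, e.g. read from a large file.
chunks = [
    [(("sunny", "hot"), 0), (("rainy", "cool"), 1)],
    [(("rainy", "mild"), 1), (("sunny", "mild"), 0)],
]
model = UpdatableNaiveBayes()
for chunk in chunks:
    for features, label in chunk:
        model.update(features, label)
```

Because only counts are stored, memory use depends on the number of distinct feature values, not on the number of rows processed.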
Some classifiers can work with categorical data, but I can't remember any updatable ones among them, so most probably you will still need to transform the categorical data to numeric. The standard solution is to use indicator attributes, i.e. to substitute every categorical attribute with several binary indicators. For example, if you have an attribute day-of-week with 7 possible values, you can substitute it with 7 binary attributes: Sunday, Monday, etc. Of course, in each particular instance exactly one of the 7 attributes holds the value 1 and all the others are 0.
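The indicator-attribute substitution can be sketched in a few lines. The helper names `one_hot` and `encode_row` and the sample row are hypothetical, chosen for illustration only.

```python
DAYS = ["Sunday", "Monday", "Tuesday", "Wednesday",
        "Thursday", "Friday", "Saturday"]


def one_hot(value, categories):
    """Replace one categorical value with len(categories) binary indicators."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == category else 0 for category in categories]


def encode_row(row, categorical_columns):
    """Expand the categorical columns of a row; pass other values through.

    categorical_columns maps a column index to its list of categories.
    """
    encoded = []
    for index, value in enumerate(row):
        if index in categorical_columns:
            encoded.extend(one_hot(value, categorical_columns[index]))
        else:
            encoded.append(value)
    return encoded


# Hypothetical row: a numeric value, a day-of-week, another numeric value.
row = [3.5, "Tuesday", 12]
encoded = encode_row(row, {1: DAYS})
```

Here the single day-of-week column becomes seven 0/1 columns, so the three-column row grows to nine columns.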
The importance of missing values depends on the nature of your data. Sometimes it is worth replacing them with some neutral value beforehand; sometimes the classifier implementation does this itself (check the documentation for the algorithm for details).
Finally, for highly skewed data, use the F1 measure (or just precision and recall) instead of accuracy.
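A small sketch shows why accuracy misleads on skewed data. The function names and the label vectors below are hypothetical, constructed for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the given positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical skewed data: 95 negatives followed by 5 positives.
y_true = [0] * 95 + [1] * 5
# A useless classifier that always predicts the majority class.
always_negative = [0] * 100
# A classifier that raises 2 false alarms and finds 4 of the 5 positives.
better = [0] * 93 + [1] * 2 + [1] * 4 + [0]
```

The always-negative classifier reaches an accuracy of 0.95 while its F1 is 0, because it never finds a positive case; the second classifier is only slightly more accurate but has a far higher F1.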
