I fit a ML algorithm ( binary classification ) , and I really calculate all metrics that I found
and I write the 10 important features in each run ( my run = 5 times )
the problem now:
I need to calculate AUC for each run from 5 runs
Is there any way to calculate AUC from my privious run ?
without to return all calculation and what I did it before
I pray there is a way to do this
help me please
Related
I write a ML code and I run it 5 times ,
each time I calculte Area under the curve (AUC)
Now, I want to calculate the average of this five AUC value ..
How can I do it? just summing and devided by 5?
Also, is there any way to draw the average of the area under the curve ( in python )?
Thanks
We know that the work flow of logistic regression is it first gets the probability based on some equations and uses default cut-off for classification.
So, I want to know if it is possible to change the default cutoff value(0.5) to 0.75 as per my requirement. If Yes, can someone help me with the code either in R or Python or SAS. If No, can someone provide if with relevant proofs.
In my process of finding the answer for this query, i found that :-
1.) We can find the optimal cutoff value that can give best possible accuracy and build the confusion matrix accordingly :-
R code to find optimul cutoff and build confusion matrix :-
library(InformationValue)
optCutOff <- optimalCutoff(testData$ABOVE50K, predicted)[1]
confusionMatrix(testData$ABOVE50K, predicted, threshold = optCutOff)
Misclassification Error :-
misClassError(testData$ABOVE50K, predicted, threshold = optCutOff)
Note :- We see that the cutoff value is changed while calculating the confusion matrix, but not while building the model. Can someone help me with this.
Reference link :- http://r-statistics.co/Logistic-Regression-With-R.html
from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()
lr.fit(x_train, y_train)
we find first use
lr.predict_proba(x_test)
to get the probability in each class, for example, first column is probability of y=0 and second column is probability of y=1.
# the probability of being y=1
prob1=lr.predict_proba(X_test)[:,1]
If we use 0.25 as the cutoff value, then we predict like below
predicted=[1 if i > 0.25 else 0 for i in prob1]
I'm using a forecast() function in R many times with loop (12 months) for but I want to use accuracy to compare forecast for horizon time =12 and one-step ahead. My problem is how to store the results of 12 times to use it in accuracy.
Thank you
I am quite new to R statistics, and I one you can help me. I have tried finding the answer to my question by searching the forum and so on, and I apologize in advance if my question is trivial or stupid.
I have spent the last month collecting my first data set. And my dataset is now ready to be analyzed. I have spent some time learning the most basic function of the R statistics.
My dataset deals with adverse drug reaction reports. Each report may contain several suspect drugs and several adverse reactions. A case can therefore contain several drugs and adverse reaction (drug-ADR) combinations. Some cases contain just one combination and others contain several.
And now my question is: How do I make calculations that are “case-specific”?
I want to calculate a Completeness Score for the percentage of completed data fields for each drug-ADR combination, and then I would like to calculate the average for the entire case/report.
I want to calculate a Completness Score (C) for each drug-ADR combination expressed as:
C = (1-Pi) = (1-P1) x (1-P 2) x (1-P3) …. (1-Pn)
, where Pi refers to the penalty deducted, if the data field is not complete (ex 0.50 for 50%). If the information is not missing the panalty 0. The max score will then be 1. n is the number of parameters / variables.
Ultimately I want to calculate an overall Completness score for the overall case/report. The total score is should be calculated from the average of each drug-ADR combination.
C = Cj / m
, where j denotes the current drug-ADR combination, and m is the total number of combinations of drug-ADR in the full report.
Can anyone help me?
Thanke you for your attention!! I will be very grateful for any help that I can get.
I have a requirement to calculate the Moving Range of a load of data (at least I think this is what it is called) in SQL Server. This would be easy if I could use arrays, but I understand this is not possible for MS SQL, so wonder if anyone had a suggestion.
To give you an idea of what I need:
Lets say I have the following in a sql server table:
1
3
2
6
3
I need to get the difference of each of these numbers (in order), ie:
|1-3|=2
|3-2|=1
|6-2|=4
|3-6|=3
Then square these:
2^2=4
1^2=1
4^2=16
3^2=9
EDIT: PROBABLY WORTH NOTING THAT YOU DO NOT SQUARE THESE FOR MOVING AVERAGE - I WAS WRONG
Then sum them:
4+1+16+9=30
Then divide by number of values:
30/5=6
Then square root this:
2.5(ish)
EDIT: BECAUSE YOU ARENT SQUARING THEM, YOU ARENT SQROOTING THEM EITHER
If anyone can just help me out with the first step, that would be great - I can do the rest myself.
A few other things to take into account:
- Using stored procedure in SQL Server
- There is quite a lot of data (100s or 1000s of values), and they will need to be calulated daily or weekly
Many thanks in advance.
~Bob
WITH nums AS
(
SELECT num, ROW_NUMBER() OVER (ORDER BY id) AS rn
FROM mytable
)
SELECT SQRT(AVG(POWER(tp.num - tf.num, 2)))
FROM nums tp
JOIN nums tf
ON tf.rn = tp.rn + 1