Is there any way to calculate AUC for past results? - auc

I fitted an ML algorithm (binary classification) and already calculated all the metrics that I could find,
and I recorded the 10 most important features in each run (I did 5 runs).
The problem now:
I need to calculate the AUC for each of the 5 runs.
Is there any way to calculate the AUC from my previous runs,
without redoing all the calculations I did before?
I hope there is a way to do this.
Please help me.
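AUC is computed from the true labels and the predicted scores (probabilities), not from the fitted model itself, so nothing has to be refitted if those two lists were saved for each run. Below is a minimal sketch in plain Python, assuming hypothetical saved lists `y_true` and `scores` for one run (the values are made up). It uses the pairwise definition of AUC, which is quadratic in the number of samples and so only suited to small data. If only hard 0/1 predictions or summary metrics were kept, the full ROC AUC cannot be recovered from them.

```python
def auc_from_scores(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical saved outputs of one run (illustrative values only).
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
run_auc = auc_from_scores(y_true, scores)
print(run_auc)
```

If scikit-learn is installed, `sklearn.metrics.roc_auc_score(y_true, scores)` should give the same value and scales better.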

Related

Calculate the average of the AUC and plot it

I wrote some ML code and ran it 5 times.
Each time I calculated the area under the curve (AUC).
Now I want to calculate the average of these five AUC values.
How can I do it? Just sum them and divide by 5?
Also, is there any way to draw the average of the area under the curve (in Python)?
Thanks
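Summing the five values and dividing by 5 is the arithmetic mean; reporting the spread across runs alongside it is usually helpful. A minimal sketch, assuming a hypothetical list `aucs` of five per-run AUC values (the numbers are made up for illustration):

```python
from statistics import mean, stdev

# Hypothetical AUC values from five runs (illustrative, not real results).
aucs = [0.81, 0.79, 0.84, 0.80, 0.82]

mean_auc = mean(aucs)   # sum of the values divided by how many there are
std_auc = stdev(aucs)   # sample standard deviation across the runs

print(f"AUC = {mean_auc:.3f} +/- {std_auc:.3f}")
```

For an averaged curve rather than an averaged number, a common approach is to interpolate each run's ROC curve onto a shared grid of false positive rates, average the true positive rates at each grid point, and plot the result.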

Can we change the default cut-off (0.5) used by logistic regression itself, and not only while calculating the classification error?

We know that the workflow of logistic regression is that it first computes a probability from the fitted equation and then applies a default cut-off for classification.
So I want to know whether it is possible to change the default cut-off value (0.5) to 0.75, as per my requirement. If yes, can someone help me with the code in R, Python or SAS? If no, can someone explain why, with relevant evidence?
While looking for an answer to this question, I found that:
1.) We can find the optimal cut-off value that gives the best possible accuracy and build the confusion matrix accordingly:
R code to find the optimal cut-off and build the confusion matrix:
library(InformationValue)
optCutOff <- optimalCutoff(testData$ABOVE50K, predicted)[1]
confusionMatrix(testData$ABOVE50K, predicted, threshold = optCutOff)
Misclassification Error :-
misClassError(testData$ABOVE50K, predicted, threshold = optCutOff)
Note: the cut-off value is changed while calculating the confusion matrix, but not while building the model. Can someone help me with this?
Reference link :- http://r-statistics.co/Logistic-Regression-With-R.html
from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()
lr.fit(x_train, y_train)
We first use
lr.predict_proba(x_test)
to get the probability of each class: the first column is the probability of y=0 and the second column is the probability of y=1.
# the probability of being y=1
prob1 = lr.predict_proba(x_test)[:, 1]
If we use 0.25 as the cut-off value, then we predict as below
predicted = [1 if p > 0.25 else 0 for p in prob1]
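The cut-off plays no part in fitting: the model only estimates probabilities, and the threshold is applied afterwards when those probabilities are turned into classes. This is why the cut-off shows up in the confusion matrix step and not in the model-building step. A small sketch of how the confusion counts shift between 0.5 and 0.75, using made-up labels and probabilities (`y_true`, `prob1` and `confusion_counts` are hypothetical names, not part of any library):

```python
def confusion_counts(y_true, probs, cutoff=0.5):
    """Return (tp, fp, tn, fn) when class 1 is predicted for prob > cutoff."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, probs):
        pred = 1 if p > cutoff else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


# Illustrative values only; in practice prob1 would come from predict_proba.
y_true = [0, 0, 1, 1, 1, 0]
prob1 = [0.20, 0.60, 0.55, 0.80, 0.90, 0.70]

default_counts = confusion_counts(y_true, prob1, cutoff=0.5)
strict_counts = confusion_counts(y_true, prob1, cutoff=0.75)
print(default_counts, strict_counts)
```

Raising the cut-off trades false positives for false negatives; the fitted coefficients and the probabilities themselves are unchanged.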

Store outputs of forecast

I'm calling the forecast() function in R repeatedly inside a for loop (over 12 months), and I want to use accuracy to compare the forecast for a horizon of 12 with the one-step-ahead forecasts. My problem is how to store the results of the 12 iterations so that I can use them in accuracy.
Thank you

Multiple combinations (e.g. drug-ADR) with the same unique case ID

I am quite new to R, and I hope you can help me. I have tried finding the answer to my question by searching the forum and so on, and I apologize in advance if my question is trivial or stupid.
I have spent the last month collecting my first data set, and it is now ready to be analyzed. I have spent some time learning the most basic functions of R.
My dataset deals with adverse drug reaction reports. Each report may contain several suspect drugs and several adverse reactions. A case can therefore contain several drugs and adverse reaction (drug-ADR) combinations. Some cases contain just one combination and others contain several.
And now my question is: How do I make calculations that are “case-specific”?
I want to calculate a Completeness Score for the percentage of completed data fields for each drug-ADR combination, and then I would like to calculate the average for the entire case/report.
I want to calculate a Completeness Score (C) for each drug-ADR combination, expressed as:
C = Π (1 - P_i) = (1 - P_1) x (1 - P_2) x (1 - P_3) x ... x (1 - P_n)
where P_i is the penalty deducted if data field i is not complete (e.g. 0.50 for 50%). If the information is not missing, the penalty is 0, so the maximum score is 1. n is the number of parameters / variables.
Ultimately I want to calculate an overall Completeness Score for the whole case/report. The total score should be the average of the scores of its drug-ADR combinations:
C_case = (C_1 + C_2 + ... + C_m) / m
where C_j is the score of the j-th drug-ADR combination, and m is the total number of drug-ADR combinations in the full report.
Can anyone help me?
Thank you for your attention! I will be very grateful for any help that I can get.
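To make the two formulas concrete, here is a minimal sketch of the arithmetic, written in Python rather than R and using made-up penalties. The function names and the nested-list layout (one list of penalties per drug-ADR combination) are assumptions for illustration, not a prescribed data format:

```python
from math import prod
from statistics import mean


def combination_score(penalties):
    """Completeness of one drug-ADR combination: product of (1 - P_i)."""
    return prod(1 - p for p in penalties)


def case_score(combinations):
    """Completeness of a case: average score over its drug-ADR combinations."""
    return mean([combination_score(p) for p in combinations])


# Hypothetical case with two drug-ADR combinations and three data fields each.
case = [
    [0.0, 0.5, 0.0],  # one field incomplete, penalty 0.50
    [0.0, 0.0, 0.0],  # all fields complete
]

scores = [combination_score(p) for p in case]
overall = case_score(case)
print(scores, overall)
```

In a data frame with one row per combination, the same idea would be a row-wise product followed by a mean grouped by case ID.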

Calculating Moving Range in SQL Server (without arrays)

I have a requirement to calculate the Moving Range of a load of data (at least I think this is what it is called) in SQL Server. This would be easy if I could use arrays, but I understand this is not possible in MS SQL, so I wonder if anyone has a suggestion.
To give you an idea of what I need:
Let's say I have the following in a SQL Server table:
1
3
2
6
3
I need to get the absolute difference between each pair of consecutive numbers (in order), i.e.:
|1-3|=2
|3-2|=1
|2-6|=4
|6-3|=3
Then square these:
2^2=4
1^2=1
4^2=16
3^2=9
EDIT: PROBABLY WORTH NOTING THAT YOU DO NOT SQUARE THESE FOR MOVING AVERAGE - I WAS WRONG
Then sum them:
4+1+16+9=30
Then divide by number of values:
30/5=6
Then square root this:
2.5(ish)
EDIT: BECAUSE YOU ARENT SQUARING THEM, YOU ARENT SQROOTING THEM EITHER
If anyone can just help me out with the first step, that would be great - I can do the rest myself.
A few other things to take into account:
- Using stored procedure in SQL Server
- There is quite a lot of data (100s or 1000s of values), and they will need to be calculated daily or weekly
Many thanks in advance.
~Bob
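-- Number the rows so that each value can be paired with the next one.
WITH nums AS
(
    SELECT num, ROW_NUMBER() OVER (ORDER BY id) AS rn
    FROM mytable
)
-- Root mean square of successive differences; the cast avoids integer
-- averaging when num is an integer column. For the plain moving range,
-- use AVG(ABS(CAST(tp.num - tf.num AS FLOAT))) instead.
SELECT SQRT(AVG(POWER(CAST(tp.num - tf.num AS FLOAT), 2)))
FROM nums tp
JOIN nums tf
    ON tf.rn = tp.rn + 1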
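For checking the query's output on a small sample, the same arithmetic can be sketched in Python using the five values from the question. Note that the average here is taken over the four successive differences, as AVG over the joined rows does, not over the five original values; the function name is hypothetical:

```python
from math import sqrt


def successive_abs_diffs(values):
    """Absolute difference between each value and the one that follows it."""
    return [abs(b - a) for a, b in zip(values, values[1:])]


values = [1, 3, 2, 6, 3]
diffs = successive_abs_diffs(values)

# Average moving range: mean of the absolute differences (no squaring).
mean_moving_range = sum(diffs) / len(diffs)

# Variant from the original post: root mean square of the differences.
rms_diff = sqrt(sum(d * d for d in diffs) / len(diffs))

print(diffs, mean_moving_range, rms_diff)
```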

Resources