I have the following clingo code that generates the search space, followed by constraints.
{in(I,1..4)}=1 :- I=1..n.
:- [constraint1]
:- [constraint2]
This code works. But I need clingo to find the largest value of n for which a stable model exists. What is the best way to do that?
A slightly more performant variant would be:
value(I) :- in(I,_).
value(I-1) :- value(I), I > 0.
#maximize {1,I : value(I)}.
You can use the #min aggregate to find min n.
value(I) :- I = #min {I:in(I,X) }.
and use the #maximize directive to find stable models in which the value of the aggregate expression is larger:
#maximize {I: value(I)}.
I'm having trouble writing the condition for joining the tables. The highlighted parts are the 3 conditions that I need to solve. Basically, there are some securities where, for their effective term, if the value is between 0 and 2 it has score 1, if the value is between 2 and 10 it has score 2, and if the value is bigger than 10 it has score 4.
For the first two conditions, I solve them in the query's WHERE part like this;
however, for the third condition, if the Descriptsec is empty, I'm not quite sure what I can do. Can anyone help?
Can you change the lookup table ([Risk].[dbo].[FILiquidityBuckets]) you are using?
If yes, do this:
Add bounds so that the table looks like this:
Metric         | DescriptLowerBound | DescriptUpperBound                   | LiquidityScore
Effective term | 0                  | 2                                    | 1
Effective term | 2                  | 10                                   | 2
Effective term | 10                 | 9999999 (some absurdly high number)  | 4
Then your join condition can be this:
ON FB3.Metric='Effective term'
AND CAST(sa.effectiveTerm AS INT) BETWEEN CAST(FB3.DescriptLowerBound AS INT)
AND CAST(FB3.DescriptUpperBound AS INT)
Please note that BETWEEN is inclusive so in the edge cases (where the value is exactly 2 or 10), the lower score will be captured.
I can see some problems: the effective term in the table with the sa alias is a float, so you should consider rounding up or down.
Overall, a lot of things could be changed or improved, but I tried to offer an immediate solution.
Hope this helps.
We know that the workflow of logistic regression is that it first gets the probability based on some equation and then uses a default cut-off for classification.
So, I want to know if it is possible to change the default cutoff value (0.5) to 0.75 as per my requirement. If yes, can someone help me with the code in either R, Python, or SAS? If no, can someone provide me with relevant proofs?
In the process of finding the answer to this query, I found that:
1.) We can find the optimal cutoff value that gives the best possible accuracy and build the confusion matrix accordingly.
R code to find the optimal cutoff and build the confusion matrix:
library(InformationValue)
optCutOff <- optimalCutoff(testData$ABOVE50K, predicted)[1]
confusionMatrix(testData$ABOVE50K, predicted, threshold = optCutOff)
Misclassification error:
misClassError(testData$ABOVE50K, predicted, threshold = optCutOff)
Note: We see that the cutoff value is changed while calculating the confusion matrix, but not while building the model. Can someone help me with this?
Reference link: http://r-statistics.co/Logistic-Regression-With-R.html
from sklearn.linear_model import LogisticRegression
lr=LogisticRegression()
lr.fit(x_train, y_train)
We first use
lr.predict_proba(x_test)
to get the probability of each class; for example, the first column is the probability of y=0 and the second column is the probability of y=1.
# the probability of being y=1
prob1=lr.predict_proba(x_test)[:,1]
If we use 0.25 as the cutoff value, then we predict as below:
predicted=[1 if i > 0.25 else 0 for i in prob1]
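Equivalently, since predict_proba returns a NumPy array, the same cutoff can be applied in vectorized form; a small sketch using the prob1 array defined above:
# elementwise comparison against the 0.25 cutoff, then cast booleans to 0/1
predicted = (prob1 > 0.25).astype(int)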
I hope that someone can help me.
For the solution of an optimisation problem I have to get the maximum of a matrix containing linear expressions, in order to minimize this value in a second step.
For example, I have the unbounded decision variables x and y
x.append(m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=+GRB.INFINITY, name="x"))
y.append(m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=+GRB.INFINITY, name="y"))
and the matrix M = [0.25*x, 0.25*x+y].
The maximum of the matrix should be saved as M_max. Later, the objective is to minimize M_max --> m.setObjective(M_max, GRB.MINIMIZE)
When I try it by typing M_max = amax(M), I always get back the first element, here 0.25*x. What operation returns the "real" maximum value? (Of course my model is more complicated, but I hope you can understand my problem.)
Thanks a lot for your help!
The manual approach would be (see the code sketch after the list):
introduce aux-var z (-inf, inf, cont)
add constraints
0.25*x <= z
0.25*x+y <= z
minimize (z)
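For illustration, a minimal gurobipy sketch of that manual reformulation; the model setup, variable names, and the placeholder constraint are mine, and your real model will have additional variables and constraints (without them, this minimax over unbounded x and y would be unbounded below):

import gurobipy as gp
from gurobipy import GRB

m = gp.Model("minimax_sketch")
x = m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=GRB.INFINITY, name="x")
y = m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=GRB.INFINITY, name="y")
z = m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=GRB.INFINITY, name="z")

# stand-in for the rest of the real model; without it this toy problem is unbounded
m.addConstr(x + y >= 1, name="placeholder")

# z is an upper bound on every entry of M = [0.25*x, 0.25*x + y] ...
m.addConstr(0.25 * x <= z, name="max_elem_1")
m.addConstr(0.25 * x + y <= z, name="max_elem_2")

# ... and minimizing z pushes it down onto the actual maximum of M
m.setObjective(z, GRB.MINIMIZE)
m.optimize()
print(x.X, y.X, z.X)  # z.X equals max(0.25*x, 0.25*x + y) at the optimum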
Not sure if Gurobi nowadays provides some automatic way.
Edit: It seems newer Gurobi versions provide this functionality (automatic reformulation), as explained here (python docs; you need to check whether it is also available for your interface, which could be Python):
max_(variables)
Used to set a decision variable equal to the maximum of a list of decision variables (or constants). You can pass the arguments as a Python list or as a comma-separated list.
# example probably based on the assumption:
# import: from gurobipy import *
m.addConstr(z == max_(x, y, 3))
m.addConstr(z == max_([x, y, 3]))
You did not show which amax you used. If it's numpy's amax or anything outside of Gurobi, you cannot use it! Gurobi variables don't behave like classic floating-point variables, and every operation on those variable objects needs to be backed by Gurobi (often hidden through operator overloading), or else Gurobi can't make sure it's formalizing a valid mathematical model.
I have 2 arrays: the first contains the areas of flats and the second their prices. The values of the arrays form a chart and will be used to calculate the result of a cost function. The main task is to find the best parameter of the cost function, i.e. the one that minimizes its result. This is what the cost function looks like:
Cost(a) = (1 / 2m) Σ_i (a·x_i − y_i)^2
It was suggested to create a loop from 1 to 10,000 and find the parameter that gives the smallest result. The complexity of this algorithm is 10,000 * the size of the arrays.
I proposed an idea: calculate the differences between corresponding elements of the arrays, put the results into an array, and then find the average of all elements of this array. The obtained average value is the parameter, which should provide a better result for our cost function. This algorithm is much more efficient than the previous one and can provide more accurate results.
I am wondering whether my algorithm is applicable or not?
The cost function that you're proposing is the mean squared error of fitting a linear function to a collection of data points. This is a well-studied problem, and in fact there's a closed-form solution that will tell you the mathematically optimal value of a that you should pick. In that sense, I would recommend not using either of the solutions that are proposed here and to instead just solve things directly.
The cost function you have is a function purely of the variable a, so taking the derivative with respect to a, setting that derivative to zero, and solving should give you the optimal choice of a.
Cost(a) = (1 / 2m) Σ_i (a·x_i − y_i)^2
Cost'(a) = (1 / 2m) Σ_i 2(a·x_i − y_i)·x_i
Cost'(a) = (1 / 2m) Σ_i (2a·x_i^2 − 2·x_i·y_i)
Setting this expression to 0 and simplifying tells us that
0 = (1 / 2m) Σ_i (2a·x_i^2 − 2·x_i·y_i)
0 = Σ_i (2a·x_i^2 − 2·x_i·y_i)
0 = 2a Σ_i x_i^2 − 2 Σ_i x_i·y_i
a Σ_i x_i^2 = Σ_i x_i·y_i
a = (Σ_i x_i·y_i) / (Σ_i x_i^2)
You should be able to compute this pretty easily in time O(n) by making a single pass over the array and computing the numerator and denominator.
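For example, a minimal Python sketch of that single pass (the function name, the xs/ys parameters, and the tiny example data are placeholders introduced here for illustration):

def best_a(xs, ys):
    # closed-form minimizer of Cost(a) = (1/2m) * sum((a*x_i - y_i)^2)
    numerator = sum(x * y for x, y in zip(xs, ys))   # sum of x_i * y_i
    denominator = sum(x * x for x in xs)             # sum of x_i^2
    return numerator / denominator

# example: areas and prices where price = 2 * area
areas = [50, 65, 80]
prices = [100, 130, 160]
print(best_a(areas, prices))  # prints 2.0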
I have been working with Solr for 3-4 months. I want to know if it is possible to query Solr with the following requirements.
Return all the documents where
fieldName1 = queryTerm1 &
strdist(queryTerm2, fieldName2, JW) > 5 (or some constant)
If this is possible, what will be the query?
I guess you can get close.
Sort the results by string distance (the query is split across lines for readability):
localhost:8983/solr/select/?fl=id
&q=fieldName1:queryTerm1
&sort=strdist("queryTerm2",fieldName2, JW) desc
which will order the results from highest string distance downwards.
Note that you cannot directly get the string distance. There is a pseudo-field score, retrieved by:
fl=id,score
but it means nothing in an absolute sense.
You can also boost results based on the string distance, instead of simply sorting them. In this case, it will look at the relevancy of the document as well as the string distance.
Once you have a sorted list (hope it's not too large!), you can determine client-side which elements have 'string distance < 5'.
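If it helps, a rough client-side sketch of that last filtering step; everything here is an assumption for illustration (the Solr URL and field names are placeholders, and Python's difflib ratio is used as a stand-in similarity measure rather than Solr's Jaro-Winkler):

import difflib
import requests

# fetch the documents, sorted by string distance exactly as in the query above
resp = requests.get(
    "http://localhost:8983/solr/select/",
    params={
        "q": "fieldName1:queryTerm1",
        "fl": "id,fieldName2",
        "sort": 'strdist("queryTerm2",fieldName2,JW) desc',
        "wt": "json",
    },
)
docs = resp.json()["response"]["docs"]

# recompute a similarity locally and keep only the "close enough" documents;
# difflib's ratio is not Jaro-Winkler, it is just a convenient stand-in here
THRESHOLD = 0.5  # hypothetical cutoff, analogous to the constant in the question
matches = [
    d for d in docs
    if difflib.SequenceMatcher(None, "queryTerm2", d["fieldName2"]).ratio() > THRESHOLD
]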
I made this up from the links below.
http://yonik.wordpress.com/2011/03/10/solr-relevancy-function-queries/
http://wiki.apache.org/solr/FunctionQuery#strdist
As far as I know, it's not possible.