I hope that someone can help me.
To solve an optimisation problem I have to take the maximum of a matrix containing linear expressions, so that I can minimize this value in a second step.
For example I have the unbounded decision variables x and y
x.append(m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=+GRB.INFINITY, name="x"))
y.append(m.addVar(vtype=GRB.CONTINUOUS, lb=-GRB.INFINITY, ub=+GRB.INFINITY, name="y"))
and the matrix M = [0.25*x, 0.25*x+y].
The maximum of the matrix should be saved as M_max; later the objective is to minimize M_max, i.e. m.setObjective(M_max, GRB.MINIMIZE).
When I try it by typing M_max = amax(M), I always get back the first element, here 0.25*x. What operation returns the "real" maximum value? (Of course my model is more complicated, but I hope you can understand my problem.)
Thanks a lot for your help!
The manual approach would be:
introduce aux-var z (-inf, inf, cont)
add constraints
0.25*x <= z
0.25*x+y <= z
minimize (z)
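A minimal gurobipy sketch of this manual reformulation (variable names are illustrative; note that the toy model below is unbounded on its own, the real model will have more constraints):

from gurobipy import Model, GRB

m = Model()
x = m.addVar(lb=-GRB.INFINITY, name="x")  # continuous by default
y = m.addVar(lb=-GRB.INFINITY, name="y")
z = m.addVar(lb=-GRB.INFINITY, name="z")  # aux var standing in for M_max
# z must dominate every entry of M = [0.25*x, 0.25*x+y]
m.addConstr(0.25 * x <= z)
m.addConstr(0.25 * x + y <= z)
m.setObjective(z, GRB.MINIMIZE)
m.optimize()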
Not sure if Gurobi nowadays provides some automatic way.
Edit: It seems newer Gurobi versions provide this functionality (automatic reformulation), as explained here (Python docs; check whether these are also available for your interface, which appears to be Python):
max_(variables)
Used to set a decision variable equal to the maximum of a list of decision variables (or constants). You can pass the arguments as a Python list or as a comma-separated list.
# example; probably assumes: from gurobipy import *
m.addConstr(z == max_(x, y, 3))
m.addConstr(z == max_([x, y, 3]))
You did not show which amax you used. If it's NumPy's amax or anything else outside of Gurobi, you cannot use it! Gurobi variables don't behave like classic floating-point variables, and every operation on those variable objects needs to be backed by Gurobi (often hidden through operator overloading), or else Gurobi can't make sure it's formalizing a valid mathematical model.
I'm trying to find the (numerical) curvature at specific points. I have data stored in an array, and I essentially want to find the local curvature at every separate point. I've searched around, and found three different implementations for this in MATLAB: diff, gradient, and del2.
If my array's name is arr I have tried the following implementations:
curvature = diff(diff(arr));
curvature = diff(arr,2);
curvature = gradient(gradient(arr));
curvature = del2(arr);
The first two seem to output the same values. This makes sense, because they're essentially the same implementation. However, the gradient and del2 implementations give different values from each other and from diff.
I can't figure out from the documentation precisely how the implementations work. My guess is that some of them are some type of two-sided (central) derivative and some are not. Another thing that confuses me is that my current implementations use only the data from arr; arr is my y-axis data, the x-axis essentially being time. Do these functions default to a step size of 1, or something like that?
If it helps, I want an implementation that takes the curvature at the current point using only previous array elements. For context, my data is such that a curvature calculation based on data in the future of the current point wouldn't be useful for my purposes.
tl;dr I need a rigorous curvature at a point implementation that uses only data to the left of the point.
Edit: Thanks to the answers below, I now have a better understanding of what's going on. This is what I'm referring to:
gradient calculates the central difference for interior data points. For example, consider a matrix with unit-spaced data, A, that has horizontal gradient G = gradient(A). The interior gradient values, G(:,j), are
G(:,j) = 0.5*(A(:,j+1) - A(:,j-1));
The subscript j varies between 2 and N-1, with N = size(A,2).
Even so, I still want to know how to do a "lefthand" computation.
diff is simply the difference between two adjacent elements in arr, which is exactly why you lose one element each time you apply diff: an array of 10 elements has only 9 differences.
gradient and del2 are for derivatives. Of course, you can use diff to approximate a derivative by dividing the difference by the step. Usually the steps are equally spaced, but they don't have to be. This answers your question of why x is not used in the calculation: these functions default to a step size of 1, and it's fine if your x is not uniformly spaced, as long as you divide by the actual steps.
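In NumPy terms, for example (a sketch with made-up sample values):

import numpy as np
x = np.array([0.0, 0.5, 1.5, 2.0])  # non-uniform sample times (example values)
y = np.array([0.0, 1.0, 3.0, 5.0])
dydx = np.diff(y) / np.diff(x)      # first-derivative approximation; the division handles non-uniform x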
So why does gradient give us an array with the same length as the original array? It is clearly explained in the manual how the boundary is handled:
The gradient values along the edges of the matrix are calculated with single-sided differences, so that
G(:,1) = A(:,2) - A(:,1);
G(:,N) = A(:,N) - A(:,N-1);
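For what it's worth, NumPy's np.gradient handles the edges the same way, if you want to cross-check in Python:

import numpy as np
a = np.array([1.0, 2.0, 4.0, 7.0])
g = np.gradient(a)  # central differences in the interior, one-sided at the two edges
# g == [1.0, 1.5, 2.5, 3.0]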
Double-gradient and del2 are not necessarily the same, although they are highly correlated. It all comes down to how you calculate/approximate the 2nd-order derivatives. The difference is that the former approximates the 2nd derivative by taking the 1st derivative twice, while the latter approximates the 2nd derivative directly. Please refer to the help manual; the formulas are documented there.
Okay, do you really want the curvature, or the 2nd derivative, at each point of arr? They are very different: https://en.wikipedia.org/wiki/Curvature#Precise_definition
You can use diff to get the 2nd derivative from the left. Since diff takes the difference from right to left, e.g. x(2)-x(1), you can first flip x from left to right and then use diff. Some code like:
x = fliplr(x);
first = diff(x)./h;
second = diff(first)./h;
where h is the spacing between the samples of x. Notice I use ./, which indicates that h can be an array (i.e. non-uniformly spaced).
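If a Python version helps, here is a minimal NumPy sketch of the same left-sided (backward) second difference, assuming uniform spacing h (the function name is mine):

import numpy as np

def backward_second_derivative(y, h):
    # result[k] approximates y'' at sample k+2, using only samples k, k+1, k+2,
    # i.e. the current point and the two before it
    return (y[2:] - 2 * y[1:-1] + y[:-2]) / h**2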
The Matlab function bvp4c solves boundary value problems. It takes a differential equation, boundary conditions and an initial guess as input, and returns a structure array containing arrays of x, y and yp (which stands for "y prime", or y').
The length of the output arrays should be the same as that of the initial guess, but I found that it isn't always. I have checked the dimensions of the input (the initial guess: always 1x101 double for x and 16x101 double for y) and of the output (sometimes 1x101 double for x and 16x101 double for y and yp, as it should be, but often different sizes, such as 1x91 and 16x91, or 1x175 and 16x175).
Looking at the output array x when its length is off, some extra values are squeezed in, or some are taken out. For example, the initial guess has 100 positions between x=0 and x=1, and the x array should be [0 0.01 0.02 ... 1], but sometimes a new position like 0.015 shows up.
Question: Why does this happen, and how can this be solved?
"The length of the output arrays should be the same as that of the initial guess ...." This is incorrect.
As described in the bvp4c documentation, sol.x contains a "[mesh] selected by bvp4c" with an "[approximation] to y(x) at the mesh points of sol.x". In order to evaluate bvp4c's solution on your mesh, use deval.
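For what it's worth, SciPy's solve_bvp behaves the same way in Python: it returns its own adapted mesh plus an interpolant you can evaluate on your mesh, which plays the role of deval. A minimal sketch on a toy problem:

import numpy as np
from scipy.integrate import solve_bvp

# toy BVP: y'' = -y with y(0) = 0 and y(pi/2) = 1, written as a first-order system
def fun(x, y):
    return np.vstack([y[1], -y[0]])

def bc(ya, yb):
    return np.array([ya[0], yb[0] - 1.0])

x_guess = np.linspace(0.0, np.pi / 2, 101)  # your mesh / initial guess
y_guess = np.zeros((2, x_guess.size))
sol = solve_bvp(fun, bc, x_guess, y_guess)
# sol.x is the solver-chosen mesh, whose length may differ from 101;
# evaluate on your own mesh via the interpolant, as deval does in MATLAB:
y_on_my_mesh = sol.sol(x_guess)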
Why does bvp4c choose a mesh? Quoting from the cited paper1, which you can get in full here if you have a MathWorks account:
Because BVPs can have more than one solution, BVP codes require users to supply a guess for the solution desired. The guess includes a guess for an initial mesh that reveals the behavior of the desired solution. The codes then adapt the mesh so as to obtain an accurate numerical solution with a modest number of mesh points.
Because a steady BVP generally has a global behavior strongly dependent on its boundary values, the spatial mesh between the two boundaries may need to be refined in order to properly approximate the desired solution with the locally chosen basis functions for the method. However, there may also be portions of the mesh that do not need to be refined and can even be coarsened in some cases while maintaining a reasonably small residual and accurate approximation. Therefore, for general efficiency, the guess mesh is adaptively refined or coarsened depending on some locally chosen metric (since bvp4c is collocation based, the metric is probably point-based or division-integrated based) such that the mesh returned by bvp4c is, in some sense, adequate for generic interpolation within the boundaries.
I'll also note that this is different from numerically solving IVPs since their state is not global across the entire time integration locus and only depends on the current state to the next time-step, and possibly previous time steps if using a multi-step method or solving a delay differential equation, which makes the refinement inherently local. This local behavior of IVPs is what allows functions like ode45 to return a solution at pre-selected time values because it can locally refine the solution at the selected point while performing the time march (this is known as dense output).
1 Shampine, L.F., M.W. Reichelt, and J. Kierzenka, "Solving Boundary Value Problems for Ordinary Differential Equations in MATLAB with bvp4c".
I have a simple machine learning question:
I have n (~110) elements and a matrix of all the pairwise distances. I would like to choose the 10 elements that are farthest apart. That is, I want to:
Choose 10 different elements.
Maximize the minimum distance over all pairings within the 10.
My distance metric is symmetric and respects the triangle inequality.
What kind of algorithm can I use? My first instinct is to do the following:
Cluster the n elements into 20 clusters.
Replace each cluster with just the element of that cluster that is furthest from the mean element of the original n.
Use brute force to solve the problem on the remaining 20 candidates. Luckily, 20 choose 10 is only 184,756.
Edit: thanks to etarion's insightful comment, changed "Return sum of (distances)" to "Return min distance" in the optimization problem statement.
Here's how you might approach this combinatorial optimization problem by taking the convex relaxation.
Let D be an upper triangular matrix with your distances on the upper triangle. I.e. where i < j, D_i,j is the distance between elements i and j. (Presumably, you'll have zeros on the diagonal, as well.)
Then your objective is to maximize x'*D*x, where x is binary valued with 10 elements set to 1 and the rest to 0. (Setting the ith entry in x to 1 is analogous to selecting the ith element as one of your 10 elements.)
The "standard" convex optimization thing to do with a combinatorial problem like this is to relax the constraints such that x need not be discrete valued. Doing so gives us the following problem:
maximize y'*D*y
subject to: 0 <= y_i <= 1 for all i, 1'*y = 10
This is (morally) a quadratic program. (If we replace D with D + D', it'll become a bona fide quadratic program and the y you get out should be no different.) You can use an off-the-shelf QP solver, or just plug it in to the convex optimization solver of your choice (e.g. cvx).
The y you get out need not be (and probably won't be) a binary vector, but you can convert the scalar values to discrete ones in a bunch of ways. (The simplest is probably to let x be 1 in the 10 entries where y_i is highest, but you might need to do something a little more complicated.) In any case, y'*D*y with the y you get out does give you an upper bound for the optimal value of x'*D*x, so if the x you construct from y has x'*D*x very close to y'*D*y, you can be pretty happy with your approximation.
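A short NumPy sketch of that simplest rounding step (the helper name is mine):

import numpy as np

def round_top_k(y, D, k=10):
    # set x to 1 at the k largest entries of the relaxed solution y
    x = np.zeros_like(y)
    x[np.argsort(y)[-k:]] = 1.0
    # compare the rounded objective x'*D*x against the relaxation's bound y'*D*y
    return x, x @ D @ x, y @ D @ y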
Let me know if any of this is unclear, notation or otherwise.
Nice question.
I'm not sure whether it can be solved exactly in an efficient manner, and your clustering-based solution seems reasonable. Another direction to look at would be local search methods such as simulated annealing and hill climbing.
Here's an obvious baseline I would compare any other solution against:
Repeat 100 times:
Greedily select the datapoint whose removal decreases the objective function the least, and remove it.
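A minimal Python sketch of that baseline with the max-min objective (assuming D is the full symmetric n x n distance matrix; names are mine):

import numpy as np

def greedy_prune(D, k=10):
    remaining = list(range(D.shape[0]))
    while len(remaining) > k:
        best_p, best_score = None, -np.inf
        for p in remaining:
            rest = [q for q in remaining if q != p]
            sub = D[np.ix_(rest, rest)]
            # objective after removing p: min pairwise distance among the rest
            score = sub[np.triu_indices_from(sub, k=1)].min()
            if score > best_score:
                best_p, best_score = p, score
        remaining.remove(best_p)  # drop the point whose removal hurts the objective least
    return remaining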
I have a system that stores vectors and allows a user to find the n most similar vectors to the user's query vector. That is, a user submits a vector (I call it a query vector) and my system spits out "here are the n most similar vectors." I generate the similar vectors using a KD-Tree and everything works well, but I want to do more. I want to present a list of the n most similar vectors even if the user doesn't submit a complete vector (a vector with missing values). That is, if a user submits a vector with three dimensions, I still want to find the n nearest vectors (stored vectors are of 11 dimensions) I have stored.
I have a couple of obvious solutions, but I'm not sure either one seems very good:
Create multiple KD-Trees, each built using the most popular subset of dimensions a user will search for. That is, if a user submits a query vector of three dimensions, x, y, z, I match that query to an already-built KD-Tree which only contains vectors of those three dimensions, x, y, z.
Ignore KD-Trees when a user submits a query vector with missing values and compare the query vector to the vectors (stored in a table in a DB) one by one using something like a dot product.
This has to be a common problem, any suggestions? Thanks for the help.
Your first solution might be fastest for queries (since the tree-building doesn't consider splits in directions that you don't care about), but it would definitely use a lot of memory. And if you have to rebuild the trees repeatedly, it could get slow.
The second option looks very slow unless you only have a few points. And if that's the case, you probably didn't need a kd-tree in the first place :)
I think the best solution involves getting your hands dirty in the code that you're working with. Presumably the nearest-neighbor search computes the distance between the point in the tree leaf and the query vector; you should be able to modify this to handle the case where the point and the query vector are different sizes. E.g. if the points in the tree are given in 3D, but your query vector is only length 2, then the "distance" between the point (p0, p1, p2) and the query vector (x0, x1) would be
sqrt( (p0-x0)^2 + (p1-x1)^2 )
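In Python terms that could look like the following (zip stops at the shorter sequence, so a query missing its trailing dimensions is handled automatically; the helper name is mine):

import math

def partial_distance(point, query):
    # use only the dimensions the query actually supplies
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(point, query)))

# e.g. partial_distance((p0, p1, p2), (x0, x1)) reduces to sqrt((p0-x0)^2 + (p1-x1)^2)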
I didn't dig into the java code that you linked to, but I can try to find exactly where the change would need to go if you need help.
-Chris
PS - you might not need the sqrt in the equation above, since distance squared is usually equivalent.
EDIT
Sorry, didn't realize it would be so obvious in the source code. You should use this version of the neighbor function:
nearest(double [] key, int n, Checker<T> checker)
And implement your own Checker class; see their EuclideanDistance.java to see the Euclidean version. You may also need to comment out any KeySizeException that the query code throws, since you know that you can handle differently sized keys.
Your second option looks like a reasonable solution for what you want.
You could also populate the missing dimensions with the most important (or average, or whatever you think they should be) values, if there are any.
You could try using the existing KD tree -- by taking both branches when the split is for a dimension that is not supplied by the source vector. This should take less time than doing a brute force search, and might be less trouble than trying to maintain a bunch of specialized trees for dimension subsets.
You would need to adapt your N-closest algorithm (without more info I can't advise you on that...), and for distance you would use the sum of the squares of only those elements supplied by the source vector.
Here's what I ended up doing: when a user didn't specify a value (when their query vector lacked a dimension), I simply adjusted my matching range (in the API) to something huge, so that I match any value.
I have an array (arr) of elements, and a function (f) that takes 2 elements and returns a number.
I need a permutation of the array, such that f(arr[i], arr[i+1]) is as little as possible for each i in arr. (and it should loop, ie. it should also minimize f(arr[arr.length - 1], arr[0]))
Also, f works sort of like a distance, so f(a,b) == f(b,a)
I don't need the optimal solution if it's too inefficient, just one that works reasonably well and is fast, since I need to calculate these pretty much in realtime (I don't know what the length of arr will be, but I think it could be around 30).
What does "such that f(arr[i], arr[i+1]) is as little as possible for each i in arr" mean? Do you want minimize the sum? Do you want to minimize the largest of those? Do you want to minimize f(arr[0],arr[1]) first, then among all solutions that minimize this, pick the one that minimizes f(arr[1],arr[2]), etc., and so on?
If you want to minimize the sum, this is exactly the Traveling Salesman Problem in its full generality (well, "metric TSP", maybe, if your f's indeed form a metric). There are clever optimizations to the naive solution that will give you the exact optimum and run in reasonable time for about n=30; you could use one of those, or one of the heuristics that give you approximations.
If you want to minimize the maximum, it is a simpler problem, although still NP-hard: you can do binary search on the answer; for a particular value d, draw edges for pairs which have f(x,y) <= d and check whether the resulting graph has a Hamiltonian cycle.
If you want to minimize it lexicographically, it's trivial: pick the pair with the shortest distance and put it as arr[0],arr[1], then pick the arr[2] that is closest to arr[1], and so on.
Depending on where your f(,)s are coming from, this might be a much easier problem than TSP; it would be useful for you to mention that as well.
You're not entirely clear about what you're optimizing - the sum of the f(a[i],a[i+1]) values, the max of them, or something else?
In any event, with your speed limitations, greedy is probably your best bet - pick an element to make a[0] (it doesn't matter which due to the wraparound), then choose each successive element a[i+1] to be the one that minimizes f(a[i],a[i+1]).
That's going to be O(n^2), but with 30 items that will be fine unless this is in an inner loop or something. If your f() really is associative and commutative, then you might be able to do it in O(n log n). Clearly no faster, by reduction to sorting.
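A quick Python sketch of that greedy construction (assuming f is the symmetric distance function from the question):

def greedy_order(arr, f):
    order = [arr[0]]  # the starting element is arbitrary, since the tour wraps around
    remaining = list(arr[1:])
    while remaining:
        last = order[-1]
        nxt = min(remaining, key=lambda e: f(last, e))  # closest to the last chosen element
        order.append(nxt)
        remaining.remove(nxt)
    return order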
I don't think the problem is well-defined in this form:
Let's instead define n functions g_i : Perms -> Reals by
g_i(p) = f(a^p[i], a^p[i+1]), wrapping around when i+1 > n (where a^p is the array a reordered by the permutation p).
To say you want to minimize f over all permutations really implies you can pick a value of i and minimize g_i over all permutations; but for any p which minimizes g_i, a related but different permutation minimizes g_j (just conjugate the permutation). Therefore it makes no sense to speak of minimizing f over permutations for each i.
Unless we know something more about the structure of f(x,y), this is an NP-hard problem. Given a graph G and any vertices x, y, let f(x,y) be 1 if there is no edge and 0 if there is an edge. The problem then asks for an ordering of the vertices such that the maximum f(arr[i],arr[i+1]) value is minimized. Since this function can only be 0 or 1, returning 0 is equivalent to finding a Hamiltonian path in G, and returning 1 says that no such path exists.
The function would have to have some sort of structure that disallows this example for it to be tractable.