I'm looking for some advice on how to go about implementing Gradient (steepest) Descent in C. I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This is difficult in C (I think) because computing the gradient, ∇f(x) = [∂f/∂x1, ..., ∂f/∂xn], requires calculating derivatives.
I just wanted to throw this at SO to get some direction on going about programming this, e.g.:
1) What dimensionality would be best to start with (1,2,...)
2) Advice on how to go about doing the partial derivatives
3) Whether I should implement in an easier language, like python, first -- then translate over to C
4) Etc.
Let me know your thoughts! Thanks in advance
1) Start in 2D; this way you can plot the path of the descent and actually see your algorithm working.
2) df/dx = (f(x+h)-f(x-h))/(2*h) if f evaluation is cheap, (f(x+h)-f(x))/h if it is expensive. The choice of h should balance truncation error (dominant for big h) and roundoff error (dominant for small h). Typical values of h are ~ pow(DBL_EPSILON, 1./3), but the actual exponent depends on the formula for the derivative, and ideally there should be a prefactor that depends on f. You may plot the numerical derivative as a function of h on a log scale, for some given sample points in the parameter space. You will then clearly see the range of h that is optimal for the points you are sampling. (A sketch of this is given right after this list.)
3) Yes, whatever you find easier.
4) The hard point is finding the optimal step size. You may want to use an inner loop here to search for the optimal step.
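A minimal sketch of points 2) and 4) combined, in C: a central-difference gradient plus a crude inner loop that keeps halving the step until the function value actually decreases. The 2-D quadratic objective, starting point, and tolerances here are only placeholder assumptions, not part of the question:

#include <stdio.h>
#include <math.h>
#include <float.h>

#define DIM 2

static double f(const double x[DIM])              /* placeholder objective */
{
    return (x[0] - 1.0) * (x[0] - 1.0) + 4.0 * (x[1] + 2.0) * (x[1] + 2.0);
}

static void gradient(const double x[DIM], double g[DIM])
{
    double h = pow(DBL_EPSILON, 1.0 / 3.0);       /* as suggested in 2) */
    for (int i = 0; i < DIM; ++i) {
        double xp[DIM], xm[DIM];
        for (int j = 0; j < DIM; ++j) { xp[j] = x[j]; xm[j] = x[j]; }
        xp[i] += h;
        xm[i] -= h;
        g[i] = (f(xp) - f(xm)) / (2.0 * h);       /* central difference */
    }
}

int main(void)
{
    double x[DIM] = {5.0, 5.0};
    for (int iter = 0; iter < 1000; ++iter) {
        double g[DIM], xnew[DIM];
        gradient(x, g);
        if (sqrt(g[0] * g[0] + g[1] * g[1]) < 1e-8)
            break;                                /* gradient ~ 0: done */
        /* Inner loop from 4): halve the step until f decreases.
         * (A real line search would handle the failure case more carefully.) */
        double step = 1.0;
        do {
            for (int i = 0; i < DIM; ++i)
                xnew[i] = x[i] - step * g[i];
            step *= 0.5;
        } while (f(xnew) >= f(x) && step > 1e-12);
        x[0] = xnew[0];
        x[1] = xnew[1];
        printf("%d: x = (%g, %g), f = %g\n", iter, x[0], x[1], f(x));
    }
    return 0;
}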
1) I'd start with a simple 1D example, and then go to 2D once I'm sure that works.
2) As you know the objective function beforehand, maybe you can supply an analytical gradient as well. If possible, that is (almost) always better than resorting to numerical derivatives. (A sketch for this particular objective follows right after this list.)
3) By all means.
4) Presumably steepest descent is only the first step, next maybe look into something like CG or BFGS.
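To make point 2) concrete for this particular objective: since f(x) = ||Ax - y||^2, the analytical gradient is 2*A'*(Ax - y), which is just two matrix-vector products. A rough sketch, assuming A is stored row-major (the function name and layout are only illustrative):

#include <stddef.h>

/* grad = 2 * A' * (A*x - y), for an n-by-n matrix A stored row-major. */
void grad_least_squares(size_t n, const double *A, const double *y,
                        const double *x, double *grad)
{
    for (size_t j = 0; j < n; ++j)
        grad[j] = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double r = -y[i];                      /* r = (A*x - y)_i */
        for (size_t j = 0; j < n; ++j)
            r += A[i * n + j] * x[j];
        for (size_t j = 0; j < n; ++j)
            grad[j] += 2.0 * A[i * n + j] * r; /* grad_j += 2 * A_ij * r_i */
    }
}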
I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This problem is known as least squares, and you are doing unconstrained optimization. Writing a finite-difference gradient descent solver in C is not the right approach at all. First of all you can easily calculate the derivative analytically, so there is no reason to do finite difference. Also, the problem is convex, so it even gets easier.
(Let A' denote the transpose of A)
d/dx ||Ax - y||^2 = 2*A'*(Ax - y)
since this is a convex problem we know the global minimum will occur when the derivative is 0
0 = 2*A'(Ax - y)
A'y = A'Ax
inverse(A'A)*A'y = x
A'A is invertible as long as A has full rank (it is then positive definite), so the problem reduces to computing this inverse, which is O(n^3). That said, there are libraries to do least squares in both C and Python, so you should probably just use them instead of writing your own code.
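If you nevertheless want to see the algebra above as code, here is a rough sketch that forms the normal equations A'A x = A'y and solves them by Gaussian elimination with partial pivoting. The row-major layout, function names, and minimal error handling are assumptions for illustration only; in practice a library routine (LAPACK, GSL) is the better choice:

#include <math.h>
#include <stdlib.h>

/* Solve the n-by-n system M x = b (M and b are overwritten); 0 on success. */
static int solve(int n, double *M, double *b, double *x)
{
    for (int k = 0; k < n; ++k) {
        int piv = k;                              /* partial pivoting */
        for (int i = k + 1; i < n; ++i)
            if (fabs(M[i * n + k]) > fabs(M[piv * n + k]))
                piv = i;
        if (M[piv * n + k] == 0.0)
            return -1;                            /* singular matrix */
        for (int j = 0; j < n; ++j) {             /* swap rows k and piv */
            double t = M[k * n + j];
            M[k * n + j] = M[piv * n + j];
            M[piv * n + j] = t;
        }
        double t = b[k]; b[k] = b[piv]; b[piv] = t;
        for (int i = k + 1; i < n; ++i) {         /* eliminate column k */
            double m = M[i * n + k] / M[k * n + k];
            for (int j = k; j < n; ++j)
                M[i * n + j] -= m * M[k * n + j];
            b[i] -= m * b[k];
        }
    }
    for (int i = n - 1; i >= 0; --i) {            /* back substitution */
        double s = b[i];
        for (int j = i + 1; j < n; ++j)
            s -= M[i * n + j] * x[j];
        x[i] = s / M[i * n + i];
    }
    return 0;
}

/* x = argmin ||Ax - y||^2 via A'A x = A'y; A is n-by-n, row-major. */
int least_squares(int n, const double *A, const double *y, double *x)
{
    double *AtA = calloc((size_t)n * n, sizeof *AtA);
    double *Aty = calloc((size_t)n, sizeof *Aty);
    int rc = -1;
    if (AtA && Aty) {
        for (int k = 0; k < n; ++k)               /* accumulate A'A and A'y */
            for (int i = 0; i < n; ++i) {
                Aty[i] += A[k * n + i] * y[k];
                for (int j = 0; j < n; ++j)
                    AtA[i * n + j] += A[k * n + i] * A[k * n + j];
            }
        rc = solve(n, AtA, Aty, x);
    }
    free(AtA);
    free(Aty);
    return rc;
}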
Related
I hesitate in asking this because it's a classic question and I'm sure there are plenty of solutions already, but I couldn't find one after a few hours. I'm looking for a simple (*) C code to do peak detection on a 'sharp' peak of X/Y values. X is not necessarily 1,2,3... but is ordered.
(*) I think a Gaussian fit works fine, and indeed I've already implemented and tested it with GNU Scientific Library, but I'm looking for a solution without dependencies for embedded programming, so no GSL.
So in short if I have X={0,1,2,3} Y={0,10,10,0} the solution should be Xm=1.5. I don't care about the interpolated Ym at Xm.
The peak is 'sharp' in the sense that the highest Y (or the 2 highest Y) is many times the value of next/previous Y. That's why simply looking for the max is just not enough.
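One common dependency-free approach (sketched here as an illustration, not taken from the question) is to find the index of the largest Y and fit a parabola through that sample and its two neighbours; the x-coordinate of the parabola's vertex is the interpolated peak. For X={0,1,2,3}, Y={0,10,10,0} this gives Xm=1.5. The function name and edge handling are assumptions:

#include <stddef.h>

/* Return the interpolated peak position Xm, or the raw maximum if the
 * peak sits on the first or last sample. X may be non-uniform but must
 * be ordered. */
double peak_x(const double *X, const double *Y, size_t n)
{
    size_t imax = 0;
    for (size_t i = 1; i < n; ++i)
        if (Y[i] > Y[imax])
            imax = i;
    if (imax == 0 || imax == n - 1)
        return X[imax];                 /* cannot interpolate at the edges */

    double x1 = X[imax - 1], y1 = Y[imax - 1];
    double x2 = X[imax],     y2 = Y[imax];
    double x3 = X[imax + 1], y3 = Y[imax + 1];

    /* Vertex of the parabola through (x1,y1), (x2,y2), (x3,y3). */
    double denom = 2.0 * (y1 * (x2 - x3) + y2 * (x3 - x1) + y3 * (x1 - x2));
    if (denom == 0.0)
        return x2;                      /* degenerate: points are collinear */
    return (y1 * (x2 * x2 - x3 * x3) +
            y2 * (x3 * x3 - x1 * x1) +
            y3 * (x1 * x1 - x2 * x2)) / denom;
}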
I'm working on a fairly complicated program here and unfortunately I've painted myself into a bit of a corner.
I have a function (let's call it f(x) for simplicity) that I know the output value of, and I need to find the input value that generates that output value (to within a certain threshold).
Unfortunately the equations behind f(x) are fairly complicated and I don't have all the information I need to simply run them in reverse, so I'm forced to perform some sort of brute-force search to find the right input variable instead.
The outputs for f(x) are guaranteed to be ordered, in such a way that f(x - 1) < f(x) < f(x + 1) is always true.
What is the most efficient way to find the value of x? I'm not entirely sure if this is a "root finding" problem- it seems awfully close, but not quite. I figure there's gotta be some official name for this sort of algorithm, but I haven't been able to find anything on Google.
I'm assuming that x is an integer so the result f(x - 1) < f(x) < f(x + 1) means that the function is strictly monotonic.
I'll also assume your function is not pathological, such as
f(x) = x * cos(2 * pi * x)
which satisfies your property but has all sorts of nasties between integer values of x.
A linear bisection algorithm is appropriate and tractable here (and you could adapt it to functions which are badly behaved for non-integral x); Brent might recover the solution faster. Such algorithms may well return you a non-integral value of x, but you can always check the integers either side of that, and return the best one (that will work if the function is monotonic in all real values of x). Furthermore, if you have an analytic first derivative of f(x), then an adaptation of Newton-Raphson might work well, constraining x to be integral (which might not make much sense, depending on your function; it would be disastrous to apply it to the pathological example above!). Newton-Raphson is cute since you only need one starting point, unlike Linear Bisection and Brent which both require the root to be bracketed.
Do Google the terms that I've italicised.
Reference: Brent's Method - Wikipedia
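As an illustration of the linear bisection idea (not a full Brent implementation), here is a minimal sketch that assumes the root of g(x) = f(x) - target is already bracketed between lo and hi, with g(lo) and g(hi) of opposite sign; the function-pointer interface and tolerance are assumptions:

/* Shrink [lo, hi] around the root of g until it is narrower than tol.
 * To solve f(x) = target, pass a wrapper that returns f(x) - target. */
double bisect(double (*g)(double), double lo, double hi, double tol)
{
    double glo = g(lo);
    while (hi - lo > tol) {
        double mid = 0.5 * (lo + hi);
        double gmid = g(mid);
        if ((glo < 0.0) == (gmid < 0.0)) {   /* same sign: root in right half */
            lo = mid;
            glo = gmid;
        } else {                             /* opposite sign: root in left half */
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);  /* check the integers either side of this value */
}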
For a general function, I would do the following:
Evaluate at 0 and determine if x is positive or negative.
(Assuming positive) Evaluate powers of 2 until you bound the value (1, 2, 4, 8, ...)
Once you have bounds, do repeated bisection until you get the precision you are looking for (sketched below).
If this is being called multiple times, I would cache the values along the way to reduce the time needed for subsequent operations.
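A minimal sketch of this bound-then-bisect procedure for the increasing, non-negative case (the negative case is symmetric). The placeholder f, the long type, and the assumption that target >= f(0) are all illustrative:

#include <stdio.h>

static double f(long x)                  /* placeholder strictly increasing f */
{
    return (double)x * x * x + 2.0 * x;
}

/* Return the integer x whose f(x) is closest to target. */
static long find_input(double target)
{
    long lo = 0, hi = 1;
    while (f(hi) < target) {             /* grow the bracket: 1, 2, 4, 8, ... */
        lo = hi;
        hi *= 2;
    }
    while (hi - lo > 1) {                /* bisect down to two neighbours */
        long mid = lo + (hi - lo) / 2;
        if (f(mid) < target)
            lo = mid;
        else
            hi = mid;
    }
    return (target - f(lo) <= f(hi) - target) ? lo : hi;
}

int main(void)
{
    double target = 1000.0;
    long x = find_input(target);
    printf("f(%ld) = %g is closest to %g\n", x, f(x), target);
    return 0;
}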
In Matlab, it's easy to define a vector this way:
x = a:b:c, where a,b,c are real numbers, a < c and b <= c - a.
My problem is that I'm having trouble trying to define a formula to calculate the number of elements in x.
I know that the problem is solved using the size command, but I need a formula because I'm writing a version of a Matlab program (which uses vectors this way) in another language.
Thanks in advance for any help you can provide.
Best regards,
Víctor
On a mathematical level you could argue that all of these expressions return the same:
size(a:b:c)
size(a/b:c/b)
size(0:c/b-a/b)
Now you end up with the integers from 0 up to that upper limit, so the number of elements is:
floor((c-a)/b+1)
There is one problem: floating-point precision. The colon operator does repeated summing, and I don't know of any way to predict or reproduce that exactly.
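In C, that formula might look like the sketch below. The handling of the empty and zero-step cases is my assumption about what the ported code should do, and floating-point rounding can still make the result differ by one element from what MATLAB's colon operator actually produces in edge cases:

#include <math.h>

/* Number of elements of the MATLAB vector a:b:c. */
long colon_length(double a, double b, double c)
{
    if (b == 0.0 || (c - a) / b < 0.0)
        return 0;                        /* empty vector, e.g. 5:1:1 or step 0 */
    return (long)floor((c - a) / b) + 1;
}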
I am reading the standard (Numerical Recipes and GSL C versions are identical) implementation of Brent root finding algorithm, and cannot understand the meaning of variable "e". The usage suggests that "e" is supposed to be the previous distance between the brackets. But then, why is it set to "xm" (half the distance) when we use bisection?
I'm not familiar with the algorithm. However, I can compare the C source and the Wikipedia description of the algorithm. The algorithm seems straightforward-ish (if you're familiar with methods to find roots), but the C implementation looks like a direct port of the Fortran, so it's rather hard to read.
My best guess is that e is related to the loop conditional.
Wikipedia says (line 8 of the algorithm):
repeat until f(b or s) = 0 or |b − a| is small enough (convergence)
The C source says:
e = b - a, then later if (fabs(e) <= tol ....
I'd hope that the purpose of the variables would be described clearly in the book, but apparently not :)
Ok, here you go. I found the original implementation (in Algol 60) here. In addition to a nice description of the algorithm, it says (starting on page 50):
let e be the value of p/q at the step before the last one. If |e| < δ or |p/q| ≥ 1/2 |e| then do a bisection, otherwise we do either a bisection or interpolation just as in Dekker's algorithm. Thus |e| decreases by at least a factor of two on every second step, and when |e| < δ a bisection must be done. (After a bisection we take e = m for the next step.)
So the addition of e is Brent's "main modification" of Dekker's algorithm.
E is the "epsilon" variable, which is basically a measure of how close is close enough. Your particular application may not require 20 digits of precision, so epsilon lets you balance how many iterations it requires (i.e., how long it runs) versus how accurate you need it.
With floating point numbers you may not be able to be exact, so epsilon should be some small non-zero number. The actual value depends on your application... it's basically the largest acceptable error.
During a bisection step, the interval is exactly halved. Thus, e, holding the current width of the interval, is halved as well.
I'm a programmer that wants to learn how the Levenberg–Marquardt curve-fitting algorithm works so that I can implement it myself. Is there a good tutorial anywhere that can explain how it works in detail, with the reader being a programmer and not a mathemagician?
My goal is to implement this algorithm in OpenCL so that I can have it run hardware accelerated.
Minimizing a function is like trying to find the lowest point on a surface. Think of yourself walking on a hilly surface and trying to get to the lowest point. You would find the direction that goes downhill and walk until it doesn't go downhill anymore. Then you would choose a new direction that goes downhill and walk in that direction until it doesn't go downhill anymore, and so on. Eventually (hopefully) you would reach a point where no direction goes downhill anymore. You would then be at a (local) minimum.
The LM algorithm, and many other minimization algorithms, use this scheme.
Suppose that the function being minimized is F and we are at the point x(n) in our iteration. We wish to find the next iterate x(n+1) such that F(x(n+1)) < F(x(n)), i.e. the function value is smaller. In order to choose x(n+1) we need two things: a direction from x(n) and a step size (how far to go in that direction). The LM algorithm determines these values as follows -
First, compute a linear approximation to F at the point x(n). It is easy to find out the downhill direction of a linear function, so we use the linear approximating function to determine the downhill direction.
Next, we need to know how far we can go in this chosen direction. If our approximating linear function is a good approximation for F for a large area around x(n), then we can take a fairly large step. If it's a good approximation only very close to x(n), then we can take only a very small step.
This is what LM does - it calculates a linear approximation to F at x(n), thus giving the downhill direction, then it figures out how big a step to take based on how well the linear function approximates F near x(n). LM figures out how good the approximating function is by taking a step in the direction thus determined and comparing how much the linear approximation to F decreased with how much the actual function F decreased. If they are close, the approximating function is good and we can take a slightly larger step. If they are not close, then the approximation is not good and we should back off and take a smaller step.
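The step-control logic described above fits in a few lines of code. Below is a toy sketch that fits a single parameter p in the model y = exp(p*x) to some made-up data; the data, the damping update factors, and the stopping rule are illustrative assumptions, not Minpack's actual choices:

#include <stdio.h>
#include <math.h>

#define NDATA 5

static double residual_sumsq(double p, const double *xs, const double *ys)
{
    double s = 0.0;
    for (int i = 0; i < NDATA; ++i) {
        double r = ys[i] - exp(p * xs[i]);
        s += r * r;
    }
    return s;
}

int main(void)
{
    const double xs[NDATA] = {0.0, 0.5, 1.0, 1.5, 2.0};
    const double ys[NDATA] = {1.0, 1.65, 2.72, 4.48, 7.39};   /* roughly exp(x) */
    double p = 0.0;                     /* initial guess */
    double lambda = 1e-3;               /* damping factor */
    double cost = residual_sumsq(p, xs, ys);

    for (int iter = 0; iter < 100; ++iter) {
        /* 1x1 "normal equations": J'J * delta = J'r, with J_i = x_i*exp(p*x_i)
         * (derivative of the model) and r_i = y_i - exp(p*x_i) (residual). */
        double JtJ = 0.0, Jtr = 0.0;
        for (int i = 0; i < NDATA; ++i) {
            double m = exp(p * xs[i]);
            double J = xs[i] * m;
            JtJ += J * J;
            Jtr += J * (ys[i] - m);
        }
        /* Damped step: large lambda -> short, gradient-like step,
         * small lambda -> full Gauss-Newton step. */
        double delta = Jtr / (JtJ + lambda * JtJ);
        double trial_cost = residual_sumsq(p + delta, xs, ys);

        if (trial_cost < cost) {        /* actual decrease: accept, trust model more */
            p += delta;
            cost = trial_cost;
            lambda *= 0.5;
        } else {                        /* no decrease: reject, trust model less */
            lambda *= 2.0;
        }
        if (fabs(delta) < 1e-10)
            break;
    }
    printf("fitted p = %g (should be close to 1)\n", p);
    return 0;
}

The same structure carries over to models with many parameters, except that the scalar division becomes a linear solve of (J'J + lambda*diag(J'J)) * delta = J'r, and a production implementation compares the actual decrease with the decrease predicted by the linear model rather than only checking its sign.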
Try http://en.wikipedia.org/wiki/Levenberg–Marquardt_algorithm
PDF Tutorial from Ananth Ranganathan
JavaNumerics has a pretty readable implementation
The ICS has a C/C++ implementation
The basic ideas of the LM algorithm can be explained in a few pages - but for a production-grade implementation that is fast and robust, many subtle optimizations are necessary. State of the art is still the Minpack implementation by Moré et al., documented in detail by Moré 1978 (http://link.springer.com/content/pdf/10.1007/BFb0067700.pdf) and in the Minpack user guide (http://www.mcs.anl.gov/~more/ANL8074b.pdf). To study the code, my C translation (https://jugit.fz-juelich.de/mlz/lmfit) is probably more accessible than the original Fortran code.
Try Numerical Recipes (Levenberg-Marquardt is in Section 15.5). It's available online, and I find that they explain algorithms in a way that's detailed (they have complete source code, how much more detailed can you get...), yet accessible.
I used these notes from a course at Purdue University to code up a generic Levenberg-Marquardt curve-fitting algorithm in MATLAB that computes numerical derivatives and therefore accepts any function of the form f(x;p) where p is a vector of fitting parameters.