Related
I am trying to make a kaiser window for a audio signal using both Matlab and c.I have been looking at Matlab and gnu scientific library documentation to understand how to use a modified bessel function of first kind and 0th order, but I still have some questions:
It seems that GSL does not accept a 0 order bessel function, I don't understand the documentation on this point.
I don't know if I should use a regular or irregular function. What are their differences? Matlab do not have that.
which is the fastest method to filter the signal: time domain or frequency domain?
how to filter the signal on the frequency domain?
I will only answer to the last three points. (Warning : I am french and my english isn't great...)
1) When you consider the Fourier transform of a signal multiplied by a specific window, in the spectral domain, you convolute the original spectrum of the signal by the spectrum of your window. In an ideal mathematical world, you would love to have a Dirac since it's convolution would only shift the signal. But to get a Dirac in the frequency you would need a periodic signal in the time domain which isn't defined on a compact (i.e. finite like your sound record) support. And this is too bad because there is a theorem (Paley-Wiener's corollary) that states that if your time-domain support is compact your frequency-domain support is not bounded and the decreasing behaviour of the Fourier transform increase with the regularity of the signal (i.e. window in our case). Great then ! All we have to chose is a nice regular (smooth ?) window. Unfortunately, to get a really smooth window, we have to narrow it (wide smooth windows exist but have other drawbacks dues to their derived function...its like too large constants ahead appealing algorithmic complexity) and it's spectrum will be wider (for the same reason invoked in the theorem). But you (and Obama) believe in compromise to face the (Pontryagin) duality, don't you ? The gaussian is a great compromise since its Fourier transform is a gaussian too (sum of random variables ? convolution ? +,x-morphism in the complex plane...every thing is linked but its a too long non-linear story to be told here). Therefore a lot of window tend to look like a gaussian.
Here is a bunch of windows and spectrums stolen to my speech processing teacher :
2) It's a pure mathematical duality, so it depends of what you mean by fiter. Does applying a Sobel filter into the frequency-domain make any sense ? (in fact it may...)
3) Again, it depends of what you mean by filter.
I think I can answer (1) and (2):
(1) You can program the zeroth order Bessel functions yourself (spherical or not) by consulting a handbook on mathematical functions such as Abramowitz and Stegun, Gradshteyn and Ryzhik, or the Digital Library of Mathematical Functions (http://dlmf.nist.gov/).
(2) By regular and irregular, I presume you mean regular or modified Bessel functions. Bessel functions are solutions to the three-dimensional heat equation posed in cylindrical coordinates. Your boundary conditions determine your use of regular or modified Bessel functions. For nice discussions about regular and modified Bessel functions, I suggest reading The Conduction of Heat in Solids by Carslaw and Jaeger and Boundary Value Problems of Heat Conduction by M. Necati Ozisik. You can also try for difficult problems Classical Electrodynamics by John David Jackson.
How are the regular and modified Bessel functions different? The regular J Bessel function is somewhat oscillatory in nature (see a nice old book like Jahnke, Emde, and Losch for hand-drawn Bessel function graphs, a lost art form if you ask me) whereas I and K are single-valued.
I can't really help you much on (3) and (4), as I'm not much an electrical engineer (although I would like to learn more!).
I don't understand how someone could come up with a simple 3x3 matrix called kernel, so when applied to the image, it would produce some awesome effect. Examples: http://en.wikipedia.org/wiki/Kernel_(image_processing) . Why does it work? How did people come up with those kernels (trial and error?)? Is it possible to prove it will always work for all images?
I don't understand how someone could come up with a simple 3x3 matrix called kernel, so when applied to the image, it would produce some awesome effect. Examples: http://en.wikipedia.org/wiki/Kernel_(image_processing).
If you want to dig into the history, you'll need to check some other terms. In older textbooks on image processing, what we think of as kernels today are more likely to be called "operators." Another key term is convolution. Both these terms hint at the mathematical basis of kernels.
http://en.wikipedia.org/wiki/Convolution
You can read about mathematical convolution in the textbook Computer Vision by Ballard and Brown. The book dates back to the early 80s, but it's still quite useful, and you can read it for free online:
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/toc.htm
From the table of contents to the Ballard and Brown book you'll find a link to a PDF for section 2.2.4 Spatial Properties.
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/LIB/bandb2_2.pdf
In the PDF, scroll down to the section "The Convolution Theorem." This provides the mathematical background for convolution. It's a relatively short step from thinking about convolution expressed as functions and integrals to the application of the same principles to the discrete world of grayscale (or color) data in 2D images.
You will notice that a number of kernels/operators are associated with names: Sobel, Prewitt, Laplacian, Gaussian, and so on. These names help suggest that there's a history--really quite a long history--of mathematical development and image processing research that has lead to the large number of kernels in common use today.
Gauss and Laplace lived long before us, but their mathematical work has trickled down into forms we can use in image processing. They didn't work on kernels for image processing, but mathematical techniques they developed are directly applicable and commonly used in image processing. Other kernels were developed specifically for processing images.
The Prewitt operator (kernel), which is quite similar to the Sobel operator, was published in 1970, if Wikipedia is correct.
http://en.wikipedia.org/wiki/Prewitt_operator
Why does it work?
Read about the mathematical theory of convolution to understand how one function can be "passed over" or "dragged" across another. That can explain the theoretical basis.
Then there's the question of why individual kernels work. In you look at the edge transition from dark to light in an image, and if you plot the pixel brightness on a 2D scatterplot, you'll notice that the values in the Y-axis increase rapidly about the edge transition in the image. That edge transition is a slope. A slope can be found using the first derivative. Tada! A kernel that approximates a first derivative operator will find edges.
If you know there's such a thing in optics as Gaussian blur, then you might wonder how it could be applied to a 2D image. Thus the derivation of the Gaussian kernel.
The Laplacian, for instance, is an operator that, according to the first sentence from the Wikipedia entry, "is a differential operator given by the divergence of the gradient of a function on Euclidean space."
http://en.wikipedia.org/wiki/Laplacian
Hoo boy. It's quite a leap from that definition to a kernel. The following page does a fine job of explaining the relationship between derivatives and kernels, and it's a quick read:
http://www.aishack.in/2011/04/the-sobel-and-laplacian-edge-detectors/
You'll also see that one form of the Laplacian kernel is simply named the "edge-finding" kernel in the Wikipedia entry you cited.
There is more than one edge-finding kernel, and each has its place. The Laplacian, Sobel, Prewitt, Kirsch, and Roberts kernels all yield different results, and are suited for different purposes.
How did people come up with those kernels (trial and error?)?
Kernels were developed by different people following a variety of research paths.
Some kernels (to my memory) were developed specifically to model the process of "early vision." Early vision isn't what happens only to early humans, or only for people who rise at 4 a.m., but instead refers to the low-level processes of biological vision: sensing of basic color, intensity, edges, and that sort of thing. At the very low level, edge detection in biological vision can be modeled with kernels.
Other kernels, such as the Laplacian and Gaussian, are approximations of mathematical functions. With a little effort you can derive the kernels yourself.
Image editing and image processing software packages will often allow you to define your own kernel. For example, if you want to identify a shape in an image small enough to be defined by a few connected pixels, then you can define a kernel that matches the shape of the image feature you want to detect. Using custom kernels to detect objects is too crude to work in most real-world applications, but sometimes there are reasons to create a special kernel for a very specific purpose, and sometimes a little trial and error is necessary to find a good kernel.
As user templatetypedef pointed out, you can think of kernels intuitively, and in a fairly short time develop a feel for what each would do.
Is it possible to prove it will always work for all images?
Functionally, you can throw a 3x3, 5x5, or NxN kernel at an image of the appropriate size and it'll "work" in the sense that the operation will be performed and there will be some result. But then the ability to compute a result whether it's useful or not isn't a great definition of "works."
One information definition of whether a kernel "works" is whether convolving an image with that kernel produces a result that you find useful. If you're manipulating images in Photoshop or GIMP, and if you find that a particular enhancement kernel doesn't yield quite what you want, then you might say that kernel doesn't work in the context of your particular image and the end result you want. In image processing for computer vision there's a similar problem: we must pick one or more kernels and other (often non-kernel based) algorithms that will operate in sequence to do something useful such as identify faces, measures the velocity of cars, or guide robots in assembly tasks.
Homework
If you want to understand how you can translate a mathematical concept into a kernel, it helps to derive a kernel by yourself. Even if you know what the end result of the derivation should be, to grok the notion of kernels and convolution it helps to derive a kernel from a mathematical function by yourself, on paper, and (preferably) from memory.
Try deriving the 3x3 Gaussian kernel from the mathematical function.
http://en.wikipedia.org/wiki/Gaussian_function
Deriving the kernel yourself, or at least finding an online tutorial and reading closely, will be quite revealing. If you'd rather not do the work, then you may not appreciate the way that some mathematical expression "translates" to a bunch of numbers in a 3x3 matrix. But that's okay! If you get the general sense of a common kernel is useful, and if you observe how two similar kernels produce slightly different results, then you'll develop a good feel for them.
Intuitively, a convolution of an image I with a kernel K produces a new image that's formed by computing a weighted sum, for each pixel, of all the nearby pixels weighted by the weights in K. Even if you didn't know what a convolution was, this idea still seems pretty reasonable. You can use it to do a blur effect (by using a Gaussian weighting of nearby pixels) or to sharpen edges (by subtracting each pixel from its neighbors and putting no weight anywhere else.) In fact, if you knew you needed to do all these operations, it would make sense to try to write a function that given I and K did the weighted sum of nearby pixels, and to try to optimize that function as aggressively as possible (since you'd probably use it a lot).
To get from there to the idea of a convolution, you'd probably need to have a background in Fourier transforms and Fourier series. Convolutions are a totally natural idea in that domain - if you compute the Fourier transformation of two images and multiply the transforms together, you end up computing the transform of the convolution. Mathematicians had worked that out a while back, probably by answering the very natural question "what function has a Fourier transform defined by the product of two other Fourier transforms?," and from there it was just a matter of time before the connection was found. Since Fourier transforms are already used extensively in computing (for example, in signal processing in networks), my guess is that someone with a background in Fourier series noticed that they needed to apply a kernel K to an image I, then recognized that this is way easier and more computationally efficient when done in frequency space.
I honestly have no idea what the real history is, but this is a pretty plausible explanation.
Hope this helps!
There is a good deal of mathematical theory about convolutions, but the kernel examples you link to are simple to explain intuitively:
[ 0 0 0]
[ 0 1 0]
[ 0 0 0]
This one says to take the original pixel and nothing else, so it yields just the original image.
[-1 -1 -1]
[-1 8 -1]
[-1 -1 -1]
This one says to subtract the eight neighbors from eight times the original pixel. First consider what happens in a smooth part of the image, where there is solid, unchanging color. Eight times the original pixel equals the sum of eight identical neighbors, so the difference is zero. Thus, smooth parts of the image become black. However, parts of the images where there are changes do not become black. Thus, this kernel highlights changes, so it highlights places where one shape ends and another begins: the edges of objects in the image.
[ 0 1 0]
[ 1 -4 1]
[ 0 1 0]
This is similar to the one above, but it is tuned differently.
[ 0 -1 0]
[-1 5 -1]
[0 -1 0]
Observe that this is just the negation of the edge detector above plus the first filter we saw, the one for the original image. So this kernel both highlights edges and adds that to the original image. The result is the original image with more visible edges: a sharpening effect.
[ 1 2 1]
[ 2 4 2]
[ 1 2 1]
[ 1 1 1]
[ 1 1 1]
[ 1 1 1]
Both of these blend the original pixel with its neighbors. So they blur the image a little.
There are two ways of thinking about (or encoding) an image: the spatial domain and the frequency domain. A spatial representation is based on pixels, so it's more familiar and easier to obtain. Both the image and the kernel are expressed in the spatial domain.
To get to the frequency domain, you need to use a Fourier or related transform, which is computationally expensive. Once you're there, though, many interesting manipulations are simpler. To blur an image, you can just chop off some high-frequency parts — like cropping the image in the spatial domain. Sharpening is the opposite, akin to increasing the contrast of high-frequency information.
Most of the information of an image is in the high frequencies, which represent detail. Most interesting detail information is at a small, local scale. You can do a lot by looking at neighboring pixels. Blurring is basically taking a weighted average of neighboring pixels. Sharpening consists of looking at the difference between a pixel and its neighbors and enhancing the contrast.
A kernel is usually produced by taking a frequency-domain transformation, then keeping only the high-frequency part and expressing it in the spatial domain. This can only be done for certain transformation algorithms. You can compute the ideal kernel for blurring, sharpening, selecting certain kinds of lines, etc., and it will work intuitively but otherwise seems like magic because we don't really have a "pixel arithmetic."
Once you have a kernel, of course, there's no need to get into the frequency domain at all. That hard work is finished, conceptually and computationally. Convolution is pretty friendly to all involved, and you can seldom simplify any further. Of course, smaller kernels are friendlier. Sometimes a large kernel can be expressed as a convolution of small sub-kernels, which is a kind of factoring in both the math and software senses.
The mathematical process is pretty straightforward and has been studied since long before there were computers. Most common manipulations can be done mechanically on an optical bench using 18th century equipment.
I think the best way to explain them is to start in 1d and discuss the z-transform and its inverse. That switches from the time domain to the frequency domain — from describing a wave as a timed sequence of samples to describing it as the amplitude of each frequency that contributes to it. The two representations contain the same amount of information, they just express it differently.
Now suppose you had a wave described in the frequency domain and you wanted to apply a filter to it. You might want to remove high frequencies. That would be a blur. You might want to remove low frequencies. That would be a sharpen or, in extremis, an edge detect.
You could do that by just forcing the frequencies you don't want to 0 — e.g. by multiplying the entire range by a particular mask, where 1 is a frequency you want to keep and 0 is a frequency you want to eliminate.
But what if you want to do that in the time domain? You could transfer to the frequency domain, apply the mask, then transform back. But that's a lot of work. So what you do (approximately) is transform the mask from the frequency domain to the time domain. You can then apply it in the time domain.
Following the maths involved for transforming back and forth, in theory to apply that you'd have to make each output sample a weighted sum of every single input sample. In the real world you make a trade-off. You use the sum of, say, 9 samples. That gives you a smaller latency and less processing cost than using, say, 99 samples. But it also gives you a less accurate filter.
A graphics kernel is the 2d analogue of that line of thought. They tend to be small because processing cost grows with the square of the edge length so it gets expensive very quickly. But you can approximate any sort of frequency domain limiting filter.
Recently I asked this question: How to get the fundamental frequency from FFT? (you don't actually need to read it)
My doubt right now it: how to use the cepstral algorithm?
I just don't know how to use it because the only language that I know is ActionScript 3, and for this reason I have few references about the native functions found in C, Java and so on, and how I should implement them on AS. Most articles are about these languages =/
(althought, answers in other languages than AS are welcome, just explain how the script works please)
The articles I found about cepstral to find the fundamental frequency of a FFT result told me that I should do this:
signal → FT → abs() → square → log → FT → abs() → square → power cepstrum
mathematically:
|F{log(|F{f(t)}|²)}|²
Important info:
I am developing a GUITAR TUNER in flash
This is the first time I am dealing with advanced sound
I am using an FFT to extract frequency bins from the signal that reaches user's microphone, but I got stuck in getting the fundamental frequency from it
I don't know:
How to apply a square in an ARRAY (I mean, the data that my FFT gives me is an array. Should I multiply it by itself? ActionScript's debug throws errors when I try to fftResults * fftResults)
How to apply the "log". I would not know how to apply it even if I had a single number.
What is the difference between complex cepstral and power cepstral. Also, what of them should I use? I am trying to develop a guitar tuner.
Thanks!
Note that the output of an FFT is an array of complex values, i.e. each bin = re + j*im. I think you can just combine the abs and square operations and calculate re*re + im*im for each bin. This gives you a single positive value for each bin, and obviously you can calculate the log value for each bin quite easily. You then need to do a second FFT on this log squared data and again using the output of this second FFT you will calculate re*re + im*im for each bin. You will then have an array of postive values which will have one or more peaks representing the fundamental frequency or frequencies of your input.
The autocorrelation is the easiest and most logical approach, and the best place to start.
To get this working, start with a simple autocorrelation, and then, if necessary, improve it following the outline provided by YIN. (YIN is based on the autocorrelation with refinements. But whether or not you'll need these refinements depends on details of your situation.) This way also, you can learn as you go rather than trying to understand the whole thing in one shot.
Although FFT approaches can also work, they are a bit more confusing. The issue is that what you are really after is the period, and this isn't well represented by the FFT. The missing fundamental is a good example of this, where if you have 2Hz and 3Hz, the fundamental is 1Hz, but is nowhere in the FFT, while 1Hz is obvious in a time based representation (e.g. the autocorrelation). Add to this that overtones aren't necessarily harmonic, and noise, etc... and all of these issues make it usually best to start with a direct approach to the problem.
There are many ways of finding fundamental frequency (F0).
For languages like Java etc there are many libraries with those type of algorithms already implemented (you can study their sources).
MFCC (based on cepstral) implemented in Comirva (Open source).
Audacity (beta version!) (Open source) presents cepstrum, autocorellation, enhanced autocorellation,
Yin based on autocorrelation (example )
Finding max signal values after FFT
All these algorithms may be be very helpful for you. However easiest way to get F0 (one value in Hz) would be to use Yin.
I'm looking for some advice on how to go about implementing Gradient (steepest) Descent in C. I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This is difficult in C (I think) because computing the gradient, Δf(x)=[df/dx(1), ..., df/dx(n)] requires calculating derivatives.
I just wanted to throw this at SO to get some direction on going about programming this, e.g.:
1) What dimensionality would be best to start with (1,2,...)
2) Advice on how to go about doing the partial derivatives
3) Whether I should implement in an easier language, like python, first -- then translate over to C
4) Etc.
Let me know your thoughts! Thanks in advance
1) Start in 2D, this way you can plot the path of the descent and actually see your algorithm working.
2) df/dx = (f(x+h)-f(x-h))/(2*h) if f evaluation is cheap, (f(x+h)-f(x))/h if it is expensive. The choice of h should balance truncation error (mostly with big h) and roundoff error (small h). Typical values of h are ~ pow(DBL_EPSILON, 1./3), but the actual exponent depends on the formula for the derivative, and ideally there should be a prefactor that depends on f. You may plot the numerical derivative as a function of h in a logscale, for some given sample points in the parameter space. You will then clearly see the range of h that is optimal for the points you are sampling.
3) Yes whatever you find easier.
4) The hard point is finding the optimal step size. You may want to use an inner loop here to search for the optimal step.
1) I'd start with a simple 1D example, and then go to 2D once I'm sure that works.
2) As you know the objective function before-hand, maybe you can supply an analytical gradient as well. If possible, that is (almost) always better than resorting to numerical derivatives.
3) By all means.
4) Presumably steepest descent is only the first step, next maybe look into something like CG or BFGS.
I am finding the minimum of f(x)=||Ax-y||^2, with A(n,n) and y(n) given.
This problem is known as least squares, and you are doing unconstrained optimization. Writing a finite-difference gradient descent solver in C is not the right approach at all. First of all you can easily calculate the derivative analytically, so there is no reason to do finite difference. Also, the problem is convex, so it even gets easier.
(Let A' denote the transpose of A)
d/dx ||Ax - y||^2 = 2*A'*(Ax - y)
since this is a covex problem we know the global min will occur when the derivative is 0
0 = 2*A'(Ax - y)
A'y = A'Ax
inverse(A'A)*A'y = x
A'A is guaranteed invertible because it is positive definite, so the problem reduces to calculating this inverse which is O(N^3). That said, there are libraries to do least squares in both C and python, so you should probably just use them instead of writing your own code.
Im a programmer that wants to learn how the Levenberg–Marquardt curvefitting algorithm works so that i can implement it myself. Is there a good tutorial anywhere that can explain how it works in detail with the reader beeing a programmer and not a mathemagician.
My goal is to implement this algorithm in opencl so that i can have it run hardware accelerated.
Minimizing a function is like trying to find lowest point on a surface. Think of yourself walking on a hilly surface and that you are trying to get to the lowest point. You would find the direction that goes downhill and walk until it doesn't go downhill anymore. Then you would chose a new direction that goes downhill and walk in that direction until it doesn't go downhill anymore, and so on. Eventually (hopefully) you would reach a point where no direction goes downhill anymore. You would then be at a (local) minimum.
The LM algorithm, and many other minimization algorithms, use this scheme.
Suppose that the function being minimized is F and we are at the point x(n) in our iteration. We wish to find the next iterate x(n+1) such that F(x(n+1)) < F(x(n)), i.e. the function value is smaller. In order to chose x(n+1) we need two things, a direction from x(n) and a step size (how far to go in that direction). The LM algorithm determines these values as follows -
First, compute a linear approximation to F at the point x(n). It is easy to find out the downhill direction of a linear function, so we use the linear approximating function to determine the downhill direction.
Next, we need to know how far we can go in this chosen direction. If our approximating linear function is a good approximation for F for a large area around x(n), then we can take a fairly large step. If it's a good approximation only very close to x(n), then we can take only a very small step.
This is what LM does - calculates a linear approximation to F at x(n), thus giving the downhill direction, then it figures out how big a step to take based on how well the linear function approximates F at x(n). LM figures out how good the approximating function is by basically taking a step in the direction thus determined and comparing how much the linear approximation to F decreased to the how much the the actual function F decreased. If they are close, the approximating function is good and we can take a little larger step. If they are not close then the approximation function is not good and we should back off and take a smaller step.
Try http://en.wikipedia.org/wiki/Levenberg–Marquardt_algorithm
PDF Tutorial from Ananth Ranganathan
JavaNumerics has a pretty readable implementation
The ICS has a C/C++ implementation
The basic ideas of the LM algorithm can be explained in a few pages - but for a production-grade implementation that is fast and robust, many subtle optimizations are necessary. State of the art is still the Minpack implementation by Moré et al., documented in detail by Moré 1978 (http://link.springer.com/content/pdf/10.1007/BFb0067700.pdf) and in the Minpack user guide (http://www.mcs.anl.gov/~more/ANL8074b.pdf). To study the code, my C translation (https://jugit.fz-juelich.de/mlz/lmfit) is probably more accessible than the original Fortran code.
Try Numerical Recipes (Levenberg-Marquardt is in Section 15.5). It's available online, and I find that they explain algorithms in a way that's detailed (they have complete source code, how much more detailed can you get...), yet accessible.
I used these notes from a course at Purdue University to code up a generic Levenberg-Marquardt curve-fitting algorithm in MATLAB that computes numerical derivatives and therefore accepts any function of the form f(x;p) where p is a vector of fitting parameters.