Shift frequency (using FFT) on integer samples of wav file - c

I want to make a frequency shift on a .wav file.
The problem I have is that the FFT uses complex numbers, while the .wav file has integer sample values. To make a frequency shift I have to perform a forward transform and then an inverse transform, but the inverse transform doesn't give me integer values (it gives me complex values), and I need integer values for the samples of the .wav file.
How do I interpret the values of the inverse transform?

I want to make a frequency shift on a .wav file.
So you've got an audio signal, which means a real-valued signal.
The spectrum of a real-valued signal is symmetric about f = 0, i.e. its Fourier transform has Hermitian symmetry.
If you now shift that input spectrum, the result loses that symmetry, i.e. the resulting signal is no longer real.
Notice that, through aliasing, everything is circular: what you "shift out" of the Nyquist range reappears at the opposite end, which means you can get unexpected high-frequency components!
The problem I have is that the FFT uses complex numbers, while the .wav file has integer sample values. To make a frequency shift I have to perform a forward transform and then an inverse transform, but the inverse transform doesn't give me integer values (it gives me complex values), and I need integer values for the samples of the .wav file.
Indeed! And that's because the result of your shift really isn't a real signal anymore.
What you can do, however, is:
shift (either in the time or frequency domain – honestly, doing it in the time domain is easier: just multiply the nth sample by exp(j · 2π · n · f_shift / f_sample); a C sketch of this step follows the list)
apply a complex band-pass filter that removes everything outside the frequency range [0; f_sample / 2 - f_shift]. This gives you what is called an analytic signal (i.e. one with only positive frequencies), which still isn't real-valued, because its spectrum isn't symmetric.
Throwing away the imaginary part now doesn't alter the information of your signal - it just halves the energy and gives you a symmetric spectrum, and something you can write to a .wav file.
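A minimal C99 sketch of that time-domain shift, assuming the .wav samples have already been read into a double buffer; frequency_shift and its parameters are just illustrative names, and the complex band-pass step is intentionally left out here:

    #include <complex.h>
    #include <math.h>
    #include <stddef.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    /* Multiply each sample by a complex exponential to shift the whole spectrum
       by f_shift Hz. The output is complex; band-pass filter it as described
       above before taking creal() and converting back to integer .wav samples. */
    void frequency_shift(const double *in, double complex *out, size_t n_samples,
                         double f_shift, double f_sample)
    {
        for (size_t n = 0; n < n_samples; n++)
            out[n] = in[n] * cexp(I * 2.0 * M_PI * f_shift * (double)n / f_sample);
    }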
Now, this whole "doing it in frequency domain through the FFT" is an approach that people in the Software Defined Radio world are pretty used to – they deal with complex baseband signals all the time.
How do I interpret the values of the inverse transform?
As the complex signal that they are. Ignoring the imaginary part, as suggested in the comments, will lead to the energy contained in the negative frequencies being mirrored onto your positive frequencies (and the other way around), and is most likely not what you want – unless either:
you've made sure that prior to this "symmetricalization", the energy on either side of f = 0 was 0, so that nothing bad happens, or
you've made sure that you selectively shifted the negative and positive frequencies so that symmetry is retained. Notice that this is not a "simple" shift of the whole frequency domain, but two selective shifts; the selection of these shifting regions has a shape, which boils down to using a window. If you just "select" or "don't select" each bin for shifting, you're effectively applying a rectangular window – with all the Gibbs phenomenon you can incur with that.

Related

How to scale DFT output to 0.0 through 1.0

I'm trying to make a simple music visualization application, I understand I need to take my audio samples and perform a Fast Fourier Transformation. I'm trying to find out how to determine what the scale of the magnitude is, so I can normalize it to be between 0.0 and 1.0 for plotting purposes.
My application is setup to allow reading audio in 16-bit and 24-bit format, so I scale all incoming audio samples to [-1.0,1.0), then I use a real-to-complex 1-dimensional transform for N samples.
From there, I think I need to take the absolute value of each bin (using the cabs function) between 0 and N/2, but I'm not sure what these numbers really represent or what I'm supposed to do with them.
I've figured out how to calculate the frequency of each bin. I'm not interested in finding the actual magnitude or amplitude in decibels; I really just want to get a value between 0.0 and 1.0.
Most explanations for fftw involve a lot of math that is honestly way above my head.
[Per comments, OP seeks to know the maximum possible magnitude of any output bin given inputs in [−1, 1]. This answer gives a way to determine that.]
DFT routines vary in how they handle scaling. Some normalize their output to keep the scale the same, and some let the arithmetic operations grow the scale for better performance or implementation convenience. So the possible scale of the output is not determined solely by mathematics; it depends on the routine used. The documentation of the routine ought to state what scaling it uses.
In the absence of clear documentation, you can determine the scaling empirically: write a sine wave with amplitude one to the input (at a frequency matching one of the output bins), perform the transform, and examine the output to see which bin has the largest magnitude (it should be the one whose frequency you used, of course). For an unnormalized transform that peak will be about N/2, where N is the number of inputs; for a transform normalized by 1/N it will be about 1/2, with some slop due to floating-point rounding effects. The maximum possible magnitude over all inputs in [-1, 1] is then N or 1, respectively.
(When plotting, be sure to allow a little leeway for floating-point rounding effects—the actual numbers could be slightly greater than the maximum, so avoid overflowing or clipping where you do not want that.)
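For example, here is a sketch of that probe assuming FFTW's real-to-complex interface (the question mentions fftw); since FFTW's transform is unnormalized, the peak should come out near N/2:

    /* Probe the scaling of FFTW's real-to-complex DFT with a unit-amplitude
       cosine at bin k. Link with -lfftw3 -lm. */
    #include <fftw3.h>
    #include <math.h>
    #include <stdio.h>

    #ifndef M_PI
    #define M_PI 3.14159265358979323846
    #endif

    int main(void)
    {
        const int N = 1024, k = 5;   /* arbitrary test size and test bin */
        double *in = fftw_malloc(sizeof(double) * N);
        fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * (N / 2 + 1));
        fftw_plan p = fftw_plan_dft_r2c_1d(N, in, out, FFTW_ESTIMATE);
        double peak = 0.0;
        int n, i;

        for (n = 0; n < N; n++)
            in[n] = cos(2.0 * M_PI * k * n / N);   /* amplitude 1.0 */

        fftw_execute(p);

        for (i = 0; i <= N / 2; i++) {
            double mag = hypot(out[i][0], out[i][1]);
            if (mag > peak)
                peak = mag;
        }
        printf("peak magnitude = %g for N = %d\n", peak, N);  /* ~N/2 for FFTW */

        fftw_destroy_plan(p);
        fftw_free(in);
        fftw_free(out);
        return 0;
    }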

How to generate an integer random value within an interval, starting from random byte

I looked at the other answers I found, but they don't seem suitable to my case.
I am working in ANSI C, on an embedded 32-bit ARM system.
I have a register that generates a random 8-bit value (generated from thermal noise in the chip). From that value I would like to generate evenly distributed integer values within certain ranges, namely:
0,1,2,3,4,5
0,1,2,3,4,5,6,7,8,9
"true" randomness is very important in my application, I need to generate white noise that could make a measurement drift.
Thanks!
Taking RandomValue % SizeOfRange will not produce uniformly distributed values, because in general the 256 possible byte values do not divide evenly into the buckets of the target range, so some results occur more often than others.
I would suggest using a bit mask to ignore all bits outside the range of interest, then repeatedly getting a new random number until the masked value falls within the desired range.
For the range 0..5, look at the right-most 3 bits. That will produce a value in the range 0..7. "Reroll" results of 6 or 7.
For the range 0..9, look at the right-most 4 bits. That will produce a value in the range 0..15. Ignore results from 10..15.
As a real-world analogy, think of trying to get a random number between 1 and 5 with a 6-sided die. There is no "fair" algorithm to map a roll of 6 into one of the desired numbers 1..5. Simply reroll a 6 until you get something in the desired range.
Masking high bits ensures that the number of "rerolls" is minimal.
Be sure to pay attention to any physical limitations on how often you can read the special register and still expect to get an entirely random value.
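A minimal sketch of that approach in ANSI C; read_rng_byte() is a hypothetical stand-in for however you read your hardware noise register:

    unsigned char read_rng_byte(void);   /* assumed: returns one random byte */

    /* Keep only the low bits selected by mask, and reroll until the masked
       value falls inside 0..max_inclusive. */
    unsigned int rand_in_range(unsigned int max_inclusive, unsigned int mask)
    {
        unsigned int v;
        do {
            v = (unsigned int)read_rng_byte() & mask;  /* keep only the low bits */
        } while (v > max_inclusive);                   /* reroll out-of-range values */
        return v;
    }

    /* Usage:
     *   rand_in_range(5, 0x07);   uniform over 0..5 (3-bit mask, reroll 6 and 7)
     *   rand_in_range(9, 0x0F);   uniform over 0..9 (4-bit mask, reroll 10..15)
     */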

Why does convolution with kernels work?

I don't understand how someone could come up with a simple 3x3 matrix, called a kernel, such that when applied to an image it produces some awesome effect. Examples: http://en.wikipedia.org/wiki/Kernel_(image_processing) . Why does it work? How did people come up with those kernels (trial and error?)? Is it possible to prove it will always work for all images?
I don't understand how someone could come up with a simple 3x3 matrix, called a kernel, such that when applied to an image it produces some awesome effect. Examples: http://en.wikipedia.org/wiki/Kernel_(image_processing).
If you want to dig into the history, you'll need to check some other terms. In older textbooks on image processing, what we think of as kernels today are more likely to be called "operators." Another key term is convolution. Both these terms hint at the mathematical basis of kernels.
http://en.wikipedia.org/wiki/Convolution
You can read about mathematical convolution in the textbook Computer Vision by Ballard and Brown. The book dates back to the early 80s, but it's still quite useful, and you can read it for free online:
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/toc.htm
From the table of contents to the Ballard and Brown book you'll find a link to a PDF for section 2.2.4 Spatial Properties.
http://homepages.inf.ed.ac.uk/rbf/BOOKS/BANDB/LIB/bandb2_2.pdf
In the PDF, scroll down to the section "The Convolution Theorem." This provides the mathematical background for convolution. It's a relatively short step from thinking about convolution expressed as functions and integrals to the application of the same principles to the discrete world of grayscale (or color) data in 2D images.
You will notice that a number of kernels/operators are associated with names: Sobel, Prewitt, Laplacian, Gaussian, and so on. These names help suggest that there's a history--really quite a long history--of mathematical development and image processing research that has led to the large number of kernels in common use today.
Gauss and Laplace lived long before us, but their mathematical work has trickled down into forms we can use in image processing. They didn't work on kernels for image processing, but mathematical techniques they developed are directly applicable and commonly used in image processing. Other kernels were developed specifically for processing images.
The Prewitt operator (kernel), which is quite similar to the Sobel operator, was published in 1970, if Wikipedia is correct.
http://en.wikipedia.org/wiki/Prewitt_operator
Why does it work?
Read about the mathematical theory of convolution to understand how one function can be "passed over" or "dragged" across another. That can explain the theoretical basis.
Then there's the question of why individual kernels work. If you look at the edge transition from dark to light in an image, and if you plot the pixel brightness on a 2D scatterplot, you'll notice that the values on the Y-axis increase rapidly around the edge transition in the image. That edge transition is a slope. A slope can be found using the first derivative. Tada! A kernel that approximates a first derivative operator will find edges.
If you know there's such a thing in optics as Gaussian blur, then you might wonder how it could be applied to a 2D image. Thus the derivation of the Gaussian kernel.
The Laplacian, for instance, is an operator that, according to the first sentence from the Wikipedia entry, "is a differential operator given by the divergence of the gradient of a function on Euclidean space."
http://en.wikipedia.org/wiki/Laplacian
Hoo boy. It's quite a leap from that definition to a kernel. The following page does a fine job of explaining the relationship between derivatives and kernels, and it's a quick read:
http://www.aishack.in/2011/04/the-sobel-and-laplacian-edge-detectors/
You'll also see that one form of the Laplacian kernel is simply named the "edge-finding" kernel in the Wikipedia entry you cited.
There is more than one edge-finding kernel, and each has its place. The Laplacian, Sobel, Prewitt, Kirsch, and Roberts kernels all yield different results, and are suited for different purposes.
How did people come up with those kernels (trial and error?)?
Kernels were developed by different people following a variety of research paths.
Some kernels (to my memory) were developed specifically to model the process of "early vision." Early vision isn't what happens only to early humans, or only for people who rise at 4 a.m., but instead refers to the low-level processes of biological vision: sensing of basic color, intensity, edges, and that sort of thing. At the very low level, edge detection in biological vision can be modeled with kernels.
Other kernels, such as the Laplacian and Gaussian, are approximations of mathematical functions. With a little effort you can derive the kernels yourself.
Image editing and image processing software packages will often allow you to define your own kernel. For example, if you want to identify a shape in an image small enough to be defined by a few connected pixels, then you can define a kernel that matches the shape of the image feature you want to detect. Using custom kernels to detect objects is too crude to work in most real-world applications, but sometimes there are reasons to create a special kernel for a very specific purpose, and sometimes a little trial and error is necessary to find a good kernel.
As user templatetypedef pointed out, you can think of kernels intuitively, and in a fairly short time develop a feel for what each would do.
Is it possible to prove it will always work for all images?
Functionally, you can throw a 3x3, 5x5, or NxN kernel at an image of the appropriate size and it'll "work" in the sense that the operation will be performed and there will be some result. But then the ability to compute a result whether it's useful or not isn't a great definition of "works."
One informal definition of whether a kernel "works" is whether convolving an image with that kernel produces a result that you find useful. If you're manipulating images in Photoshop or GIMP, and if you find that a particular enhancement kernel doesn't yield quite what you want, then you might say that kernel doesn't work in the context of your particular image and the end result you want. In image processing for computer vision there's a similar problem: we must pick one or more kernels and other (often non-kernel based) algorithms that will operate in sequence to do something useful such as identify faces, measure the velocity of cars, or guide robots in assembly tasks.
Homework
If you want to understand how you can translate a mathematical concept into a kernel, it helps to derive a kernel by yourself. Even if you know what the end result of the derivation should be, to grok the notion of kernels and convolution it helps to derive a kernel from a mathematical function by yourself, on paper, and (preferably) from memory.
Try deriving the 3x3 Gaussian kernel from the mathematical function.
http://en.wikipedia.org/wiki/Gaussian_function
Deriving the kernel yourself, or at least finding an online tutorial and reading closely, will be quite revealing. If you'd rather not do the work, then you may not appreciate the way that some mathematical expression "translates" to a bunch of numbers in a 3x3 matrix. But that's okay! If you get a general sense of what a common kernel is useful for, and if you observe how two similar kernels produce slightly different results, then you'll develop a good feel for them.
Intuitively, a convolution of an image I with a kernel K produces a new image that's formed by computing a weighted sum, for each pixel, of all the nearby pixels weighted by the weights in K. Even if you didn't know what a convolution was, this idea still seems pretty reasonable. You can use it to do a blur effect (by using a Gaussian weighting of nearby pixels) or to sharpen edges (by weighting the center pixel heavily and subtracting off its neighbors). In fact, if you knew you needed to do all these operations, it would make sense to try to write a function that, given I and K, did the weighted sum of nearby pixels, and to try to optimize that function as aggressively as possible (since you'd probably use it a lot).
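As an illustration of that weighted-sum idea, here's a minimal C sketch of a 3x3 convolution over an 8-bit grayscale image (convolve3x3 and its parameters are made-up names for this example; border pixels are simply copied through):

    /* Apply a 3x3 kernel to an 8-bit grayscale image (width x height, row-major).
       Each output pixel is the weighted sum of its 3x3 neighbourhood, clamped to
       0..255; border pixels are copied through unchanged. */
    void convolve3x3(const unsigned char *in, unsigned char *out,
                     int width, int height, const double kernel[3][3])
    {
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (x == 0 || y == 0 || x == width - 1 || y == height - 1) {
                    out[y * width + x] = in[y * width + x];
                    continue;
                }
                double sum = 0.0;
                for (int ky = -1; ky <= 1; ky++)
                    for (int kx = -1; kx <= 1; kx++)
                        sum += kernel[ky + 1][kx + 1] * in[(y + ky) * width + (x + kx)];
                out[y * width + x] = (unsigned char)(sum < 0 ? 0 : sum > 255 ? 255 : sum);
            }
        }
    }

A kernel filled with the sharpening weights from the Wikipedia page (0 -1 0 / -1 5 -1 / 0 -1 0) sharpens, while a kernel of nine 1/9 entries blurs.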
To get from there to the idea of a convolution, you'd probably need to have a background in Fourier transforms and Fourier series. Convolutions are a totally natural idea in that domain - if you compute the Fourier transformation of two images and multiply the transforms together, you end up computing the transform of the convolution. Mathematicians had worked that out a while back, probably by answering the very natural question "what function has a Fourier transform defined by the product of two other Fourier transforms?," and from there it was just a matter of time before the connection was found. Since Fourier transforms are already used extensively in computing (for example, in signal processing in networks), my guess is that someone with a background in Fourier series noticed that they needed to apply a kernel K to an image I, then recognized that this is way easier and more computationally efficient when done in frequency space.
I honestly have no idea what the real history is, but this is a pretty plausible explanation.
Hope this helps!
There is a good deal of mathematical theory about convolutions, but the kernel examples you link to are simple to explain intuitively:
[ 0 0 0]
[ 0 1 0]
[ 0 0 0]
This one says to take the original pixel and nothing else, so it yields just the original image.
[-1 -1 -1]
[-1 8 -1]
[-1 -1 -1]
This one says to subtract the eight neighbors from eight times the original pixel. First consider what happens in a smooth part of the image, where there is solid, unchanging color. Eight times the original pixel equals the sum of eight identical neighbors, so the difference is zero. Thus, smooth parts of the image become black. However, parts of the images where there are changes do not become black. Thus, this kernel highlights changes, so it highlights places where one shape ends and another begins: the edges of objects in the image.
[ 0 1 0]
[ 1 -4 1]
[ 0 1 0]
This is similar to the one above, but it is tuned differently.
[ 0 -1 0]
[-1 5 -1]
[ 0 -1 0]
Observe that this is just the negation of the edge detector above plus the first filter we saw, the one for the original image. So this kernel both highlights edges and adds that to the original image. The result is the original image with more visible edges: a sharpening effect.
[ 1 2 1]
[ 2 4 2]
[ 1 2 1]

[ 1 1 1]
[ 1 1 1]
[ 1 1 1]
Both of these blend the original pixel with its neighbors, so they blur the image a little. (In practice the weights are divided by their sum, 16 and 9 respectively, so the overall brightness is preserved.)
There are two ways of thinking about (or encoding) an image: the spatial domain and the frequency domain. A spatial representation is based on pixels, so it's more familiar and easier to obtain. Both the image and the kernel are expressed in the spatial domain.
To get to the frequency domain, you need to use a Fourier or related transform, which is computationally expensive. Once you're there, though, many interesting manipulations are simpler. To blur an image, you can just chop off some high-frequency parts — like cropping the image in the spatial domain. Sharpening is the opposite, akin to increasing the contrast of high-frequency information.
Most of the information of an image is in the high frequencies, which represent detail. Most interesting detail information is at a small, local scale. You can do a lot by looking at neighboring pixels. Blurring is basically taking a weighted average of neighboring pixels. Sharpening consists of looking at the difference between a pixel and its neighbors and enhancing the contrast.
A kernel is usually produced by designing the desired effect in the frequency domain and then transforming that response back into the spatial domain. This can only be done for certain kinds of transformations. You can compute the ideal kernel for blurring, sharpening, selecting certain kinds of lines, etc., and it will work intuitively, but otherwise it seems like magic because we don't really have a "pixel arithmetic."
Once you have a kernel, of course, there's no need to get into the frequency domain at all. That hard work is finished, conceptually and computationally. Convolution is pretty friendly to all involved, and you can seldom simplify any further. Of course, smaller kernels are friendlier. Sometimes a large kernel can be expressed as a convolution of small sub-kernels, which is a kind of factoring in both the math and software senses.
The mathematical process is pretty straightforward and has been studied since long before there were computers. Most common manipulations can be done mechanically on an optical bench using 18th century equipment.
I think the best way to explain them is to start in 1d and discuss the z-transform and its inverse. That switches from the time domain to the frequency domain — from describing a wave as a timed sequence of samples to describing it as the amplitude of each frequency that contributes to it. The two representations contain the same amount of information, they just express it differently.
Now suppose you had a wave described in the frequency domain and you wanted to apply a filter to it. You might want to remove high frequencies. That would be a blur. You might want to remove low frequencies. That would be a sharpen or, in extremis, an edge detect.
You could do that by just forcing the frequencies you don't want to 0 — e.g. by multiplying the entire range by a particular mask, where 1 is a frequency you want to keep and 0 is a frequency you want to eliminate.
But what if you want to do that in the time domain? You could transfer to the frequency domain, apply the mask, then transform back. But that's a lot of work. So what you do (approximately) is transform the mask from the frequency domain to the time domain. You can then apply it in the time domain.
If you follow the maths involved in transforming back and forth, in theory each output sample would have to be a weighted sum of every single input sample. In the real world you make a trade-off: you use the sum of, say, 9 samples. That gives you smaller latency and less processing cost than using, say, 99 samples, but it also gives you a less accurate filter.
A graphics kernel is the 2d analogue of that line of thought. They tend to be small because processing cost grows with the square of the edge length so it gets expensive very quickly. But you can approximate any sort of frequency domain limiting filter.
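To make that 1-D picture concrete, here's a small hand-rolled sketch (lowpass3 and its 3-tap weights are just an illustration, not derived from any particular frequency-domain mask):

    #include <stddef.h>

    /* A 3-tap weighted sum applied in the time domain acts as a crude low-pass
       ("blur") filter. Longer kernels approximate an ideal frequency-domain
       mask more closely, at higher cost. The first and last samples are simply
       copied through. */
    void lowpass3(const double *in, double *out, size_t n)
    {
        static const double k[3] = { 0.25, 0.5, 0.25 };   /* weights sum to 1 */
        size_t i;
        if (n == 0)
            return;
        out[0] = in[0];
        out[n - 1] = in[n - 1];
        for (i = 1; i + 1 < n; i++)
            out[i] = k[0] * in[i - 1] + k[1] * in[i] + k[2] * in[i + 1];
    }

A high-frequency-boosting ("sharpen") analogue would use weights like -0.25, 1.5, -0.25 instead, which still sum to 1 but amplify rather than attenuate the high frequencies.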

Factorial Using FFT

I'm trying to implement a program in C that calculates the factorial of a very large n (up to a million), using FFT-based multiplication and the binary splitting method.
I've implemented a simple library to represent arbitrary-precision integers.
To calculate the FFT and IFFT, I use the twofft.c and four1.c routines from "Numerical Recipes in C".
Up to a certain n everything goes right, but when the numbers (floating-point arrays) get too big, the IFFT (calculated with four1), after normalization and rounding, has values that are wrong.
For example, if I have two numbers with 2000 digits that end with 40 zeros and I multiply them together (using the FFT), some of those trailing zeros become "one" after the IFFT.
This happens because when I round one of these "zeros" (0.50009, for example), it becomes "one".
Now, I don't know if my implementation is wrong or if I have to round these numbers in a different way.
I've tried both the binary splitting method and prime factorization, but for n >= 9000 the result is wrong.
Is there a way to resolve this?
Thanks for your attention, and sorry for my bad English.
How do you represent arbitrary precision integers?
I mean what type are you actually using?
Can you please show us your code?
If you feel really lazy you can clone this project I made a few months ago:
https://github.com/nomadster/ESP
Edit:
Reading your post further, I suppose from this statement
"This happens because when I round one of these "zeros" (0.50009, for example), it becomes "one""
that you are still unaware of the fact that FFT multiplication only works when the round-off error is smaller than 0.5.
So it seems to me (if and only if I've correctly interpreted your cryptic message) that you are using a floating-point type that doesn't have the required precision.
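One way to check whether you're running into that limit is to track the worst-case rounding error when converting the IFFT output back to integers. This is only a sketch under assumptions (coeff holds the already-normalized real products, digits receives the rounded coefficients, and carry propagation happens elsewhere):

    #include <math.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Round each IFFT output to the nearest integer and report the largest
       deviation seen. If max_err creeps toward 0.5, double precision is no
       longer sufficient for operands of this size: use fewer digits per FFT
       element or a higher-precision transform. */
    void round_coefficients(const double *coeff, long *digits, size_t n)
    {
        double max_err = 0.0;
        size_t i;
        for (i = 0; i < n; i++) {
            double r = floor(coeff[i] + 0.5);        /* round to nearest integer */
            double err = fabs(coeff[i] - r);
            if (err > max_err)
                max_err = err;
            digits[i] = (long)r;                     /* carries propagated later */
        }
        fprintf(stderr, "max rounding error: %g\n", max_err);
    }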
For the record:
I also noticed wrong values returned by the IFFT from four1.c from Numerical Recipes. I only tested it with N = 256 complex values as input, assembled in such a way that they should result in a real-only time-domain signal.
The resulting time-domain vector has to be mirrored (end to start and vice versa) and shifted by one to correspond with the IFFTs of other implementations. (I tested numpy.fft.ifft, Octave's ifft, and an inverse discrete Fourier transform without any optimisation, implemented directly from the IDFT formula, which should definitely be correct.)
There has to be a fundamental algorithmic fault in the version provided by Numerical Recipes. Nothing related to this problem is described in their book.

How to generate Gaussian Channel in C?

I need to simulate a Gaussian Channel in C.
How do I do that?
Where can I get code snippets for this?
IIRC, approximating a Gaussian distribution is easy - but slow if you want a good approximation. Just add several independent random numbers to get each output. The more "inputs" per output, the better the approximation.
This definitely works if the "inputs" have a uniform distribution. It works for almost any input distribution with finite variance (that's essentially the central limit theorem), but you may need far more inputs per output to get a good approximation.
This gives Gaussian white noise - the outputs are independent (all frequencies have the same amplitude). There's also a similar pink noise algorithm: still a Gaussian distribution, but higher frequencies have lower amplitudes (the outputs aren't independent). Each output is still a sum of a fixed set of independent "input" random numbers, but only the first input is replaced for every output; the second is replaced for every other output, the third for every fourth output, the fourth for every eighth output, and so on. For most outputs precisely two input random numbers are replaced - only every 2^n outputs do you replace just the first.
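Here is a minimal C sketch of the white-noise case (the classic sum-of-twelve-uniforms trick; rand() is only a placeholder for whatever uniform source you actually use):

    #include <stdlib.h>

    /* Twelve uniform [0,1) values summed have mean 6 and variance 1, so
       subtracting 6 gives an approximately standard-normal sample; scale
       by the desired standard deviation sigma. */
    double approx_gaussian(double sigma)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < 12; i++)
            sum += (double)rand() / ((double)RAND_MAX + 1.0);   /* uniform [0,1) */
        return sigma * (sum - 6.0);
    }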
