I'm trying to tilt an image 90 degrees. Currently I'm following the method presented here:
http://www.scribd.com/doc/66589491/Image-Rotation-Using-CUDA
My current code for the kernel is this:
__global__ void kernCuda(float *Source,float * Destination, int width)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
int i = abs(x*cosf(theta)-y*sinf(theta));
int j = abs(x*sinf(theta)+y*cosf(theta));
if(x<width && y<width){
Destination[j*width+i]=Source[y*width+x];
}
}
The image tilts somewhat, but the result does not look correct, and on top of that some pixels that were colored are now black (0). Any help would be appreciated.
I assume that theta is a floating point number. So you are mixing floating point and integer variables. I would suggest you use the appropriate casts to make it work and look for rounding issues.
Secondly, in order to see whether your program works in general you can replace cosf(theta) by 0 and sinf(theta) by 1.
A third issue that I can see is that each thread handles only one x,y value instead of looping over them with a while loop. So if your image dimensions are larger than the grid you launch, you will not process all of your pixels.
Edit: I just had a brief look at the report. It really is not very good. If you want to just learn about CUDA I suggest you get the book called "CUDA by Example".
The obvious problem is the integer truncation/interpolation issue.
One way around that would be to bind the source image to a texture and then read from the texture using the computed real valued coordinates. CUDA textures give you "free", hardware based interpolation/filtering. The main disadvantage of textures is the interpolation is limited to 8 bit internal accuracy, so it might not be accurate enough in some situations. But as a first attempt, try something like:
__global__ void kernCuda(float * Destination, const float sintheta, const float costheta,
const int width)
{
int x = blockIdx.x * blockDim.x + threadIdx.x;
int y = blockIdx.y * blockDim.y + threadIdx.y;
float tx = float(x)*costheta-float(y)*sintheta;
float ty = float(x)*sintheta+float(y)*costheta;
if(x<width && y<width){
Destination[x*width+y]=tex2D(Source_texture, tx+0.5f,ty+0.5f);
}
}
(note not tested, never been near a compiler, use at your own risk).
Here Source_texture is the texture you bind the source data to. You can set the edge behaviour of the rotation in the texture setup, depending on how you want to handle it. How to set up and bind a texture is described in Section 3.2.10.1 of the CUDA 4.1 programming guide.
Note also that the cosine and sine in the rotation matrix are constant for a given value of theta, so it would be much more efficient to pass them to the kernel as arguments, rather than have every thread compute the same values. For the 90 degree rotation you have asked about, simply call the kernel with sintheta=1.f and costheta=0.f and you are done.
If you simply want to tilt the image by 90 degrees, then you won't need any trigonometric functions, since you can simply swap the x and y axes while iterating through the pixels:
Destination[x+y*width]=tex2D(Source_texture, tx+0.5f,ty+0.5f);
should do it.
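For reference, here is a minimal CPU sketch in plain C of a 90 degree rotation done purely by index swapping, with no trigonometry and no rounding, so every destination pixel gets written exactly once. The function name rotate90_cw and the clockwise convention are my own assumptions, not something from the linked report; a CUDA kernel would do the same index arithmetic per thread.

```c
#include <assert.h>
#include <stddef.h>

/* Rotate a row-major width x height image by 90 degrees clockwise.
 * The destination is height x width, so its row length is `height`.
 * rotate90_cw is a hypothetical name used only for this sketch. */
void rotate90_cw(const float *src, float *dst, size_t width, size_t height)
{
    assert(src != dst); /* in-place rotation is not supported here */
    for (size_t y = 0; y < height; y++) {
        for (size_t x = 0; x < width; x++) {
            /* Source pixel (x, y) lands in row x, column (height-1-y). */
            dst[x * height + (height - 1 - y)] = src[y * width + x];
        }
    }
}
```

Because this is a pure permutation of pixels, it cannot leave the black holes that the truncated forward mapping produces.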
Related
I want to develop a simple geo-fencing algorithm in C, that works without using sin, cos and tan. I am working with a small microcontroller, hence the restriction. I have no space left for <math.h>. The radius will be around 20..100m. I am not expecting super accurate results this way.
My current solution takes two coordinate sets (decimal, .00001 accuracy, but passed as a value x10^5, in order to eliminate the decimal places) and a radius (in m). When scaling the coordinates by 10/9 (one 0.00001° step is roughly 1.11 m), they can approximately be used in a Pythagorean equation that checks whether one coordinate lies within the radius of another:
static int32_t
geo_convert_distance(int32_t coordinate)
{
    return (coordinate * 10) / 9;
}
bool
geo_check(int32_t lat_fixed,
int32_t lon_fixed,
int32_t lat_var,
int32_t lon_var,
uint16_t radius)
{
lat_fixed = geo_convert_distance(lat_fixed);
lon_fixed = geo_convert_distance(lon_fixed);
lat_var = geo_convert_distance(lat_var);
lon_var = geo_convert_distance(lon_var);
if (((lat_var - lat_fixed) * (lat_var - lat_fixed) + (lon_var - lon_fixed) * (lon_var - lon_fixed))
<= (radius * radius))
{
return true;
}
return false;
}
This solution works quite well for the equator, but when changing the latitude, this becomes increasingly inaccurate, at 70°N the deviation is around 50%. I could change the factor depending on the latitude, but I am not happy with this solution.
Is there a better way to do this calculation? Any help is very much appreciated. Best regards!
UPDATE
I used the input I got and managed to implement a decent solution. I used only signed ints, no floats.
The haversine formula could be simplified: due to the relevant radii (50-500m), the deltas of the latitude and longitude are very small (<0.02°). This means that the sine can be simplified to sin(x) = x and also the arcsine to asin(x) = x. This approach is very accurate for angles <10° and even better for the small angles used here. This leaves the cosine, which I implemented according to @meaning-matters's suggestion. The cosine will take an angle and return the actual result multiplied by 100, in order to be able to use ints. The square root was implemented with an iterative loop (I cannot find the SO post anymore). The haversine calculation was done with the inputs multiplied by powers of 10 in order to achieve accuracy and afterwards divided by the necessary power of 10.
For my 8bit system, this caused a memory usage of around 2000-2500 Bytes.
Implement the Haversine function using your own trigonometric functions that use lookup tables and do interpolation.
Because you don't want very accurate results, small lookup tables, of perhaps twenty points, would be sufficient. And, simple linear interpolation would also be fine.
In case you don't have much memory space: Bear in mind that to implement sine and cosine, you only need one lookup table for 90 degrees of either function. All values can then be determined by mirroring and offsetting.
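A rough sketch of what such a table could look like, using only integers. It assumes latitudes arrive in units of 0.00001 degrees as in the question; the names cos_lat_e4 and geo_within, the 5 degree table spacing and the scale factor of 10000 are all my own choices for illustration. It also uses a flat-earth (equirectangular) distance instead of the full Haversine formula, and 64-bit intermediates for simplicity, which you may want to trim on an 8-bit part.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* cos(0..90 deg) in 5-degree steps, scaled by 10000 (19 entries). */
static const int16_t cos_table_e4[19] = {
    10000, 9962, 9848, 9659, 9397, 9063, 8660, 8192, 7660, 7071,
    6428, 5736, 5000, 4226, 3420, 2588, 1736, 872, 0
};

/* Cosine of a latitude given in units of 0.00001 degrees, times 10000.
 * Valid for -90..+90 degrees; cos(-x) == cos(x) covers the south. */
int32_t cos_lat_e4(int32_t lat_e5)
{
    const int32_t step = 500000; /* 5 degrees in 0.00001-degree units */
    if (lat_e5 < 0) lat_e5 = -lat_e5;
    assert(lat_e5 <= 9000000);
    int32_t idx = lat_e5 / step;
    if (idx >= 18) return 0; /* exactly 90 degrees */
    int32_t frac = lat_e5 % step;
    int32_t lo = cos_table_e4[idx], hi = cos_table_e4[idx + 1];
    return lo + (hi - lo) * frac / step; /* linear interpolation */
}

/* True if the variable point lies within radius_m metres of the fixed one. */
bool geo_within(int32_t lat_fixed, int32_t lon_fixed,
                int32_t lat_var, int32_t lon_var, uint16_t radius_m)
{
    /* One 0.00001-degree step of latitude is about 10/9 m. */
    int64_t dy = (int64_t)(lat_var - lat_fixed) * 10 / 9;
    /* Longitude steps shrink by cos(latitude). */
    int64_t dx = (int64_t)(lon_var - lon_fixed) * cos_lat_e4(lat_fixed)
                 / 10000 * 10 / 9;
    int64_t r = radius_m;
    return dx * dx + dy * dy <= r * r;
}
```

For example, at 60°N a longitude difference of 180 units comes out as 100 m rather than the 200 m an uncorrected Pythagorean check would report.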
I am working on a project which incorporates computing a sine wave as input for a control loop.
The sine wave has a frequency of 280 Hz, and the control loop runs every 30 µs and everything is written in C for an Arm Cortex-M7.
At the moment we are simply doing:
double time;
void control_loop() {
time += 30e-6;
double sine = sin(2 * M_PI * 280 * time);
...
}
Two problems/questions arise:
When running for a long time, time becomes bigger. Suddenly there is a point where the computation time for the sine function increases drastically (see image). Why is this? How are these functions usually implemented? Is there a way to circumvent this (without noticeable precision loss) as speed is a huge factor for us? We are using sin from math.h (Arm GCC).
How can I deal with time in general? When running for a long time, the variable time will inevitably reach the limits of double precision. Even using a counter time = counter++ * 30e-6; only improves this, but it does not solve it. As I am certainly not the first person who wants to generate a sine wave for a long time, there must be some ideas/papers/... on how to implement this fast and precise.
Instead of calculating sine as a function of time, maintain a sine/cosine pair and advance it through complex number multiplication. This doesn't require any trigonometric functions or lookup tables; only four multiplies and an occasional re-normalization:
static const double a = 2 * M_PI * 280 * 30e-6;
static const double dx = cos(a);
static const double dy = sin(a);
double x = 1, y = 0; // complex x + iy
int counter = 0;
void control_loop() {
double xx = dx*x - dy*y;
double yy = dx*y + dy*x;
x = xx, y = yy;
// renormalize once in a while, based on
// https://www.gamedev.net/forums/topic.asp?topic_id=278849
if((counter++ & 0xff) == 0) {
double d = 1 - (x*x + y*y - 1)/2;
x *= d, y *= d;
}
double sine = y; // this is your sine
}
The frequency can be adjusted, if needed, by recomputing dx, dy.
Additionally, all the operations here can be done, rather easily, in fixed point.
Rationality
As @user3386109 points out below (+1), the 280 * 30e-6 = 21 / 2500 is a rational number, thus the sine should loop around after 2500 samples exactly. We can combine this method with theirs by resetting our generator (x=1,y=0) every 2500 iterations (or 5000, or 10000, etc...). This would eliminate the need for renormalization, as well as get rid of any long-term phase inaccuracies.
(Technically any floating point number is a dyadic rational. However 280 * 30e-6 doesn't have an exact representation in binary. Yet, by resetting the generator as suggested, we'll get an exactly periodic sine as intended.)
Explanation
Some requested an explanation down in the comments of why this works. The simplest explanation is to use the angle sum trigonometric identities:
xx = cos((n+1)*a) = cos(n*a)*cos(a) - sin(n*a)*sin(a) = x*dx - y*dy
yy = sin((n+1)*a) = sin(n*a)*cos(a) + cos(n*a)*sin(a) = y*dx + x*dy
and the correctness follows by induction.
This is essentially De Moivre's formula if we view those sine/cosine pairs as complex numbers, in accordance with Euler's formula.
A more insightful way might be to look at it geometrically. Complex multiplication by exp(ia) is equivalent to rotation by a radians. Therefore, by repeatedly multiplying by dx + idy = exp(ia), we incrementally rotate our starting point 1 + 0i along the unit circle. The y coordinate, according to Euler's formula again, is the sine of the current phase.
Normalization
While the phase continues to advance with each iteration, the magnitude (aka norm) of x + iy drifts away from 1 due to round-off errors. However we're interested in generating a sine of amplitude 1, thus we need to normalize x + iy to compensate for numeric drift. The straightforward way is, of course, to divide it by its own norm:
double d = 1/sqrt(x*x + y*y);
x *= d, y *= d;
This requires a calculation of a reciprocal square root. Even though we normalize only once every X iterations, it'd still be cool to avoid it. Fortunately |x + iy| is already close to 1, thus we only need a slight correction to keep it at bay. Expanding the expression for d around 1 (first order Taylor approximation), we get the formula that's in the code:
d = 1 - (x*x + y*y - 1)/2
TODO: to fully understand the validity of this approximation one needs to prove that it compensates for round-off errors faster than they accumulate -- and thus get a bound on how often it needs to be applied.
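As a partial step towards that TODO, here is the expansion written out, with the squared-norm error named ε (my notation, not from the original answer). It only covers what one correction does to an existing error; it says nothing about how fast round-off adds new error between corrections.

```latex
\varepsilon = x^2 + y^2 - 1, \qquad
d = \frac{1}{\sqrt{x^2 + y^2}} = (1 + \varepsilon)^{-1/2}
  = 1 - \tfrac{1}{2}\varepsilon + \tfrac{3}{8}\varepsilon^2 - \dots
  \approx 1 - \frac{x^2 + y^2 - 1}{2}

% Squared norm after scaling x and y by the approximate d:
(1 + \varepsilon)\left(1 - \tfrac{1}{2}\varepsilon\right)^2
  = 1 - \tfrac{3}{4}\varepsilon^2 + \tfrac{1}{4}\varepsilon^3
```

So an error of ε in the squared norm becomes roughly -3ε²/4 after one correction, which is why a single cheap step is enough as long as ε is already small.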
The function can be rewritten as
double n;
void control_loop() {
n += 1;
double sine = sin(2 * M_PI * 280 * 30e-6 * n);
...
}
That does exactly the same thing as the code in the question, with exactly the same problems. But it can now be simplified:
280 * 30e-6 = 280 * 30 / 1000000 = 21 / 2500 = 8.4e-3
Which means that when n reaches 2500, you've output exactly 21 cycles of the sine wave. Which means that you can set n back to 0.
The resulting code is:
int n;
void control_loop() {
n += 1;
if (n == 2500)
n = 0;
double sine = sin(2 * M_PI * 8.4e-3 * n);
...
}
As long as your code can run for 21 cycles without problems, it'll run forever without problems.
I'm rather shocked at the existing answers. The first problem you detect is easily solved, and the next problem magically disappears when you solve the first problem.
You need a basic understanding of math to see how it works. Recall, sin(x+2pi) is just sin(x), mathematically. The large increase in time you see happens when your sin(float) implementation switches to another algorithm, and you really want to avoid that.
Remember that float has only 6 significant digits. 100000.0f*M_PI+x uses those 6 digits for 100000.0f*M_PI, so there's nothing left for x.
So, the easiest solution is to keep track of x yourself. At t=0 you initialize x to 0.0f. Every 30 us, you increment x += 2 * M_PI * 280 * 30e-06;. The time does not appear in this formula! Finally, if x>2*M_PI, you decrement x-=2*M_PI; (since sin(x)==sin(x-2*pi)).
You now have an x that stays nicely in the range 0 to 6.2832, where sin is fast and the 6 digits of precision are all useful.
How to generate a lovely sine.
DAC is 12 bits so you have only 4096 levels. It makes no sense to send more than 4096 samples per period. In real life you will need far fewer samples to generate a good quality waveform.
Create C file with the lookup table (using your PC). Redirect the output to the file (https://helpdeskgeek.com/how-to/redirect-output-from-command-line-to-text-file/).
#include <math.h>
#include <stdio.h>

#define STEP ((2*M_PI) / 4096.0)

int main(void)
{
    double alpha = 0;
    printf("#include <stdint.h>\nconst uint16_t sine[4096] = {\n");
    for(int x = 0; x < 4096 / 16; x++)
    {
        for(int y = 0; y < 16; y++)
        {
            // scale sin() from -1..1 to the 12-bit DAC range 0..4095
            printf("%d, ", (int)(4095 * (sin(alpha) + 1.0) / 2.0));
            alpha += STEP;
        }
        printf("\n");
    }
    printf("};\n");
}
https://godbolt.org/z/e899d98oW
Configure the timer to trigger the overflow 4096*280=1146880 times per second. Set the timer to generate the DAC trigger event. For 180MHz timer clock it will not be precise and the frequency will be 279.906449045Hz. If you need better precision change the number of samples to match your timer frequency or/and change the timer clock frequency (H7 timers can run up to 480MHz)
Configure DAC to use DMA and transfer the value from the lookup table created in the step 1 to the DAC on the trigger event.
Enjoy a beautiful sine wave on your oscilloscope. Note that your microcontroller core will not be loaded at all. You will have it for other tasks. If you want to change the period, simply reconfigure the timer. You can do it as many times per second as you wish. To reconfigure the timer use timer DMA burst mode, which will reload the PSC & ARR registers on the update event automatically without disturbing the generated waveform.
I know it is advanced STM32 programming and it will require register level programming. I use it to generate complex waveforms in our devices.
It is the correct way of doing it. No control loops, no calculations, no core load.
I'd like to address the embedded programming issues in your code directly - @0___________'s answer is the correct way to do this on a microcontroller and I won't retread the same ground.
Variables representing time should never be floating point. If your increment is not a power of two, errors will always accumulate. Even if it is, eventually your increment will be smaller than the smallest increment and the timer will stop. Always use integers for time. You can pick an integer size big enough to ignore roll over - an unsigned 32 bit integer representing milliseconds will take 50 days to roll over, while an unsigned 64 bit integer will take over 500 million years.
Generating any periodic signal where you do not care about the signal's phase does not require a time variable. Instead, you can keep an internal counter which resets to 0 at the end of a period. (When you use DMA with a look-up table, that's exactly what you're doing - the counter is the DMA controller's next-read pointer.)
Whenever you use a transcendental function such as sine in a microcontroller, your first thought should be "can I use a look-up table for this?" You don't have access to the luxury of a modern operating system optimally shuffling your load around on a 4 GHz+ multi-core processor. You're often dealing with a single thread that will stall waiting for your 200 MHz microcontroller to bring the FPU out of standby and perform the approximation algorithm. There is a significant cost to transcendental functions. There's a cost to LUTs too, but if you're hitting the function constantly, there's a good chance you'll like the tradeoffs of the LUT a lot better.
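To make the point about floating-point time concrete, here is a small sketch. All names in it are made up for the example. With a 32-bit float, a 30 µs increment already vanishes once the time value reaches 1024 s (about 17 minutes), because the spacing between adjacent floats there is about 122 µs. The integer tick counter has no such problem, and unsigned subtraction gives the right elapsed count even across a roll over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if adding dt to t leaves a float time variable unchanged. */
bool float_time_stalls(float t, float dt)
{
    assert(dt > 0.0f);
    volatile float next = t + dt; /* volatile forces rounding to float */
    return next == t;
}

/* Integer alternative: count ticks and convert only when needed. */
typedef struct { uint32_t ticks; } tick_clock;

void tick_clock_advance(tick_clock *c) { c->ticks++; }

/* Elapsed ticks between two readings; unsigned subtraction stays
 * correct across a single wrap of the 32-bit counter. */
uint32_t ticks_elapsed(uint32_t start, uint32_t now)
{
    return now - start;
}
```

A double pushes the stall point out very far, but the rounding error per increment still grows with the magnitude of the time value, which is the underlying issue.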
As noted in some of the comments, the time value is continually growing with time. This poses two problems:
The sin function likely has to perform a modulus internally to get the internal value into a supported range.
The resolution of time will become worse and worse as the value increases, due to adding on higher digits.
Making the following changes should improve the performance:
double time;
void control_loop() {
time += 30.0e-6;
if((1.0/280.0) < time)
{
time -= 1.0/280.0;
}
double sine = sin(2 * M_PI * 280 * time);
...
}
Note that once this change is made, you will no longer have a time variable.
Use a look-up table. Your comment in the discussion with Eugene Sh.:
A small deviation from the sine frequency (like 280.1Hz) would be ok.
In that case, with a control interval of 30 µs, if you have a table of 119 samples that you repeat over and over, you will get a sine wave of 280.112 Hz. Since you have a 12-bit DAC, you only need 119 * 2 = 238 bytes to store this if you would output it directly to the DAC. If you use it as input for further calculations like you mention in the comments, you can store it as float or double as desired. On an MCU with embedded static RAM, it only takes a few cycles at most to load from memory.
If you have a few kilobytes of memory available, you can eliminate this problem completely with a lookup table.
With a sampling period of 30 µs, 2500 samples will have a total duration of 75 ms. This is exactly equal to the duration of 21 cycles at 280 Hz.
I haven't tested or compiled the following code, but it should at least demonstrate the approach:
#include <math.h>
#include <stdlib.h>

double sin2500(void) {
    static double *table = NULL;
    static int n = 2499;
    if (!table) {
        // build the table once, on the first call
        table = malloc(2500 * sizeof(double));
        if (!table) abort();
        for (int i=0; i<2500; i++) table[i] = sin(2 * M_PI * 280 * i * 30e-06);
    }
    n = (n+1) % 2500;
    return table[n];
}
How about a variant of others' modulo-based concept:
int t = 0;
int divisor = 1000000;
void control_loop() {
t += 30 * 280;
if (t > divisor) t -= divisor;
double sine = sin(2 * M_PI * t / (double)divisor);
...
}
It computes the modulo in integer arithmetic, so no round-off error accumulates.
There is an alternative approach to calculating a series of values of sine (and cosine) for angles that increase by some very small amount. It essentially devolves down to calculating the X and Y coordinates of a circle, and then dividing the Y value by some constant to produce the sine, and dividing the X value by the same constant to produce the cosine.
If you are content to generate a "very round ellipse", you can use the following hack, which is attributed to Marvin Minsky in the 1960s. It's much faster than calculating sines and cosines, although it introduces a very small error into the series. Here is an extract from the Hakmem Document, Item 149. The Minsky circle algorithm is outlined.
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost circles on a point-plotting display:
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X
This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.
The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.
Here is a link to the hakmem: http://inwap.com/pdp10/hbaker/hakmem/hacks.html
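In C the two update lines from Item 149 could be sketched like this. The struct and function names are mine. One property I worked out rather than took from HAKMEM: each step advances the angle by θ with epsilon = 2*sin(θ/2), so for the 280 Hz / 30 µs case you would pick epsilon from θ = 2π * 280 * 30e-6.

```c
#include <assert.h>

/* Hypothetical state for the Minsky circle generator. */
typedef struct { double x, y, eps; } minsky;

void minsky_init(minsky *m, double eps)
{
    assert(eps > 0.0 && eps < 2.0); /* larger values do not stay bounded */
    m->x = 1.0; m->y = 0.0; m->eps = eps;
}

/* One step; y approximates a sine wave, x a cosine wave. */
void minsky_step(minsky *m)
{
    m->x -= m->eps * m->y;
    m->y += m->eps * m->x; /* uses the NEW x, as HAKMEM stresses */
}
```

The amplitude does not drift because the quantity x*x - eps*x*y + y*y is preserved by each step, but note that x and y are slightly out of quadrature, which is the "very round ellipse" mentioned above.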
I think it would be possible to use a modulo because sin() is periodic.
Then you don’t have to worry about the problems.
double time = 0;
long unsigned int timesteps = 0;
double sine;
void control_loop()
{
timesteps++;
time += 30e-6;
if( time > 1 )
{
time -= 1;
}
sine = sin( 2 * M_PI * 280 * time );
...
}
Fascinating thread. Minsky's algorithm mentioned in Walter Mitty's answer reminded me of a method for drawing circles that was published in Electronics & Wireless World and that I kept. (Credit: https://www.electronicsworld.co.uk/magazines/). I'm attaching it here for interest.
However, for my own similar projects (for audio synthesis) I use a lookup table, with enough points that linear interpolation is accurate enough (do the math(s)!)
I'm making a C program in which I simulate a Patriot missile system. In this simulation my Patriot missile has to catch an incoming enemy target missile.
The information about the Patriot missile and the enemy target are stored in a structure like this:
typedef struct _stat {
float32_t x;
float32_t y;
float32_t v; // speed magnitude
float32_t v_theta; // speed angle in radians
float32_t a; // acceleration magnitude
float32_t a_theta; // acceleration angle in radians
} stat;
And I'm storing the information in two global variables like these:
stat t_stat; // target stats
stat p_stat; // patriot stats
Now, to simplify the problem the target is moving thanks to an initial speed and is affected only by gravity, so we can consider:
t_stat.x = TARGET_X0;
t_stat.y = TARGET_Y0;
t_stat.v = TARGET_V0;
t_stat.v_theta = TARGET_V_THETA0;
t_stat.a = G; // gravity acceleration
t_stat.a_theta = -(PI / 2);
Again, to simplify, I only compute the collision point for when the Patriot has already reached its top speed, so its own acceleration is only used to balance the gravity acceleration. In particular we have:
p_stat.x = PATRIOT_X0;
p_stat.y = PATRIOT_Y0;
p_stat.v = 1701.45f; // Mach 5 speed in m/s
p_stat.v_theta = ???? // that's what I need to find
p_stat.a = G; // gravity acceleration
p_stat.a_theta = PI / 2;
In this way we can consider the Patriot as moving at constant speed because the sum of the accelerations by which it is affected is equal to 0.
float32_t patr_ax = p_stat.a * cos(p_stat.a_theta); // = 0
float32_t patr_ay = p_stat.a * sin(p_stat.a_theta) - G; // = 0
Now, here comes the problem. I want to write a function which computes the right p_stat.v_theta in order to hit the target (if a collision is possible).
For example the function that I need could have a prototype like this:
uint8_t computeIntercept(stat t, stat p, float32_t *theta);
And it can be used in this way:
if(computeIntercept(t_stat, p_stat, &p_stat.v_theta)) {
printf("Target Hit with an angle of %.2f\n", p_stat.v_theta);
} else {
printf("Target Missed!\n");
}
For making it even more clear, here is the image which I want
Your target projectile is moving with constant acceleration a, hence its velocity can be described as
v(t) = v0 + a*t
Integrating this equation gives the position, up to a constant vector c:
r(t) = c + v0*t + a*t^2/2
Knowing the initial position r0 = r(0), we can determine that this constant vector is the initial position, c = r0.
So the position of the target projectile is finally
r(t) = r0 + v0*t + a*t^2/2
These are two equations (one for the x and one for the y coordinate). The equation for y is quadratic in t and the equation for x is linear, since the (gravitational) acceleration acts in the vertical direction.
You have
In general you should do something like this :
You can use https://en.wikipedia.org/wiki/Newton%27s_method
in order to solve the last equation for theta that you get.
In order to have a collision you need the coordinates of both objects to be identical at the same instant of time.
You could decompose the problem into two simpler ones, considering each axis, x and y, separately:
You need to calculate the equations of motion for the both objects, once for their horizontal components and once for their vertical components.
Check whether the solutions of both objects contain equal coordinates and if this happens at the same instant of time.
Target coordinates: T (xt, yt)
Patriot coordinates: P (xp, yp)
You could solve this numerically by varying the time, t, and observing whether: T == P.
In your case, one of the equations should contain a parameter accounting for the angle theta.
Simulate the event and find the time the 2 objects are closest.
The missile can only fly so long after launch (negating it going into orbit), let that be tf.
Using struct _stat for each object, write a function that report the x,y for a given t and object.
Simulate, at reasonable intervals (1s?), 0.0 to tf, the square of the distance between the two objects.
From the time t of the closest approach, use its two neighbors t-dt and t+dt to do the simulation over again. You could scan from t-dt to t+dt with a 10x smaller dt, or use other methods.
Repeat the above until the distance is close enough or dt is sufficiently small.
If this distance is sufficiently small, evaluate struct _stat for the needed data now that the time is determined.
Note: the details of the complexity of pt2 compute_position(struct _stat st, time t) only need consideration for the initial dt estimate.
A naive implementation of vector rotation in 3d gives huge rounding errors, especially when multiple rotations around different axes are performed. A simple 1-axis example shows the basic problem. I have a code where I rotate points around x- and y- axis a few times. In some cases, I get errors in the second decimal place (e.g. length of the vector is 1 before rotations and 0.9 after). I'd be happy with relative errors < 1e-5.
void Rotate_x(double data[3], double agl) {
agl *= M_PI/180.0;
double c = cos(agl); double s = sin(agl);
double tmp_y = c*data[1] - s*data[2];
double tmp_z = s*data[1] + c*data[2];
data[1] = tmp_y; data[2] = tmp_z;
}
Can someone point me to a library or some code that rotates points around the coordinate axis with minimal error?
Everything I found were bloated linear algebra libraries that are overkill for my purposes.
Edit:
I went to long double precision and combined rotations to improve errors. With doubles I was not fully satisfied (1e-3 relative error in worst case). That was the easiest solution and it works okay. Still wouldn't mind a nice library that does rotations in regular double precision accurately.
better precision variables are not enough
you need more precise sin,cos functions to improve accuracy
so make your own functions via Taylor series expansion
and use that ... then compare the results
and increase the polynomial order until accuracy stops rising or starts dropping again
if you are applying many transformations on the same data
then create cumulative transform matrix
then check if it is orthogonal/orthonormal
and repair if not (with use of cross product)
I use this for 3D render object matrices (many cumulative transforms over time)
but in your case this can also increase error (if the wrong order of axes is chosen during correction)
this is better suited to ensure that object will stay the same size/shape over time ...
[edit1] test
I took your code and compiled it with Borland BDS2006 as a win32 app
and the result is:
original: (0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
rotated: (0.0000000000000000000,0.9999999999999998890,-0.0000000000000000273)
also do not forget, if your sin,cos take radians (as usual for C/C++), then add this to Rotate
agl*=M_PI/180.0;
What compiler/platform are you using?
This is what my Rotate looks like
void Rotate(double *data,double agl)
{
agl*=M_PI/180.0;
double c = cos(agl); double s = sin(agl);
double tmp_y = c*data[1] - s*data[2];
double tmp_z = s*data[1] + c*data[2];
data[1] = tmp_y; data[2] = tmp_z;
}
[edit2] 32/64 bit comparison
[double] //64bit floating point
(0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
(0.0000000000000000000,0.9999999999999998890,-0.0000000000000000273)
[float] //32bit floating point
(0.0000000000000000000,1.0000000000000000000,0.0000000000000000000)
(0.0000000000000000000,0.9999999403953552246,-0.0000000146747787255)
I'm trying to develop a simple C application that can give a value from 0-100 at a certain frequency range at a given timestamp in a WAV-file.
Example: I have frequency range of 44.1kHz (typical MP3 file) and I want to split that range into n amount of ranges (starting from 0). I then need to get the amplitude of each range, being from 0 to 100.
What I've managed so far:
Using libsndfile I'm now able to read the data of a WAV-file.
infile = sf_open(argv [1], SFM_READ, &sfinfo);
float samples[sfinfo.frames];
sf_read_float(infile, samples, 1);
However, my understanding of FFT is rather limited. But I know it's required in order to get the amplitudes at the ranges I need. But how do I move on from here? I found the library FFTW-3, which seems to be suited for the purpose.
I found some help here: https://stackoverflow.com/a/4371627/1141483
and looked at the FFTW tutorial here: http://www.fftw.org/fftw2_doc/fftw_2.html
But as I'm unsure about the behaviour of FFTW, I don't know how to progress from here.
And another question, assuming you use libsndfile: if you force the reading to be single-channel (with a stereo file) and then read the samples, will you actually be reading only half of the samples of the total file, since half of them come from channel 1? Or does it automatically filter those out?
Thanks a ton for your help.
EDIT: My code can be seen here:
double blackman_harris(int n, int N){
double a0, a1, a2, a3, seg1, seg2, seg3, w_n;
a0 = 0.35875;
a1 = 0.48829;
a2 = 0.14128;
a3 = 0.01168;
seg1 = a1 * (double) cos( ((double) 2 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
seg2 = a2 * (double) cos( ((double) 4 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
seg3 = a3 * (double) cos( ((double) 6 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
w_n = a0 - seg1 + seg2 - seg3;
return w_n;
}
int main (int argc, char * argv [])
{ char *infilename ;
SNDFILE *infile = NULL ;
FILE *outfile = NULL ;
SF_INFO sfinfo ;
infile = sf_open(argv [1], SFM_READ, &sfinfo);
int N = pow(2, 10);
fftw_complex results[N/2 +1];
double samples[N];
sf_read_double(infile, samples, 1);
double normalizer;
int k;
for(k = 0; k < N;k++){
if(k == 0){
normalizer = blackman_harris(k, N);
} else {
normalizer = blackman_harris(k, N);
}
}
normalizer = normalizer * (double) N/2;
fftw_plan p = fftw_plan_dft_r2c_1d(N, samples, results, FFTW_ESTIMATE);
fftw_execute(p);
int i;
for(i = 0; i < N/2 +1; i++){
double value = ((double) sqrtf(creal(results[i])*creal(results[i])+cimag(results[i])*cimag(results[i]))/normalizer);
printf("%f\n", value);
}
sf_close (infile) ;
return 0 ;
} /* main */
Well it all depends on the frequency range you're after. An FFT works by taking 2^n samples and providing you with 2^(n-1) real and imaginary numbers. I have to admit I'm quite hazy on what exactly these values represent (I've got a friend who has promised to go through it all with me in lieu of a loan I made him when he had financial issues ;)) other than an angle around a circle. Effectively they provide you with an arccos of the angle parameter for a sine and cosine for each frequency bin from which the original 2^n samples can be, perfectly, reconstructed.
Anyway this has the huge advantage that you can calculate magnitude by taking the euclidean distance of the real and imaginary parts (sqrtf( (real * real) + (imag * imag) )). This provides you with an unnormalised distance value. This value can then be used to build a magnitude for each frequency band.
So let's take an order 10 FFT (2^10). You input 1024 samples. You FFT those samples and you get 512 imaginary and real values back (the particular ordering of those values depends on the FFT algorithm you use). Those 512 bins span 0 Hz up to the Nyquist frequency (half the sample rate), so for a 44.1 kHz audio file each bin represents 22050/512 Hz (equivalently 44100/1024), or ~43 Hz per bin.
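If it helps, the mapping between bin index and frequency can be written down as two small helpers. These are hypothetical names, not part of FFTW or libsndfile, and they rely on bin k of an n-point transform sitting at k * sample_rate / n.

```c
#include <assert.h>

/* Centre frequency in Hz of bin k for an n-point FFT at sample_rate Hz. */
double bin_frequency(int k, int n, double sample_rate)
{
    assert(n > 0 && k >= 0 && k <= n / 2);
    return k * sample_rate / n;
}

/* Index of the bin closest to freq_hz (rounded to nearest). */
int frequency_to_bin(double freq_hz, int n, double sample_rate)
{
    assert(n > 0 && freq_hz >= 0.0 && freq_hz <= sample_rate / 2.0);
    return (int)(freq_hz * n / sample_rate + 0.5);
}
```

To split the spectrum into your n ranges you would convert each range's edge frequencies to bin indices with frequency_to_bin and combine the magnitudes of the bins in between.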
One thing that should stand out from this is that if you use more samples (from what's called the time or spatial domain when dealing with multi dimensional signals such as images) you get better frequency representation (in what's called the frequency domain). However you sacrifice one for the other. This is just the way things go and you will have to live with it.
Basically you will need to tune the frequency bins and time/spatial resolution to get the data you require.
First a bit of nomenclature. The 1024 time domain samples I referred to earlier is called your window. Generally when performing this sort of process you will want to slide the window on by some amount to get the next 1024 samples you FFT. The obvious thing to do would be to take samples 0->1023, then 1024->2047, and so forth. This unfortunately doesn't give the best results. Ideally you want to overlap the windows to some degree so that you get a smoother frequency change over time. Most commonly people slide the window on by half a window size. ie your first window will be 0->1023 the second 512->1535 and so on and so forth.
Now this then brings up one further problem. While this information provides for perfect inverse FFT signal reconstruction it leaves you with a problem that frequencies leak into surrounding bins to some extent. To solve this issue some mathematicians (far more intelligent than me) came up with the concept of a window function. The window function provides for far better frequency isolation in the frequency domain though leads to a loss of information in the time domain (ie it's impossible to perfectly re-construct the signal after you have used a window function, AFAIK).
Now there are various types of window function ranging from the rectangular window (effectively doing nothing to the signal) to various functions that provide far better frequency isolation (though some may also kill surrounding frequencies that may be of interest to you!!). There is, alas, no one size fits all but I'm a big fan (for spectrograms) of the Blackman-Harris window function. I think it gives the best looking results!
However as I mentioned earlier the FFT provides you with an unnormalised spectrum. To normalise the spectrum (after the euclidean distance calculation) you need to divide all the values by a normalisation factor (I go into more detail here).
This normalisation will provide you with a value between 0 and 1. So you could easily multiply this value by 100 to get your 0 to 100 scale.
This, however, is not where it ends. The spectrum you get from this is rather unsatisfying. This is because you are looking at the magnitude using a linear scale. Unfortunately the human ear hears using a logarithmic scale. This rather causes issues with how a spectrogram/spectrum looks.
To get round this you need to convert these 0 to 1 values (I'll call it 'x') to the decibel scale. The standard transformation is 20.0f * log10f( x ). This will then provide you a value whereby 1 has converted to 0 and 0 has converted to -infinity. Your magnitudes are now in the appropriate logarithmic scale. However it's not always that helpful.
At this point you need to look into the original sample bit depth. At 16-bit sampling you get a value that is between 32767 and -32768. This means your dynamic range is fabsf( 20.0f * log10f( 1.0f / 65536.0f ) ) or ~96.33dB. So now we have this value.
Take the values we've got from the dB calculation above. Add this 96.33 value to it. Obviously the maximum amplitude (0) is now 96.33. Now divide by that same value and you now have a value ranging from -infinity to 1.0f. Clamp the lower end to 0 and you now have a range from 0 to 1; multiply that by 100 and you have your final 0 to 100 range.
And that is much more of a monster post than I had originally intended but should give you a good grounding in how to generate a good spectrum/spectrogram for an input signal.
and breathe
Further reading (for people other than the original poster who has already found it):
Converting an FFT to a spectogram
Edit: As an aside I found kiss FFT far easier to use, my code to perform a forward fft is as follows:
CFFT::CFFT( unsigned int fftOrder ) :
BaseFFT( fftOrder )
{
mFFTSetupFwd = kiss_fftr_alloc( 1 << fftOrder, 0, NULL, NULL );
}
bool CFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
kiss_fftr( mFFTSetupFwd, pIn, (kiss_fft_cpx*)pOut );
return true;
}