Getting underestimated floating point factors

Getting underestimated floating point factors - c

Most physics engines support doing an object trajectory trace that will return a factor between 0.0 and 1.0 representing how far along an object's trajectory it will first hit an object.
The problem I'm concerned about is cases where moving the object by that factor of its trajectory results in its position being past the boundary it was supposed to hit and stop at (due to floating point rounding).
For instance, I created a C program that tried random cases until it ran into this issue, and came up with this example (although I've experienced ones with much less extreme movement, so it's not specific to large floating points):
float start = 4884361.0f;
float wall = 37961976.0f;
float end = 1398674432.0f;
float time = (wall - start) / (end - start);
float new_pos = start + time * (end - start);
printf("We hit %f, but moving left us at %f.\n", wall, new_pos);
And this case prints out: We hit 37961976.000000, but moving left us at 37961980.000000.
So the position moved beyond the wall position and now the object is stuck inside the wall.
Is there a way to generate the factor or perform the factor multiplication such that the floating point error will always undershoot the actual value for all possible values?

The calculated value is the next (or nearly the next) floating point number. We are at the limits of float precision. To insure an answer is at or to one side of the expected answer, there are a number of approaches
1) Higher intermediate precision: (should come up with the right answer far more often)
float start = 4884361.0f;
float wall = 37961976.0f;
float end = 1398674432.0f;
double time = ((double)wall - start) / ((double)end - start);
float new_pos = start + time * ((double)end - start);
2) Logical: (this absolutely will work)
if (new_pos > wall) new_pos = wall;
3) Use a slightly lower time value: (a gentle hack)
float new_pos = start + nextafterf(time,0.0f) * (end - start);
4) Change FP rounding mode to round toward zero: (this may have a large impact though)
fesetround(FE_TOWARDZERO);
5) A simple factor:
static const float factor = 0.99999;
float new_pos = start + factor*time * (end - start);
Lots of pros & cons per approach.

Related

Endless sine generation in C

I am working on a project which incorporates computing a sine wave as input for a control loop.
The sine wave has a frequency of 280 Hz, and the control loop runs every 30 µs and everything is written in C for an Arm Cortex-M7.
At the moment we are simply doing:
double time;
void control_loop() {
time += 30e-6;
double sine = sin(2 * M_PI * 280 * time);
...
}
Two problems/questions arise:
When running for a long time, time becomes bigger. Suddenly there is a point where the computation time for the sine function increases drastically (see image). Why is this? How are these functions usually implemented? Is there a way to circumvent this (without noticeable precision loss) as speed is a huge factor for us? We are using sin from math.h (Arm GCC).
How can I deal with time in general? When running for a long time, the variable time will inevitably reach the limits of double precision. Even using a counter time = counter++ * 30e-6; only improves this, but it does not solve it. As I am certainly not the first person who wants to generate a sine wave for a long time, there must be some ideas/papers/... on how to implement this fast and precise.

Instead of calculating sine as a function of time, maintain a sine/cosine pair and advance it through complex number multiplication. This doesn't require any trigonometric functions or lookup tables; only four multiplies and an occasional re-normalization:
static const double a = 2 * M_PI * 280 * 30e-6;
static const double dx = cos(a);
static const double dy = sin(a);
double x = 1, y = 0; // complex x + iy
int counter = 0;
void control_loop() {
double xx = dx*x - dy*y;
double yy = dx*y + dy*x;
x = xx, y = yy;
// renormalize once in a while, based on
// https://www.gamedev.net/forums/topic.asp?topic_id=278849
if((counter++ & 0xff) == 0) {
double d = 1 - (x*x + y*y - 1)/2;
x *= d, y *= d;
}
double sine = y; // this is your sine
}
The frequency can be adjusted, if needed, by recomputing dx, dy.
Additionally, all the operations here can be done, rather easily, in fixed point.
Rationality
As #user3386109 points out below (+1), the 280 * 30e-6 = 21 / 2500 is a rational number, thus the sine should loop around after 2500 samples exactly. We can combine this method with theirs by resetting our generator (x=1,y=0) every 2500 iterations (or 5000, or 10000, etc...). This would eliminate the need for renormalization, as well as get rid of any long-term phase inaccuracies.
(Technically any floating point number is a diadic rational. However 280 * 30e-6 doesn't have an exact representation in binary. Yet, by resetting the generator as suggested, we'll get an exactly periodic sine as intended.)
Explanation
Some requested an explanation down in the comments of why this works. The simplest explanation is to use the angle sum trigonometric identities:
xx = cos((n+1)*a) = cos(n*a)*cos(a) - sin(n*a)*sin(a) = x*dx - y*dy
yy = sin((n+1)*a) = sin(n*a)*cos(a) + cos(n*a)*sin(a) = y*dx + x*dy
and the correctness follows by induction.
This is essentially the De Moivre's formula if we view those sine/cosine pairs as complex numbers, in accordance to Euler's formula.
A more insightful way might be to look at it geometrically. Complex multiplication by exp(ia) is equivalent to rotation by a radians. Therefore, by repeatedly multiplying by dx + idy = exp(ia), we incrementally rotate our starting point 1 + 0i along the unit circle. The y coordinate, according to Euler's formula again, is the sine of the current phase.
Normalization
While the phase continues to advance with each iteration, the magnitude (aka norm) of x + iy drifts away from 1 due to round-off errors. However we're interested in generating a sine of amplitude 1, thus we need to normalize x + iy to compensate for numeric drift. The straight forward way is, of course, to divide it by its own norm:
double d = 1/sqrt(x*x + y*y);
x *= d, y *= d;
This requires a calculation of a reciprocal square root. Even though we normalize only once every X iterations, it'd still be cool to avoid it. Fortunately |x + iy| is already close to 1, thus we only need a slight correction to keep it at bay. Expanding the expression for d around 1 (first order Taylor approximation), we get the formula that's in the code:
d = 1 - (x*x + y*y - 1)/2
TODO: to fully understand the validity of this approximation one needs to prove that it compensates for round-off errors faster than they accumulate -- and thus get a bound on how often it needs to be applied.

The function can be rewritten as
double n;
void control_loop() {
n += 1;
double sine = sin(2 * M_PI * 280 * 30e-6 * n);
...
}
That does exactly the same thing as the code in the question, with exactly the same problems. But it can now be simplified:
280 * 30e-6 = 280 * 30 / 1000000 = 21 / 2500 = 8.4e-3
Which means that when n reaches 2500, you've output exactly 21 cycles of the sine wave. Which means that you can set n back to 0.
The resulting code is:
int n;
void control_loop() {
n += 1;
if (n == 2500)
n = 0;
double sine = sin(2 * M_PI * 8.4e-3 * n);
...
}
As long as your code can run for 21 cycles without problems, it'll run forever without problems.

I'm rather shocked at the existing answers. The first problem you detect is easily solved, and the next problem magically disappears when you solve the first problem.
You need a basic understanding of math to see how it works. Recall, sin(x+2pi) is just sin(x), mathematically. The large increase in time you see happens when your sin(float) implementation switches to another algorithm, and you really want to avoid that.
Remember that float has only 6 significant digits. 100000.0f*M_PI+x uses those 6 digits for 100000.0f*M_PI, so there's nothing left for x.
So, the easiest solution is to keep track of x yourself. At t=0 you initialize x to 0.0f. Every 30 us, you increment x+= M_PI * 280 * 30e-06;. The time does not appear in this formula! Finally, if x>2*M_PI, you decrement x-=2*M_PI; (Since sin(x)==sin(x-2*pi)
You now have an x that stays nicely in the range 0 to 6.2834, where sin is fast and the 6 digits of precision are all useful.

How to generate a lovely sine.
DAC is 12bits so you have only 4096 levels. It makes no sense to send more than 4096 samples per period. In real life you will need much less samples to generate a good quality waveform.
Create C file with the lookup table (using your PC). Redirect the output to the file (https://helpdeskgeek.com/how-to/redirect-output-from-command-line-to-text-file/).
#define STEP ((2*M_PI) / 4096.0)
int main(void)
{
double alpha = 0;
printf("#include <stdint.h>\nconst uint16_t sine[4096] = {\n");
for(int x = 0; x < 4096 / 16; x++)
{
for(int y = 0; y < 16; y++)
{
printf("%d, ", (int)(4095 * (sin(alpha) + 1.0) / 2.0));
alpha += STEP;
}
printf("\n");
}
printf("};\n");
}
https://godbolt.org/z/e899d98oW
Configure the timer to trigger the overflow 4096*280=1146880 times per second. Set the timer to generate the DAC trigger event. For 180MHz timer clock it will not be precise and the frequency will be 279.906449045Hz. If you need better precision change the number of samples to match your timer frequency or/and change the timer clock frequency (H7 timers can run up to 480MHz)
Configure DAC to use DMA and transfer the value from the lookup table created in the step 1 to the DAC on the trigger event.
Enjoy beautiful sine wave using your oscilloscope. Note that your microcontroller core will not be loaded at all. You will have it for other tasks. If you want to change the period simple reconfigure the timer. You can do it as many times per second as you wish. To reconfigure the timer use timer DMA burst mode - which will reload PSC & ARR registers on the upddate event automatically not disturbing the generated waveform.
I know it is advanced STM32 programming and it will require register level programming. I use it to generate complex waveforms in our devices.
It is the correct way of doing it. No control loops, no calculations, no core load.

I'd like to address the embedded programming issues in your code directly - #0___________'s answer is the correct way to do this on a microcontroller and I won't retread the same ground.
Variables representing time should never be floating point. If your increment is not a power of two, errors will always accumulate. Even if it is, eventually your increment will be smaller than the smallest increment and the timer will stop. Always use integers for time. You can pick an integer size big enough to ignore roll over - an unsigned 32 bit integer representing milliseconds will take 50 days to roll over, while an unsigned 64 bit integer will take over 500 million years.
Generating any periodic signal where you do not care about the signal's phase does not require a time variable. Instead, you can keep an internal counter which resets to 0 at the end of a period. (When you use DMA with a look-up table, that's exactly what you're doing - the counter is the DMA controller's next-read pointer.)
Whenever you use a transcendental function such as sine in a microcontroller, your first thought should be "can I use a look-up table for this?" You don't have access to the luxury of a modern operating system optimally shuffling your load around on a 4 GHz+ multi-core processor. You're often dealing with a single thread that will stall waiting for your 200 MHz microcontroller to bring the FPU out of standby and perform the approximation algorithm. There is a significant cost to transcendental functions. There's a cost to LUTs too, but if you're hitting the function constantly, there's a good chance you'll like the tradeoffs of the LUT a lot better.

As noted in some of the comments, the time value is continually growing with time. This poses two problems:
The sin function likely has to perform a modulus internally to get the internal value into a supported range.
The resolution of time will become worse and worse as the value increases, due to adding on higher digits.
Making the following changes should improve the performance:
double time;
void control_loop() {
time += 30.0e-6;
if((1.0/280.0) < time)
{
time -= 1.0/280.0;
}
double sine = sin(2 * M_PI * 280 * time);
...
}
Note that once this change is made, you will no longer have a time variable.

Use a look-up table. Your comment in the discussion with Eugene Sh.:
A small deviation from the sine frequency (like 280.1Hz) would be ok.
In that case, with a control interval of 30 µs, if you have a table of 119 samples that you repeat over and over, you will get a sine wave of 280.112 Hz. Since you have a 12-bit DAC, you only need 119 * 2 = 238 bytes to store this if you would output it directly to the DAC. If you use it as input for further calculations like you mention in the comments, you can store it as float or double as desired. On an MCU with embedded static RAM, it only takes a few cycles at most to load from memory.

If you have a few kilobytes of memory available, you can eliminate this problem completely with a lookup table.
With a sampling period of 30 µs, 2500 samples will have a total duration of 75 ms. This is exactly equal to the duration of 21 cycles at 280 Hz.
I haven't tested or compiled the following code, but it should at least demonstrate the approach:
double sin2500() {
static double *table = NULL;
static int n = 2499;
if (!table) {
table = malloc(2500 * sizeof(double));
for (int i=0; i<2500; i++) table[i] = sin(2 * M_PI * 280 * i * 30e-06);
}
n = (n+1) % 2500;
return table[n];
}

How about a variant of others' modulo-based concept:
int t = 0;
int divisor = 1000000;
void control_loop() {
t += 30 * 280;
if (t > divisor) t -= divisor;
double sine = sin(2 * M_PI * t / (double)divisor));
...
}
It calculates the modulo in integer then causes no roundoff errors.

There is an alternative approach to calculating a series of values of sine (and cosine) for angles that increase by some very small amount. It essentially devolves down to calculating the X and Y coordinates of a circle, and then dividing the Y value by some constant to produce the sine, and dividing the X value by the same constant to produce the cosine.
If you are content to generate a "very round ellipse", you can use a following hack, which is attributed to Marvin Minsky in the 1960s. It's much faster than calculating sines and cosines, although it introduces a very small error into the series. Here is an extract from the Hakmem Document, Item 149. The Minsky circle algorithm is outlined.
ITEM 149 (Minsky): CIRCLE ALGORITHM
Here is an elegant way to draw almost circles on a point-plotting display:
NEW X = OLD X - epsilon * OLD Y
NEW Y = OLD Y + epsilon * NEW(!) X
This makes a very round ellipse centered at the origin with its size determined by the initial point. epsilon determines the angular velocity of the circulating point, and slightly affects the eccentricity. If epsilon is a power of 2, then we don't even need multiplication, let alone square roots, sines, and cosines! The "circle" will be perfectly stable because the points soon become periodic.
The circle algorithm was invented by mistake when I tried to save one register in a display hack! Ben Gurley had an amazing display hack using only about six or seven instructions, and it was a great wonder. But it was basically line-oriented. It occurred to me that it would be exciting to have curves, and I was trying to get a curve display hack with minimal instructions.
Here is a link to the hakmem: http://inwap.com/pdp10/hbaker/hakmem/hacks.html

I think it would be possible to use a modulo because sin() is periodic.
Then you don’t have to worry about the problems.
double time = 0;
long unsigned int timesteps = 0;
double sine;
void controll_loop()
{
timesteps++;
time += 30e-6;
if( time > 1 )
{
time -= 1;
}
sine = sin( 2 * M_PI * 280 * time );
...
}

Fascinating thread. Minsky's algorithm mentioned in Walter Mitty's answer reminded me of a method for drawing circles that was published in Electronics & Wireless World and that I kept. (Credit: https://www.electronicsworld.co.uk/magazines/). I'm attaching it here for interest.
However, for my own similar projects (for audio synthesis) I use a lookup table, with enough points that linear interpolation is accurate enough (do the math(s)!)

Simple integration that depends on floating point equality

I have the following very-crude integration calculator:
// definite integrate on one variable
// using basic trapezoid approach
float integrate(float start, float end, float step, float (*func)(float x))
{
if (start >= (end-step))
return 0;
else {
float x = start; // make it a bit more math-like
float segment = step * (func(x) + func(x+step))/2;
return segment + integrate(x+step, end, step, func);
}
}
And an example usage:
static float square(float x) {return x*x;}
int main(void)
{
// Integral x^2 from 0->2 should be ~ 2.6
float start=0.0, end=2.0, step=0.01;
float answer = integrate(start, end, step, square);
printf("The integral from %.2f to %.2f for X^2 = %.2f\n", start, end, answer );
}
$ run
The integral from 0.00 to 2.00 for X^2 = 2.67
What happens if the equality check at start >= (end-step) doesn't work? For example, if it evaluates something to 2.99997 instead of 3 and so does another loop (or one less loop). Is there a way to prevent that, or do most math-type calculators just work in decimals or some extension to the 'normal' floating points?

If you are given step, one way to write a loop (and you should use a loop for this, not recursion) is:
float x;
for (float i = 0; (x = start + i*step) < end - step/2; ++i)
…
Some points about this:
We keep an integer count with i. As long as there are a reasonable number of steps, there will be no floating-point rounding error in this. (We could make i and int, but float can count integer values perfectly well, and using float avoids an int-to-float conversion in i*step.)
Instead of incrementing x (or start as it is passed by recursion) repeatedly, we recalculate it each time as start + i*step. This has only two possible rounding errors, in the multiplication and in the addition, so it avoids accumulating errors over repeated additions.
We use end - step/2 as the threshold. This allows us to catch the desired endpoint even if the calculated x drifts as far away from end as end - step/2. And that is about the best we can do, because if it is drifting farther than half a step away from the ideally spaced points, we cannot tell if it has drifted +step/2 from end-step or -step/2 from end.
This presumes that step is an integer division of end-start, or pretty close to it, so that there are a whole number of steps in the loop. If it is not, the loop should be redesigned a bit to stop one step earlier and then calculate a step of partial width at the end.
At the beginning, I mentioned being given step. An alternative is you might be given a number of steps to use, and then the step width would be calculated from that. In that case, we would use an integer number of steps to control the loop. The loop termination condition would not involve floating-point rounding at all. We could calculate x as (float) i / NumberOfSteps * (end-start) + start.

Two improvements can be made easily.
Using recursion is a bad idea. Each additional call creates a new stack frame. For a sufficiently large number of steps, you will trigger a Stack Overflow. Use a loop instead.
Normally, you would avoid the rounding problem by using start, end and n, the number of steps. The location of the kth interval would be at start + k * (end - start) / n;
So you could rewrite your function as
float integrate(float start, float end, int n, float (*func)(float x))
{
float next = start;
float sum = 0.0f;
for(int k = 0; k < n; k++) {
float x = next;
next = start + k * (end - start) / n;
sum += 0.5f * (next - x) * (func(x) + func(next));
}
return sum;
}

Explain this code in K&R 2-1

I'm trying to determine range of the various floating-point types. When I read this code:
#include <stdio.h>
main()
{
float fl, fltest, last;
double dbl, dbltest, dblast;
fl = 0.0;
fltest = 0.0;
while (fl == 0.0) {
last = fltest;
fltest = fltest + 1111e28;
fl = (fl + fltest) - fltest;
}
printf("Maximum range of float variable: %e\n", last);
dbl = 0.0;
dbltest = 0.0;
while (dbl == 0.0) {
dblast = dbltest;
dbltest = dbltest + 1111e297;
dbl = (dbl + dbltest) - dbltest;
}
printf("Maximum range of double variable: %e\n", dblast);
return 0;
}
I don't understand why author added 1111e28 at fltest variable ?

The loop terminates when fltest reaches +Inf, as at that point fl = (fl + fltest) - fltest becomes NaN, which is unequal to 0.0. last contains a value which when added to 1111e28 produces +Inf and so is close to the upper limit of float.
1111e28 is chosen to reach +Inf reasonably quickly; it also needs to be large enough that when added to large values the loop continues to progress i.e. it is at least as large as the gap between the largest and second-largest non-infinite float values.

OP: ... why author added 1111e28 at fltest variable ?
A: [Edit] For code to work using float, 1111e28, or 1.111e31 this delta value needs careful selection. It should be big enough such that if fltest was FLT_MAX, the sum of fltest + delta would overflow and become float.infinity. With round to nearest mode, this is FLT_MAX*FLT_EPSILON/4. On my machine:
min_delta 1.014120601e+31 1/2 step between 2nd largest and FLT_MAX
FLT_MAX 3.402823466e+38
FLT_EPSILON 8.388608000e+06
FLT_MAX*FLT_EPSILON 4.056481679e+31
delta needs to be small enough so if f1test is the 2nd largest number, adding delta, would not sum right up to float.infinity and skip FLT_MAX. This is 3x min_delta
max_delta 3.042361441e+31
So 1.014120601e+31 <= 1111e28 < 3.042361441e+31.
#david.pfx Yes. 1111e28 is a cute number and it is in range.
Note: Complications occur when the math and its intermediate values, even though the variables are float may calcuate at higher precsison like double. This is allowed in C and control by FLT_EVAL_METHOD or very careful coding.
1111e28 is a curious value that makes sense if the author all ready knew the general range ofFLT_MAX.
The below code is expected to loop many times (24946069 on one test platform). Hopefully, the value fltest eventually becomes "infinite". Then f1 will becomes NaN as the difference of Infinity - Infinity. The the while loop ends as Nan != 0.0. #ecatmur
while (fl == 0.0) {
last = fltest;
fltest = fltest + 1111e28;
fl = (fl + fltest) - fltest;
}
The looping, if done in small enough increments, will arrive at a precise answer. Prior knowledge of FLT_MAX and FLT_EPSILON are needed to insure this.
The problem with this is that C does not define the range FLT_MAX and DBL_MAX other than they must be at least 1E+37. So if the maximum value was quite large, the increment value of 1111e28 or 1111e297 would have no effect. Example: dbltest = dbltest + 1111e297;, for dbltest = 1e400 would certainly not increase 1e400 unless dbltest a hundred decimal digits of precision.
If DBL_MAX was smaller than 1111e297, the method fails too. Note: On simple platforms in 2014, it is not surprising to find double and float to be the same 4-byte IEEE binary32 ) The first time though the loop, dbltest becomes infinity and the loop stops, reporting "Maximum range of double variable: 0.000000e+00".
There are many ways to efficiently derive the maximum float point value. A sample follows that uses a random initial value to help show its resilience to potential variant FLT_MAX.
float float_max(void) {
float nextx = 1.0 + rand()/RAND_MAX;
float x;
do {
x = nextx;
nextx *= 2;
} while (!isinf(nextx));
float delta = x;
do {
nextx = x + delta/2;
if (!isinf(nextx)) {
x = nextx;
}
delta /= 2;
} while (delta >= 1.0);
return x;
}
isinf() is a new-ish C function. Simple enough to roll your own if needed.
In re: #didierc comment
[Edit]
The precision of a float and double is implied with "epsilon": "the difference between 1 and the least value greater than 1 that is representable in the
given floating point type ...". The maximum values follow
FLT_EPSILON 1E-5
DBL_EPSILON 1E-9
Per #Pascal Cuoq comment. "... 1111e28 being chosen larger than FLT_MAX*FLT_EPSILON.", 1111e28 needs to be at least FLT_MAX*FLT_EPSILON to impact the loop's addition, yet small enough to precisely reach the number before infinity. Again, prior knowledge of FLT_MAX and FLT_EPSILON are needed to make this determination. If these values are known ahead of time, then the code simple could have been:
printf("Maximum range of float variable: %e\n", FLT_MAX);

The largest value representable in a float is 3.40282e+38. The constant 1111e28 is chosen such that adding that constant to a number in the range of 10^38 still produces a different floating point value, so that the value of fltest will continue to increase as the function runs. It needs to be large enough that it will still be significant at the 10^38 range, and small enough that the result will be accurate.

WAV-file analysis C (libsndfile, fftw3)

I'm trying to develop a simple C application that can give a value from 0-100 at a certain frequency range at a given timestamp in a WAV-file.
Example: I have frequency range of 44.1kHz (typical MP3 file) and I want to split that range into n amount of ranges (starting from 0). I then need to get the amplitude of each range, being from 0 to 100.
What I've managed so far:
Using libsndfile I'm now able to read the data of a WAV-file.
infile = sf_open(argv [1], SFM_READ, &sfinfo);
float samples[sfinfo.frames];
sf_read_float(infile, samples, 1);
However, my understanding of FFT is rather limited. But I know it's required inorder to get the amplitudes at the ranges I need. But how do I move on from here? I found the library FFTW-3, which seems to be suited for the purpose.
I found some help here: https://stackoverflow.com/a/4371627/1141483
and looked at the FFTW tutorial here: http://www.fftw.org/fftw2_doc/fftw_2.html
But as I'm unsure about the behaviour of the FFTW, I don't know to progress from here.
And another question, assuming you use libsndfile: If you force the reading to be single channeled (with a stereo file) and then read the samples. Will you then actually only be reading half of the samples of the total file? As half of them being from channel 1, or does automaticly filter those out?
Thanks a ton for your help.
EDIT: My code can be seen here:
double blackman_harris(int n, int N){
double a0, a1, a2, a3, seg1, seg2, seg3, w_n;
a0 = 0.35875;
a1 = 0.48829;
a2 = 0.14128;
a3 = 0.01168;
seg1 = a1 * (double) cos( ((double) 2 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
seg2 = a2 * (double) cos( ((double) 4 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
seg3 = a3 * (double) cos( ((double) 6 * (double) M_PI * (double) n) / ((double) N - (double) 1) );
w_n = a0 - seg1 + seg2 - seg3;
return w_n;
}
int main (int argc, char * argv [])
{ char *infilename ;
SNDFILE *infile = NULL ;
FILE *outfile = NULL ;
SF_INFO sfinfo ;
infile = sf_open(argv [1], SFM_READ, &sfinfo);
int N = pow(2, 10);
fftw_complex results[N/2 +1];
double samples[N];
sf_read_double(infile, samples, 1);
double normalizer;
int k;
for(k = 0; k < N;k++){
if(k == 0){
normalizer = blackman_harris(k, N);
} else {
normalizer = blackman_harris(k, N);
}
}
normalizer = normalizer * (double) N/2;
fftw_plan p = fftw_plan_dft_r2c_1d(N, samples, results, FFTW_ESTIMATE);
fftw_execute(p);
int i;
for(i = 0; i < N/2 +1; i++){
double value = ((double) sqrtf(creal(results[i])*creal(results[i])+cimag(results[i])*cimag(results[i]))/normalizer);
printf("%f\n", value);
}
sf_close (infile) ;
return 0 ;
} /* main */

Well it all depends on the frequency range you're after. An FFT works by taking 2^n samples and providing you with 2^(n-1) real and imaginary numbers. I have to admit I'm quite hazy on what exactly these values represent (I've got a friend who has promised to go through it all with me in lieu of a loan I made him when he had financial issues ;)) other than an angle around a circle. Effectively they provide you with an arccos of the angle parameter for a sine and cosine for each frequency bin from which the original 2^n samples can be, perfectly, reconstructed.
Anyway this has the huge advantage that you can calculate magnitude by taking the euclidean distance of the real and imaginary parts (sqrtf( (real * real) + (imag * imag) )). This provides you with an unnormalised distance value. This value can then be used to build a magnitude for each frequency band.
So lets take an order 10 FFT (2^10). You input 1024 samples. You FFT those samples and you get 512 imaginary and real values back (the particular ordering of those values depends on the FFT algorithm you use). So this means that for a 44.1Khz audio file each bin represents 44100/512 Hz or ~86Hz per bin.
One thing that should stand out from this is that if you use more samples (from whats called the time or spatial domain when dealing with multi dimensional signals such as images) you get better frequency representation (in whats called the frequency domain). However you sacrifice one for the other. This is just the way things go and you will have to live with it.
Basically you will need to tune the frequency bins and time/spatial resolution to get the data you require.
First a bit of nomenclature. The 1024 time domain samples I referred to earlier is called your window. Generally when performing this sort of process you will want to slide the window on by some amount to get the next 1024 samples you FFT. The obvious thing to do would be to take samples 0->1023, then 1024->2047, and so forth. This unfortunately doesn't give the best results. Ideally you want to overlap the windows to some degree so that you get a smoother frequency change over time. Most commonly people slide the window on by half a window size. ie your first window will be 0->1023 the second 512->1535 and so on and so forth.
Now this then brings up one further problem. While this information provides for perfect inverse FFT signal reconstruction it leaves you with a problem that frequencies leak into surround bins to some extent. To solve this issue some mathematicians (far more intelligent than me) came up with the concept of a window function. The window function provides for far better frequency isolation in the frequency domain though leads to a loss of information in the time domain (ie its impossible to perfectly re-construct the signal after you have used a window function, AFAIK).
Now there are various types of window function ranging from the rectangular window (effectively doing nothing to the signal) to various functions that provide far better frequency isolation (though some may also kill surrounding frequencies that may be of interest to you!!). There is, alas, no one size fits all but I'm a big fan (for spectrograms) of the blackmann-harris window function. I think it gives the best looking results!
However as I mentioned earlier the FFT provides you with an unnormalised spectrum. To normalise the spectrum (after the euclidean distance calculation) you need to divide all the values by a normalisation factor (I go into more detail here).
this normalisation will provide you with a value between 0 and 1. So you could easily multiple this value by 100 to get your 0 to 100 scale.
This, however, is not where it ends. The spectrum you get from this is rather unsatisfying. This is because you are looking at the magnitude using a linear scale. Unfortunately the human ear hears using a logarithmic scale. This rather causes issues with how a spectrogram/spectrum looks.
To get round this you need to convert these 0 to 1 values (I'll call it 'x') to the decibel scale. The standard transformation is 20.0f * log10f( x ). This will then provide you a value whereby 1 has converted to 0 and 0 has converted to -infinity. your magnitudes are now in the appropriate logarithmic scale. However its not always that helpful.
At this point you need to look into the original sample bit depth. At 16-bit sampling you get a value that is between 32767 and -32768. This means your dynamic range is fabsf( 20.0f * log10f( 1.0f / 65536.0f ) ) or ~96.33dB. So now we have this value.
Take the values we've got from the dB calculation above. Add this -96.33 value to it. Obviously the maximum amplitude (0) is now 96.33. Now didivde by that same value and you nowhave a value ranging from -infinity to 1.0f. Clamp the lower end to 0 and you now have a range from 0 to 1 and multiply that by 100 and you have your final 0 to 100 range.
And that is much more of a monster post than I had originally intended but should give you a good grounding in how to generate a good spectrum/spectrogram for an input signal.
and breathe
Further reading (for people other than the original poster who has already found it):
Converting an FFT to a spectogram
Edit: As an aside I found kiss FFT far easier to use, my code to perform a forward fft is as follows:
CFFT::CFFT( unsigned int fftOrder ) :
BaseFFT( fftOrder )
{
mFFTSetupFwd = kiss_fftr_alloc( 1 << fftOrder, 0, NULL, NULL );
}
bool CFFT::ForwardFFT( std::complex< float >* pOut, const float* pIn, unsigned int num )
{
kiss_fftr( mFFTSetupFwd, pIn, (kiss_fft_cpx*)pOut );
return true;
}

Finding the sample at the beginning of a period of a compound periodic signal

I have a signal made up of the sum of a number of sine waves. These are spaced at 100Hz, with the lowest component frequency at 200Hz (200Hz, 300Hz...etc.) All component sine waves begin at the same point with phase = 0. In my DSP software, where I am going to multiply this signal by several other signals, I need to find a point at which all of the original signal's component signals are all again at phase = 0.
If I were only using one sine wave, I could simply look for a change in sign from negative to positive. However, if the signal has, say, components at 200Hz and 300Hz, there are three zero-crossing where the sign changes from negative to positive, but only one that represents the beginning of the period, and this increases with more component waves. I do have control over the amplitudes of each component frequency during an initial startup sequence. If these waves were strictly harmonic (200Hz, 400Hz, 800Hz, etc.), I could simply remove all but the lowest frequency, find the beginning of its period, and use this as my zero-sample. However, I don't have this bandwidth. Can anyone provide an alternative approach?
Edit:
(I have clarified and integrated this edit into body of question.)
Edit 2:
This graphic should demonstrate the issue. The frequencies two components here are n and 3n/2. Without filtering out all but the lowest frequency, or taking an FFT as proposed by #hotpaw, an algorithm that only looks for zero-crossings where the sign changes from negative positive will land on one of three, and I must find the first of those three (this is the one point at which each component signal is at phase = 0). I realise that taking an FFT will work, but I'm dealing with very limited processing power and wondering if there's a simpler approach.

Look at the derivative of the signal!
Your signal is a sum of sines (sorry, I'm not sure how to format formulas properly)
S = sum(a_n * sin(k_n * t)) ... over all n
a_n is the positive amplitude and k_n the positive frequency. The derivative (that you can compute easily numerically) of the signal is
dS/dt = sum(a_n * k_n * cos(k_n * t)) ... over all n
At t=0 (what you're looking for), the derivative has its maximum since all cosine terms are one at the same time.
Some addition:
For the practical implementation you need to consider that the derivative may be noisy, so some kind of simple first-order filtering could be necessary.

I assume that all the sine waves are exact harmonics of some fundamental frequency, all have a phase of zero with respect to the same reference point at some point in time, and that this is the point in time you wish to find.
You can use an FFT with an aperture length that is an exact multiple of the period of your fundamental frequency (100 Hz). If there is zero noise, you can use 1 period. Estimate the phase with respect to some reference point (FFT aperture start or center) of all the sinusoids using the FFT. Then use the phase of the lowest frequency sinusoid that shows up as significant in the FFT to calculate all its zero crossings in your target time range. Compare with the nearest zero crossing of all the other sinusoids (using the FFT phase to estimate their phases), and find the low frequency zero crossing with the total least squared error of offsets from all the nearest zero crossings of all the other frequencies.
You can go back to the time domain to confirm the least squares estimated crossing as an actual zero crossing and/or to remove some of the numerical noise.

I would go for a first or second order lowpass filter to remove the component frequencies. The difference between 100 Hz and the "noise" makes quite a wide gap. Start with a low frequency that cancels all noise and increase until you are satisfied with the signal.
After that you have your signal and can watch for the sign change.
Second order implementation:
static float a1 = 0;
static float a2 = 0;
static float b1 = 0;
static float b2 = 0;
static float y = 0;
static float y_old = 0;
static float u_old = 0;
void
init_lp_filter(float cutoff_freq, float sample_time)
{
float wc = cutoff_freq;
float h = sample_time;
float epsilon = 1.0f/sqrt(2.0f);
float omega = wc * sqrt(0.5f);
float alpha = exp(-epsilon*wc*h);
float beta = cos(omega*h);
float gamma = sin(omega*h);
b1 = 1.0f - alpha * (beta + epsilon * wc * gamma / omega);
b2 = alpha * alpha + alpha * (epsilon * wc * gamma / omega - beta);
a1 = -2.0f * alpha * beta;
a2 = alpha * alpha;
}
float
getOutput() {
return y;
}
void
update_filter(float input)
{
float tmp = y;
y = b1 * input + b2 * u_old - a1 * y - a2 * y_old;
y_old = tmp;
u_old = input;
}
As the filtered output depends only on old values, this means that the filtered output can be used direct at the beginning of a cycle. The filter can then be updated at the end of the periodic cycle with a sample of measurement. Do note that if you have any output that may affect the signal (i.e. actuators on a physical process), you must sample the signal before any output.)
Good luck!