Gnuplot fit function - fitting to a q-exponential in C

I have run an experiment using my C program, and I have been using gnuplot to plot histograms/graphs to analyse the data.
The code below takes the data in my file and creates a file called 'tableavalanchesizeGSA' containing the information gnuplot would have used to plot a histogram of my data, i.e. the data binned, with the frequency of each bin. Then I take the log of the frequency and plot it against the binned data (simply put, it is just the log of the frequency vs. the binned original data).
#Gnuplot commands for avalanche size GSA Log plot (axes as Log of freq/totaltrials):
reset
set xlabel 'Avalanche size'
set ylabel 'Log of Frequency'
set title "Avalanche Size with GSA"
set table 'tableavalanchesizeGSA'
#bw is the binwidth for the histogram
bw = 50.0
bin(x,s)=s*int(x/s)
plot 'avalanche_size_GSA_n_trials_2048000.dat' using (bin($1,bw)+bw/2.0):(1.0/2048000) smooth frequency with points
unset table
set logscale y
plot 'tableavalanchesizeGSA' with points title 'Frequency of Avalanche size with 2048000 trials using 1.0/2048000'
Now I am trying to fit my data to the following function:
Q(x)=(1+(1-q)*((s+x)/m))**(1/(1-q))
where q, s and m are my parameters. I have played around by plotting this function together with my log plot for a little bit, and I know that q = 1.16, m = s = 100 are good values that somewhat fit the data, but not exactly. So I add the following to my code:
q = 1.16
s = 100
m = 100
Q(x)=(1+(1-q)*((s+x)/m))**(1/(1-q))
fit Q(x) 'tableavalanchesizeGSA' via s, m, q
to try to fit the data to the function starting from 'close' parameter values. But once the iterations are done it gives me q = 1.16116 and still s = m = 100, which is barely different from what I had before with q = 1.16.
Is there something wrong here? Why does the fit function not find a closer fit?
The following image shows the function (green) fitted to my data. But I would still like a more accurate fit.

I guess by "closer fit" you mean that the values for lower avalanche sizes (where you have higher frequencies) should lie closer to the fitted function. You may want to add weights to your fit, since the values with higher frequencies should have higher fidelity. Consider the code
q = 1.16
s = 100
m = 100
Q(x)=(1+(1-q)*((s+x)/m))**(1/(1-q))
fit Q(x) 'tableavalanchesizeGSA' using 1:2:(1/sqrt($2)) via s, m, q
Here, the third column of the using statement is interpreted as the standard deviation of the value in the second column (see the gnuplot documentation). This should give you a better fit, but you may need to supply the correct standard deviations.
See also the gnuplot fit demos for examples with weighting.


What is the correct method to upsample?

I have an array of samples at 75 Hz, and I want to store them at 128 Hz. If it were 64 Hz and 128 Hz it would be very simple: I would just double all samples. But what is the correct way if the sample rates are not integer multiples of each other?
If you want to avoid filtering, you can:
handle the signal as a set of joined interpolation cubic curves
This has the same caveat as linear interpolation: without knowing more about your signal and purpose you cannot construct valid coefficients (without damaging signal accuracy). For an example of how to construct such a cubic, look here:
my interpolation cubic
Bullet #3 inside that link has the coefficients I use. I think they are sufficient for your purpose, so you can try them. If you want to do custom interpolation, look here:
how to construct custom interpolation curve
create a function that returns a point of your signal at a given time from the start of sampling (a C sketch of this function follows the notes below)
so do something like
double signal(double time);
where time is the time in [s] from the start of sampling. Inside this function compute which 4 samples you need to access:
ix = floor(time*75.0);
gives you the curve start point sample index. A cubic needs 4 points, one before the curve and one after, so for the interpolation cubic points p0,p1,p2,p3 use samples ix-1,ix,ix+1,ix+2. Compute the cubic coefficients a0,a1,a2,a3 and the cubic curve parameter t (I use the range <0,1>), so
t=(time*75.0); t-=floor(t);
[Figure: one interpolation segment. Green - actual curve segment; aqua - its control points = 75.0 Hz samples; red - curve interpolation parameter t; gray - actual time. The output signal point is the intersection of the green and gray lines.]
simply do a for loop through the sampled data with a time step of 1/128 s
something like this:
double time, duration = samples/75.0, dt = 1.0/128.0;  // duration of the signal in [s]
double signal128[(samples*128)/75 + 1];                // output sample count = duration*128
int i;
for (time=0.0,i=0;time<duration;i++,time+=dt)
 signal128[i]=signal(time);
samples is the input signal array size in samples, sampled at 75.0 Hz
[notes]
the for/duration handling can be done in integers ...
change the signal data type to whatever you need
inside signal(time) you need to handle the edge cases (start and end of signal),
because you have no defined points before the first sample and after the last sample. You can duplicate them or mirror the next point (mirroring is better).
this whole thing can be changed to process continuously without buffers; you just need to remember the last 4 points of the signal, so you can do this in RT. Of course you will be delayed by 2-3 of the 75.0 Hz samples ... and when you put all this together you will see that it is a FIR filter anyway :)
if you need to preserve more than the first derivative, add more points ...
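Putting those pieces together, here is a minimal C sketch of signal(), assuming Catmull-Rom coefficients (a common choice; substitute the coefficients from the link above if they differ) and duplicated edge samples:
#include <math.h>

double *samples75;   /* input signal at 75.0 Hz (illustrative globals) */
int nsamples;        /* input signal size in samples */

/* clamp the index so the edge samples are duplicated */
double sample_at(int i)
{
    if (i < 0) i = 0;
    if (i >= nsamples) i = nsamples - 1;
    return samples75[i];
}

/* interpolated signal value at 'time' seconds from start of sampling */
double signal(double time)
{
    int ix = (int)floor(time*75.0);   /* curve start point sample index */
    double t = time*75.0 - ix;        /* curve parameter in <0,1> */
    double p0 = sample_at(ix-1), p1 = sample_at(ix);
    double p2 = sample_at(ix+1), p3 = sample_at(ix+2);
    /* Catmull-Rom coefficients: curve passes through p1 at t=0 and p2 at t=1 */
    double a0 = p1;
    double a1 = 0.5*(p2 - p0);
    double a2 = p0 - 2.5*p1 + 2.0*p2 - 0.5*p3;
    double a3 = 0.5*(p3 - p0) + 1.5*(p1 - p2);
    return a0 + t*(a1 + t*(a2 + t*a3));
}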
You do not need to upsample and then downsample.
Instead, one can interpolate all the new sample points, at the desired spacing in time, using a wide enough low-pass interpolation kernel, such as a windowed sinc function. This is often done by using a pre-calculated polyphase filter bank, either directly, or with an additional linear interpolation of the filter table. But if performance is not critical, then one can directly calculate each coefficient for each interpolated point.
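If performance is indeed not critical, the direct (non-polyphase) variant is short; a minimal sketch assuming a Hann-windowed sinc kernel and illustrative names (since 128 Hz > 75 Hz this is upsampling, so the kernel cutoff needs no scaling):
#include <math.h>

#define PI 3.14159265358979323846

/* interpolate 'in' (n samples at fsIn Hz) at time t seconds, using a
   Hann-windowed sinc spanning halfWidth input samples on each side */
double sinc_interp(const double *in, int n, double fsIn, double t, int halfWidth)
{
    double pos = t * fsIn;              /* position in input-sample units */
    int center = (int)floor(pos);
    double sum = 0.0;
    for (int k = center - halfWidth + 1; k <= center + halfWidth; k++)
    {
        if (k < 0 || k >= n) continue;  /* treat the outside as silence */
        double x = pos - k;             /* distance from this input sample */
        double sinc = (x == 0.0) ? 1.0 : sin(PI*x)/(PI*x);
        double hann = 0.5*(1.0 + cos(PI*x/halfWidth));   /* window */
        sum += in[k]*sinc*hann;
    }
    return sum;
}
/* usage: for the ith output sample, out[i] = sinc_interp(in, n, 75.0, i/128.0, 16); */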
The easiest way is to upsample to a sample rate which is the LCM of your two sample rates and then downsample - that way you get integer upsample/downsample ratios. In your case there are no common factors in the two sample rates, so you would need to upsample by a factor of 128 to 9.6 kHz and then downsample by a factor of 75 to 128 Hz. For the upsampling you insert 127 zero samples between successive input samples, then apply a suitable filter (37 Hz LPF, Fs = 9.6 kHz), and then downsample by taking every 75th sample. The filter design is the only tricky part, but there are online tools that take the hard work out of this.
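For illustration, a C sketch of that scheme with illustrative names; it never materializes the 9.6 kHz stream, because the stuffed zeros contribute nothing to the convolution (which is exactly the polyphase observation). The FIR coefficients h are assumed to come from a filter-design tool (37 Hz low pass at Fs = 9.6 kHz):
/* rational resampling 75 Hz -> 128 Hz: upsample by L=128, filter, decimate by M=75;
   outLen is about n*128/75 */
void resample_75_to_128(const double *in, int n,
                        const double *h, int hlen,
                        double *out, int outLen)
{
    const long L = 128, M = 75;
    for (int j = 0; j < outLen; j++)
    {
        long pos = j * M;            /* index of output sample j in the 9.6 kHz stream */
        double acc = 0.0;
        for (int i = 0; i < n; i++)  /* input sample i sits at index i*L */
        {
            long tap = pos - i * L;
            if (tap < 0 || tap >= hlen) continue;   /* outside the filter span */
            acc += in[i] * h[tap];
        }
        out[j] = acc * L;            /* compensate the zero-stuffing gain loss */
    }
}
/* a real implementation would loop only over the few i whose taps overlap pos */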
Alternatively look at third-party libraries which handle resampling, e.g. sox.
You need to upsample and downsample via an intermediate sampling frequency, as #Paul mentioned. In addition, the signal needs to be filtered after each transformation, which is approximated here by linear interpolation:
% Parameters
F = 2;
Fs1 = 75;
Fs3 = 128;
Fs2 = lcm(Fs1,Fs3);
% Original signal
t1 = 0:1/Fs1:1;
y1 = sin(2*pi*F*t1);
% Up-sampled signal
t2 = 0:1/Fs2:1;
y2 = interp1(t1,y1,t2);
% Down-sampled signal
t3 = 0:1/Fs3:1;
y3 = interp1(t2,y2,t3);
figure;
subplot(3,1,1);
plot(t1,y1,'b*-');
title(['Signal with sampling frequency of ', num2str(Fs1), 'Hz']);
subplot(3,1,2);
plot(t2,y2,'b*-');
title(['Signal with sampling frequency of ', num2str(Fs2), 'Hz']);
subplot(3,1,3);
plot(t3,y3,'b*-');
title(['Signal with sampling frequency of ', num2str(Fs3), 'Hz']);

Plotting a frequency response from biquad filter

This is a hard one, and although I can think of a few kludge methods of doing it, I have a feeling there is a clean mathematical method, although I am having difficulty inventing it myself.
I have a number of parameters which control (software) biquad filters for audio. Essentially there are just 3 parameters: frequency, gain and Q (or bandwidth). In audio terms, the frequency represents the center frequency of the filter. The gain represents whether this frequency is boosted or cut (a gain of 0 results in no change to the audio passing through the filter). Q represents the width of the filter - i.e. a very wide (low Q) filter might affect frequencies far away from the center frequency, whereas a narrow (high Q) filter will only affect frequencies close to the center frequency. The filters take the form of a bell curve, or at least that's an approximation; whether it's mathematically accurate I am not sure.
I want to display the characteristics of these filters graphically - a graph of gain against frequency. There are several of these filters applied to the audio channel, and I want to be able to add the different result graphs to produce an overall graph (i.e. a graph summing all the components of the combined filters). But I also want to be able to access the individual filters' graphs.
I can handle adding the component graphs into a single 'total' graph, but how to produce the original x-y graph from the filter parameters escapes me. I will draw bitmaps, so all I need is to be able to create arrays of the form frequency[x]=y. I'm doing this in C, so I don't have the mathematical tools of MATLAB etc. So I might have a filter with a center frequency of say 1000 (Hz), a gain of say 20 (dB or linear - I understand how to convert that), and a Q of say 3. The Q factor is relative and does not have to be exactly mathematically correct if that causes any complication.
It seems like a quite simple mathematical function, but maths is not my strong point and I don't know enough - I have been messing around with sine functions etc. but it's not working, and I suspect it is probably wasting processing power by overcomplicating the maths (although I might be wrong there).
TIA, Pete
I have my doubts about the relationships between biquad filters, Q values, and bell curves. But I'll put those aside and just tell you how to draw a bell curve, since that's what you asked.
From this wikipedia article, the equation for a bell curve (Gaussian function) is
f(x) = a*exp( -(x - b)^2 / (2*c^2) ) + d
where for your application
a corresponds to the gain
b determines the center frequency
2c^2 is related to Q (larger values will make the curve wider)
d sets the baseline offset
The C code below computes a sample bell curve. For this example, the numbers were chosen based on drawing into a window that is 250 pixels wide by 200 pixels high, with a coordinate system where the origin {0,0} is at the bottom left corner.
#include <math.h>

int width = 250;
int height = 200;
int bellCurve[width];  // the output array that holds the f(x) values
double gain = 180;     // the 'a' value, determines how high the peak is from the baseline
double offset = 10;    // the 'd' value, determines the y coordinate of the baseline
double qFactor = 1000; // the '2c^2' value, determines how fat the curve is
double center = 100;   // the 'b' value, determines the x coordinate of the peak
double dx;
for ( int x = 0; x < width; x++ )
{
    dx = x - center;
    bellCurve[x] = (int)( gain * exp( -( dx * dx ) / qFactor ) + offset );
}
Plotting the curve results in an image like this, where the peak is at x = 100, y = 10 + 180 = 190.
You could input a unit impulse (an array of all zeros, except one element = 1.0) into your digital filters, treating them as black boxes. Then FFT the impulse-response output array to get the frequency response of the filter. Plotting the magnitude of the complex frequency samples will give you a pretty picture. Python + numpy + matplotlib would probably be an easier way to go about it. You will need to know the sampling period to get meaningful plots.
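To stay in plain C without an FFT library, you can evaluate the DFT of the impulse response directly at a handful of frequencies; it gives the same magnitudes as an FFT, just more slowly. A minimal sketch, assuming a direct-form I biquad with coefficients b0, b1, b2, a1, a2 (a0 normalized to 1) and illustrative names:
#include <math.h>
#include <stdio.h>

#define NIMP 1024  /* impulse-response length; assumed long enough for the filter to decay */
#define PI 3.14159265358979323846

void print_response(double b0, double b1, double b2, double a1, double a2, double fs)
{
    double h[NIMP], x1 = 0, x2 = 0, y1 = 0, y2 = 0;
    for (int n = 0; n < NIMP; n++)          /* run a unit impulse through the filter */
    {
        double x = (n == 0) ? 1.0 : 0.0;
        double y = b0*x + b1*x1 + b2*x2 - a1*y1 - a2*y2;
        x2 = x1; x1 = x; y2 = y1; y1 = y;
        h[n] = y;
    }
    for (double f = 20.0; f < fs/2; f *= 1.25)   /* log-spaced frequency points */
    {
        double w = 2.0*PI*f/fs, re = 0.0, im = 0.0;
        for (int n = 0; n < NIMP; n++)      /* DFT of h at frequency f */
        {
            re += h[n]*cos(w*n);
            im -= h[n]*sin(w*n);
        }
        printf("%8.1f Hz  %7.2f dB\n", f, 20.0*log10(sqrt(re*re + im*im)));
    }
}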
What you really want is the Bode plot of the filter. This is non-trivial to calculate yourself; a cursory search for a C library that does it for you yielded nothing. If accuracy is not important, you can approximate the shape once and stretch it based on the parameters of the particular filter. For example, you might have a normalized array of relative values and construct a new curve (array) based on the parameters of the filter and the base curve you generated earlier.
The base curve could be generated from MATLAB if you can or Wolfram Alpha or something like that.
Here is one in JavaScript:
http://www.earlevel.com/main/2013/10/13/biquad-calculator-v2/
The filter you describe is the 'peak' filter. Use the log scale to display frequencies.
—Tom

How to resample an array of n elements into an array of m elements

I have an array of N measurements that I should present as a graph, but the graph can be only M pixels wide and is scrolled only by M pixels.
While M is constant, N can be anything from tens to thousands.
Each time I need to show the graph I know what N is; however, as N/M may not be an integer, there is an accumulated error that I want somehow to compensate.
I am working in a plain C, and no math libraries can be used.
EDIT 2:
The data is relatively homogeneous with peaks once in a while, and I do not want to miss these peaks while interpolating.
EDIT 3:
I am looking for a solution that will work well enough for any N, whether greater than or less than M.
Thanks.
One good solution is not to iterate over your input samples, but over your output positions. That is, you will always draw exactly M pixels. To calculate the nearest sample value for the ith pixel, use the array offset:
[(i*N+M/2)/M]
Of course only using the nearest sample will give a very aliased result (discarding most of your samples in the case where N is large). If you're sure N will always be larger than M, a good but simple approach is to average sufficient neighboring samples with a weighted average such that each sample gets a total weight of 1 (with endpoints having their weight split between neighboring output pixels). Of course there are more elaborate resampling algorithms you can use that may be more appropriate (especially if your data is something like audio samples which are more meaningful in the frequency domain), but for an embedded device with tight memory and clock cycle requirements, averaging is likely the approach you want.

Increasing the pitch of audio using a varied value

Okay, this a bit of maths and DSP question.
Let us say I have 20,000 samples which I want to resample at a different pitch - twice the normal rate, for example. Using an interpolation cubic method found here, I would set my new array index values by multiplying the i variable in an iteration by the new pitch (in this case 2.0). This would also set my new array of samples to a total of 10,000. As the interpolation is going at double the speed, it only needs half the amount of time to finish.
But what if I want my pitch to vary throughout the recording? Basically I would like it to slowly increase from a normal rate to 8 times faster (at the 10,000 sample mark) and then back to 1.0. It would be an arc. My questions are this:
How do I calculate how many samples the final audio track will have?
How do I create an array of pitch values that represents this increase from 1.0 to 8.0 and back to 1.0?
Mind you this is not for live audio output, but for transforming recorded sound. I mainly work in C, but I don't know if that is relevant.
I know this probably is complicated, so please feel free to ask for clarifications.
To represent an increase from 1.0 to 8.0 and back, you could use a function of this form:
f(x) = 1 + 7/2*(1 - cos(2*pi*x/y))
Where y is the number of samples in the resulting track.
It will start at 1 for x=0, increase to 8 for x=y/2, then decrease back to 1 for x=y.
Here's what it looks like for y=10:
Now we need to find the value of y depending on z, the original number of samples (20,000 in this case, but let's be general). For this we solve the equation: integral of 1 + 7/2*(1 - cos(2*pi*x/y)) dx from 0 to y, equal to z. The solution is y = 2*z/9 = z/4.5, nice and simple :)
Therefore, for an input with 20,000 samples, you'll get 4,444 samples in the output.
Finally, instead of multiplying the output index by the pitch value, you can access the original samples like this: output[i] = input[g(i)], where g is the integral of the above function f:
g(x) = (9*x)/2-(7*y*sin((2*pi*x)/y))/(4*pi)
For y=4444, it looks like this:
In order not to end up with aliasing in the result, you will also need to low-pass filter before or during interpolation, using either a filter with a variable transition frequency lower than half the local sample rate, or a fixed cutoff frequency more than 16X lower than the current sample rate (for an 8X peak pitch increase). This will require a more sophisticated interpolator than a cubic spline. For best results, you might want to try a variable-width windowed-sinc kernel interpolator.
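A sketch of the index mapping in C, with illustrative names; linear interpolation is used for brevity (swap in the cubic or windowed-sinc interpolators discussed above at the marked line):
#include <math.h>
#include <stdlib.h>

#define PI 3.14159265358979323846

/* resample z input samples with the 1 -> 8 -> 1 pitch arc; writes the
   y = z/4.5 output samples, reading the input at position g(i) */
double *variable_pitch(const double *in, int z, int *outLen)
{
    int y = (int)(z / 4.5);              /* from solving the integral: 9*y/2 = z */
    double *out = malloc(y * sizeof *out);
    for (int i = 0; i < y; i++)
    {
        double pos = 4.5*i - (7.0*y*sin((2.0*PI*i)/y))/(4.0*PI);   /* g(i) */
        int k = (int)pos;
        double frac = pos - k;
        if (k >= z - 1) { k = z - 2; frac = 1.0; }   /* clamp at the last sample */
        out[i] = in[k]*(1.0 - frac) + in[k+1]*frac;  /* interpolator goes here */
    }
    *outLen = y;
    return out;
}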

Spatial search with ravenDB

I have a rather specific spatial search I need to do. Basically, I have an object (let's call it obj1) with two locations; let's call them point A and point B.
I then have a collection of objects (let's call each one obj2), each with their own A and B locations.
I want to return the top 10 objects from the collection sorted by:
(distance from obj1 A to obj2A) + (the distance from obj1B to obj2B)
Any ideas?
Thanks,
Nick
Update:
Here's a little more detail on the documents and how I want to compare them.
The domain model:
Listing:
ListingId int
Title string
Price double
Origin Location
Destination Location
Location:
Post / Zipcode string
Latitude decimal
Longitude decimal
What I want to do is take a listing object (not in the database) and compare it with the collection of listings in the database. I want the query to return the top 12 (or x) listings sorted by the as-the-crow-flies distance between the origins plus the as-the-crow-flies distance between the destinations.
I don't care about the distance from origin to destination - only about the distance from origin to origin plus destination to destination.
Basically I'm trying to find listings where both the starting and ending locations are close.
Please let me know if I can clarify more.
Thanks!
Here is how one would solve such a problem in
MySQL 4.1 &
MySQL 5.
The MySQL 4.1 link seems quite helpful, esp. the first example; it's pretty much what you are asking about.
But if this is not quite helpful, I guess you'd have to loop and do queries either on obj1 or obj2 against its counterpart table.
From an algorithmic perspective, I'd find the center of the bounding box, then pick candidates with increasing radius until I find enough.
Also I just want to remind you that the crow-flies distance over the globe is not the Pythagorean distance; a different formula (the haversine formula, below) must be used:
private const double earthMeanRadiusMiles = 3958.76;

private static double DegreesToRadians(double degrees)
{
    return degrees * Math.PI / 180.0;
}

public static double GetDistance(double lat1, double lng1, double lat2, double lng2)
{
    double deltaLat = DegreesToRadians(lat2 - lat1);
    double deltaLong = DegreesToRadians(lng2 - lng1);
    double a = Math.Pow(Math.Sin(deltaLat / 2), 2) +
               Math.Cos(DegreesToRadians(lat1))
               * Math.Cos(DegreesToRadians(lat2))
               * Math.Pow(Math.Sin(deltaLong / 2), 2);
    return earthMeanRadiusMiles * (2 * Math.Atan2(Math.Sqrt(a), Math.Sqrt(1 - a)));
}
Sounds like you're building a rideshare website. :)
The bottom line is that in order to sort your query result by surface distance, you'll need spatial indexing built into the database engine. I think your options here are MySQL with OpenGIS extensions (already mentioned) or PostgreSQL with PostGIS. It looks like it's possible in ravenDB too: http://ravendb.net/documentation/indexes/sptial
But if that's not an option, there are a few other ways. Let's simplify the problem and say you just want to sort your database records by their distance to location A, since you're just doing that twice and summing the results.
The simplest solution is to pull every record from the database and calculate the distance to location A one by one, then sort, in code. Trouble is, you end up doing a lot of redundant computations and pulling down the entire table for every query.
Let's once again simplify and pretend we only care about the Chebyshev (maximum) distance. This will work for narrowing our scope within the db before we get more accurate. We can do a "binary search" for nearby records. We must decide an approximate number of closest records to return; let's say 10. Then we query inside of a square area, let's say 1 degree latitude by 1 degree longitude (that's about 60x60 miles) around the location of interest. Let's say our location of interest is lat,lng=43.5,86.5. Then our db query is SELECT COUNT(*) FROM locations WHERE (lat > 43 AND lat < 44) AND (lng > 86 AND lng < 87). If you have indexes on the lat/lng fields, that should be a fast query.
Our goal is to get just above 10 total results inside the box. Here's where the "binary search" comes in. If we only got 5 results, we double the box area and search again. If we got 100 results, we cut the area in half and search again. If we get 3 results immediately after that, we increase the box area by 50% (instead of 100%) and try again, proceeding until we get close enough to our 10 result target.
Finally we take this manageable set of records and calculate their euclidean distance from the location of interest, and sort, in code.
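A sketch of that box-size search in C, where count_in_box() is a hypothetical wrapper around the SELECT COUNT(*) query above and the growth/shrink factors are judgment calls:
extern int count_in_box(double lat, double lng, double halfSize);  /* hypothetical */

double find_box_half_size(double lat, double lng, int target)
{
    double halfSize = 0.5;                   /* start with a 1 x 1 degree box */
    for (int tries = 0; tries < 32; tries++)
    {
        int found = count_in_box(lat, lng, halfSize);
        if (found >= target && found <= 3 * target)
            break;                           /* just above the target: done */
        if (found < target)
            halfSize *= (2 * found >= target) ? 1.25 : 2.0;  /* grow gently or fast */
        else
            halfSize *= 0.5;                 /* far too many: shrink */
    }
    return halfSize;  /* fetch the records in this box, then compute distances and sort */
}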
Good luck!
I do not think you will find a solution directly out of the box.
It will be much more efficient if you use a bounding sphere instead of a bounding box to specify your object:
http://en.wikipedia.org/wiki/Bounding_sphere
C = (A + B)/2 and R = distance(A,B)/2
You do not say how much data you want to compare, or whether you want to see the closest or the farthest object pairs.
In both cases, I think you have to encode the C coordinate as a path in an octree if you are using 3D, or a quadtree if you are using 2D:
http://en.wikipedia.org/wiki/Quadtree
This is a first draft; I can add more information if this is not enough.
If you are not familiar with 3D, start with 2D; it is easier to begin with.
I saw your latest edit; it seems that your problem is very similar to a clash detection algorithm.
I think you could change the coordinate system of the end-point to polar coordinates relative to the start-point, round the radial coordinate to your tolerance (x miles), and order the results by this value.
