Solr query - float range with boosting - solr

What is the proper way to implement following thing:
Task: query documents where voltage (float) equals 100.0 and tolerance -0% / +20%.
q=+voltage_f[100.0 TO 120.0]
I want that documents with the voltage near to lower bound (100.0) get more points as the documents with the voltage near upper bound (120.0).
Or vice versa - tolerance -20% / +0%.
q=+voltage_f[80.0 TO 100.0]
I want that documents with the voltage near to upper bound (100.0) get more points as the documents with the voltage near lower bound (80.0).

You can use the recip function to get a value that starts at 1 and then tapers off as the value increases:
Performs a reciprocal function with recip(x,m,a,b) implementing a/(m*x+b) where m,a,b are constants, and x is any arbitrarily complex function.
When a and b are equal, and x>=0, this function has a maximum value of 1 that drops as x increases. Increasing the value of a and b together results in a movement of the entire function to a flatter part of the curve. These properties can make this an ideal function for boosting more recent documents when x is rord(datefield).
For your first case, that could be implemented by using recip directly, and adjusting the a and b values to suit your needs.
For the second case you can use abs(sub(100, x)) as x, as that becomes larger as the value is further away from 100.
The recip call can be added in bf (for dismax, edismax) or in boost (edismax).
Your frontend will have to decide the proper values to use for a, b or m - you can use debugQuery=true to see how much the boost contributes and adjust accordingly.

Related

solr ranking using an existent field

I'm loading into Solr data from a mysql database with DataImportHandler. Every document contains a popularity field (int type) that is calculated from another application and saved into mysql (this field is based on some rules relatives to the domain of application).
How can i use this value to improve solr ranking? Would be correct to sum the solr score with popularity value?
How bf can be used here?
A good starting point that'd probably work is multiplying the score by a sublinear function that increases (slowly) with popularity. For example,
newScore = score * log(1 + 0.5 * popularity)
To apply this boost you should use Solr's EDisMax query parser and pass the boost parameter with the following value:
&boost=log(sum(1, product(0.5, popularity)))
where popularity is the name of the field. You don't need to use the bf parameter since you should use a multiplicative boost, not an additive one.
The reason for adding 1 is to handle the case in which popularity=0 (so if each document's popularity is always at least 1, you don't need to add 1). The strength of the popularity effect can be increased or decreased by changing the 0.5 factor to some other value. For example, you can use a factor of 2 to increase the effect:
newScore = score * log(1 + 2 * popularity)
A good factor is probably around 9 / m where m is what you expect should be the median popularity, since in this case the boost of a "median document" (median in the sense that its popularity equals m) is going to be 1 (that is, its score won't be boosted at all).
Again, this is just a starting point and you'll have to try out different boosting functions until you find one that performs well.

Matlab's bvp4c: output arrays not always the same length as the initial guess

The Matlab function bvp4c solves boundary value problems. It takes a differential equation, boundary conditions and an initial guess as input, and returns a structure array containing arrays of x, y and yp (which stands for "y prime", or y').
The length of the output arrays should be the same as that of the initial guess, but I found that it isn't always. I have checked the dimensions of the input (the initial guess, always 1x101 double for x and 16x101 double for y) and the output (sometimes 1x101 double for x and 16x101 double for y and yp as it should be, but often different values, such as 1x91 double and 16x91 double or 1x175 double and 16x175 double).
Looking at the output array x when its length is off, some extra values are squeezed in, or some are taken out. For example, the initial guess has 100 positions between x=0 and x=1, and the x array should be [0 0.01 0.02 ... 1], but sometimes a new position like 0.015 shows up.
Question: Why does this happen, and how can this be solved?
"The length of the output arrays should be the same as that of the initial guess ...." This is incorrect.
As described in the bvp4c documentation, sol.x contains a "[mesh] selected by bvp4c" with an "[approximation] to y(x) at the mesh points of sol.x". In order to evaluate bvp4c's solution on your mesh, use deval.
Why does bvp4c choose a mesh? Quoting from the cited paper1, which you can get in full here if you have a MathWorks account:
Because BVPs can have more than one solution, BVP codes require users to supply a guess for the solution desired. The guess includes a guess for an initial mesh that reveals the behavior of the desired solution. The codes then adapt the mesh so as to obtain an accurate numerical solution with a modest number of mesh points.
Because a steady BVP generally has a global behavior strongly dependent on its boundary values, the spatial mesh between the two boundaries may need to be refined in order to properly approximate the desired solution with the locally chosen basis functions for the method. However, there may also be portions of the mesh that do not need to be refined and can even be coarsened in some cases to maintain a reasonably small residual and accurate approximation. Therefore, for general efficiency, the guess mesh is adaptively refined or coarsened depending on some locally chosen metric (since bvp4c is collocation based, the metric is probably point-based or division-integrated based) such that the mesh returned by bvp4c is, in some sense, adequate enough for generic interpolation within the boundaries.
I'll also note that this is different from numerically solving IVPs since their state is not global across the entire time integration locus and only depends on the current state to the next time-step, and possibly previous time steps if using a multi-step method or solving a delay differential equation, which makes the refinement inherently local. This local behavior of IVPs is what allows functions like ode45 to return a solution at pre-selected time values because it can locally refine the solution at the selected point while performing the time march (this is known as dense output).
1 Shampine, L.F., M.W. Reichelt, and J. Kierzenka, "Solving Boundary Value Problems for Ordinary Differential Equations in MATLAB with bvp4c".

Plotting a frequency response from biquad filter

This is a hard one and although I can think of a few kludge methods of doing it, I have a feeling there is a clean mathematical method, although I am having difficulty inventing it myself.
I have a number of parameters which control (software) biquad filters for audio. Essentially there are just 3 parameters, frequency, gain and Q (or bandwidth). In audio terms, the frequency represents the center frequency of the filter. The gain represents whether this frequency is boosted or cut (a gain of 0 results in no change to the audio passing through the filter). Q represents the width of the filter - IE a very wide filter might affect frequencies far away from the center frequency, whereas a narrow (low Q) filter will only affect frequencies close to the center frequency.The filters take the form of a bell curve, or at least thats an approximation, whether its mathematically accurate I am not sure.
I want to display the characteristics of these filters graphically - display a graph of gain against frequency. There are several of these filters applied to the audio channel, and I want to be able to add the different result graphs, to produce an overall graph (IE a graph summing all the components of the combined filters). But I also want to be able to access the individual filters graphs.
I can handle adding the component graphs into a single 'total' graph, but how to produce the original x-y graph from the filter parameters escapes me. I will draw bitmaps so all I need is to be able to create arrays of the form frequency[x]=y. Im doing this in C so I don't have the mathematical tools in matlab etc. So I might have a filter with a center frequency of say 1000 (Hz), a gain of say 20 (db or linear I understand how to convert that), and a Q of say 3. The Q factor is relative and does not have to be exactly mathematically correct if that causes any complication.
It seems like a quite simple mathematical function but maths is not my strong point and I don't know enough - I have been messing round with sine functions etc but its not working and I suspect is probably wasting processing power by over complicating the maths (although I might be wrong there).
TIA, Pete
I have my doubts about the relationships between biquad filters, Q values, and bell curves. But I'll put those aside and just tell you how to draw a bell curve, since that's what you asked.
From this wikipedia article, the equation for a bell curve is
where for your application
a corresponds to the gain
b determines the center frequency
2c^2 is related to Q (larger values will make the curve wider)
The C code below computes a sample bell curve. For this example, the numbers were chosen based on drawing into a window that is 250 pixels wide by 200 pixels high, with a coordinate system where the origin {0,0} is at the bottom left corner.
int width = 250;
int height = 200;
int bellCurve[width]; // the output array that holds the f(x) values
double gain = 180; // the 'a' value, determines how high the peak is from the baseline
double offset = 10; // the 'd' value, determines the y coordinate of the baseline
double qFactor = 1000; // the '2c^2' value, determines how fat the curve is
double center = 100; // the 'b' value, determines the x coordinate of the peak
double dx;
for ( int x = 0; x < width; x++ )
{
dx = x - center;
bellCurve[x] = gain * exp( -( dx * dx ) / qFactor ) + offset;
}
Plotting the curve results in an image like this where the peak is at x=100, y=10+180=190
You could input a unit impulse (an array of all zeros, except one element=1.0) into your digital filters, treating them as black boxes. Then FFT the impulse response output array to get the frequency response of the filter. Plotting the magnitude of the complex frequency samples will give you a pretty picture. Python+numpy+matplotlib would probably be an easier way to go about it. You will need to know the sampling period to get meaningful plots.
What you really want is the bode plot of the filter. This is non trivial to calculate yourself, a cursory search for a library to do it for you in C yielded nothing. If accuracy is not important and you can approximate the shape once and stretch it based on the parameters of the particular filter. For example, you might have a normalized array of relative values and construct a new curve (array) based on the parameters of the filter and the base curve you generated earlier.
The base curve could be generated from MATLAB if you can or Wolfram Alpha or something like that.
Here is one in javascript.
http://www.earlevel.com/main/2013/10/13/biquad-calculator-v2/
The filter you describe is the 'peak' filter. Use the log scale to display frequencies.
—Tom

Boosting relevance, based on absolute numerical closeness

In my Solr scheme, I have a numeric field that stores a color value (out of, say 65535). How can I make so that when I search for a particular color, the search relevance gets boosted, depending on how close (in absolute value) the particular search is to the asked value?
you can use function queries to calculate the closeness and boost the value.
e.g. div(x,65535) which will generate a value of 1 if exact and less values depending on the closeness.
You can check for the other queries as well to factor the boost accordingly.
And boost the results q={!boost b=div(x,65535)}text:supervillians
together with the function queries, you can use the recip function for calculating boost factor from the color distance http://wiki.apache.org/solr/FunctionQuery#recip
Example:
recip(div(x,65535),1,10000,10000)

Increasing the pitch of audio using a varied value

Okay, this a bit of maths and DSP question.
Let us say I have 20,000 samples which I want to resample at a different pitch. Twice the normal rate for example. Using an Interpolate cubic method found here I would set my new array index values by multiplying the i variable in an iteration by the new pitch (in this case 2.0). This would also set my new array of samples to total 10,000. As the interpolation is going double the speed it only needs half the amount of time to finish.
But what if I want my pitch to vary throughout the recording? Basically I would like it to slowly increase from a normal rate to 8 times faster (at the 10,000 sample mark) and then back to 1.0. It would be an arc. My questions are this:
How do I calculate how many samples would the final audio track be?
How to create an array of pitch values that would represent this increase from 1.0 to 8.0 back to 1.0
Mind you this is not for live audio output, but for transforming recorded sound. I mainly work in C, but I don't know if that is relevant.
I know this probably is complicated, so please feel free to ask for clarifications.
To represent an increase from 1.0 to 8.0 and back, you could use a function of this form:
f(x) = 1 + 7/2*(1 - cos(2*pi*x/y))
Where y is the number of samples in the resulting track.
It will start at 1 for x=0, increase to 8 for x=y/2, then decrease back to 1 for x=y.
Here's what it looks like for y=10:
Now we need to find the value of y depending on z, the original number of samples (20,000 in this case but let's be general). For this we solve integral 1+7/2 (1-cos(2 pi x/y)) dx from 0 to y = z. The solution is y = 2*z/9 = z/4.5, nice and simple :)
Therefore, for an input with 20,000 samples, you'll get 4,444 samples in the output.
Finally, instead of multiplying the output index by the pitch value, you can access the original samples like this: output[i] = input[g(i)], where g is the integral of the above function f:
g(x) = (9*x)/2-(7*y*sin((2*pi*x)/y))/(4*pi)
For y=4444, it looks like this:
In order not to end up with aliasing in the result, you will also need to low pass filter before or during interpolation using either a filter with a variable transition frequency lower than half the local sample rate, or with a fixed cutoff frequency more than 16X lower than the current sample rate (for an 8X peak pitch increase). This will require a more sophisticated interpolator than a cubic spline. For best results, you might want to try a variable width windowed sinc kernel interpolator.

Resources