I'd like to know if there is a way to store a Solr function expression in an intermediate variable so it doesn't get recomputed each time.
Take my concrete need as an example. I have to sort by distance, with each distance grouped into a range from a range set (as would be the case with distance faceting using frange). These ranges are user-definable and the set can be of any length.
For example, if a user defines the range set 1,2,3, the expression sent to Solr would be:
max(map(geodist(),0,1,1),max(map(geodist(),1,2,2),map(geodist(),2,3,3))) asc
Can that geodist() call be stored/memoized, or does Solr internally optimize such expressions?
I am not sure about the following, but it might be worth a try:
{!func}max(map($v3,0,1,1),$v2)&v2={!func}max($v4,$v5)&v4=map($v3,1,2,2)&v5=map($v3,2,3,3)&v3=geodist()
The above is called parameter dereferencing.
http://wiki.apache.org/solr/LocalParams
We are assigning the function geodist() to a local parameter v3. This parameter is substituted into another set of parameters, v4 and v5, which are in turn substituted into v2 and the main function.
However, note that this feature was only introduced recently, in Solr 4.0.
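Since the ranges are user-definable, the nested expression is probably easiest to generate on the client side. Below is a minimal Python sketch under a few assumptions: the parameter name d and both helper names are hypothetical, and it assumes the sort parameter accepts $d dereferencing in the same way a {!func} query does. Whether Solr then evaluates geodist() only once per document is not confirmed by anything above.

```python
def build_range_expr(bounds, dist="$d"):
    """Return nested max(map(...)) terms for the ranges [0,b1], [b1,b2], ..."""
    if not bounds:
        raise ValueError("at least one range bound is required")
    lowers = [0] + list(bounds[:-1])
    terms = [f"map({dist},{lo},{hi},{hi})" for lo, hi in zip(lowers, bounds)]
    # Fold from the right so the nesting matches the hand-written expression.
    expr = terms[-1]
    for term in reversed(terms[:-1]):
        expr = f"max({term},{expr})"
    return expr


def build_params(bounds):
    # "d" is a hypothetical parameter name holding the shared sub-expression.
    return {"sort": build_range_expr(bounds) + " asc", "d": "geodist()"}
```

Calling build_range_expr([1, 2, 3], dist="geodist()") reproduces the hand-written expression from the question, so the two forms can be compared side by side.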
How to RANK an array directly? I would like to avoid creating more intermediate data in cells just to reference them.
Excel RANK.AVG formula states it accepts both array and reference:
Syntax
RANK.AVG(number,ref,[order])
The RANK.AVG function syntax has the following arguments:
Number Required. The number whose rank you want to find.
Ref Required. **An array of, or a reference to**, a list of numbers. Nonnumeric values in Ref are ignored.
Order Optional. A number specifying how to rank number.
But Excel keeps rejecting the below formula.
=RANK.AVG(5, {3,1,7,10,5})
If the numbers are put in cells, say B1:B5, Excel accepts
=RANK.AVG(5, B1:B5)
Ultimately, I would like to rank a dynamic array
=RANK.AVG(value, TOCOL(VSTACK(array1, array2)))
e.g. =RANK.AVG(5, TOCOL(VSTACK(B1:B5,C1:C10)))
It seems that the official documentation on the various RANK functions is simply wrong in claiming that they accept arrays for the ref argument (see here, for example).
You will have to come up with creative alternatives which mimic the RANK.AVG function, for example:
=LET(ζ,SORT(MyArray,,-1),AVERAGE(FILTER(SEQUENCE(COUNT(ζ)),ζ=MyValue)))
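To make the logic of that LET formula easier to follow, here is a rough Python equivalent (a sketch only; the function name rank_avg is mine, not an Excel API): sort the values in descending order, collect the 1-based positions at which the value occurs, and average them so that ties share a rank.

```python
def rank_avg(value, values, descending=True):
    """Average 1-based rank of value within values, ties sharing the mean rank."""
    ordered = sorted(values, reverse=descending)
    positions = [i for i, v in enumerate(ordered, start=1) if v == value]
    if not positions:
        raise ValueError("value does not occur in values")
    return sum(positions) / len(positions)
```

With the numbers from the question, rank_avg(5, [3, 1, 7, 10, 5]) gives 3.0, since 5 is the third largest value.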
I would be grateful if anyone knows whether the following issue is documented and/or what the underlying reasons are.
Assuming we have, for example, the numbers from 1 to 10 in A1:A10, the following formula
=SUMPRODUCT(SUBTOTAL(4,OFFSET(A1,{0;5},0,5)))
is perfectly valid and is equivalent to taking the sum of the maximum values from each of the ranges A1:A5 and A6:A10, since the OFFSET function, here being passed an array of values ({0;5}) as its rows parameter and with the appropriate height parameter (5), resolves to the array of ranges:
{A1:A5,A6:A10}
which is then passed to SUBTOTAL to generate a further array comprising the maximum values from each of those ranges, i.e. 5 and 10, before being summed by SUMPRODUCT.
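For readers who find it easier to follow as ordinary code, the evaluation described above can be sketched in Python (an illustration of the semantics only, not of how Excel implements it; the function name is made up): each row offset yields one sub-range, the maximum is taken per sub-range, and the maxima are summed.

```python
def sum_of_range_maxima(values, row_offsets, height):
    """Mimic SUMPRODUCT(SUBTOTAL(4, OFFSET(A1, offsets, 0, height)))."""
    # One sub-range per offset, as OFFSET produces with an array of rows.
    maxima = [max(values[start:start + height]) for start in row_offsets]
    # SUMPRODUCT with a single array argument simply sums its elements.
    return maxima, sum(maxima)


column_a = list(range(1, 11))  # the numbers 1 to 10 in A1:A10
maxima, total = sum_of_range_maxima(column_a, [0, 5], 5)
```

Here maxima is [5, 10] and total is 15, matching the description of the SUBTOTAL formula.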
AGGREGATE was introduced in Excel 2010 as, it would seem, a more refined version of SUBTOTAL. My question is why, when attempting the following
=SUMPRODUCT(AGGREGATE(14,,OFFSET(A1,{0;5},0,5),1))
which should be equivalent to the SUBTOTAL example given above, does Excel display the message that it "Ran Out of Resources While Attempting to Calculate One or More Formulas" (and return a value of 0)?
(Note that users of a non-English-language version of Excel may require a different separator within the array constant {0;5}.)
This is a quite unexpected error. Evidently the syntax is not at fault, nor is the passing of the OFFSET construction 'disallowed'. With nothing else in the workbook, what is causing Excel to use so much resource when attempting to resolve such a construction?
A similar result occurs with INDIRECT instead of OFFSET, i.e.
=SUMPRODUCT(SUBTOTAL(4,INDIRECT({"A1:A5","A6:A10"})))
is perfectly valid, yet
=SUMPRODUCT(AGGREGATE(14,,INDIRECT({"A1:A5","A6:A10"}),1))
gives the same error described above.
Regards
[Not enough reputation to add a comment.]
Excel on Mac returns this:
Arrays containing ranges are not supported
The AGGREGATE error appears to be due to passing an array of range references to an argument that expects an array of values. The error message looks like a symptom of an uninitialized pointer being passed, resulting in unexpected behavior. Indeed, the same error dialog is shown with some other functions, such as:
=MEDIAN(TRANSPOSE(INDIRECT({"a1:a5","a6:a10"})))
On the other hand, passing an array of references to the fourth or later argument of AGGREGATE is permitted, e.g.:
=SUMPRODUCT(AGGREGATE(4,,B1,INDIRECT({"a1:a5","a6:a10"})))
In a similar way, SUBTOTAL allows arrays of references in the second or later arguments, none of which natively take arrays. The SUBTOTAL formula is evaluated by applying the function to each range reference in the array, i.e.:
SUBTOTAL(4,INDIRECT({"a1:a5","a6:a10"}))
->{SUBTOTAL(4,A1:A5),SUBTOTAL(4,A6:A10)}
Formatting arrays and range references within function definitions may help with visualising the formula processing:
AGGREGATE(function_num, options, array or ref1, [k or ref2], [ref3], …)
SUBTOTAL(function_num, ref1, [ref2],...)
Note that reference-only arguments also allow for arrays of references.
It will be interesting to see if there are any changes to this behavior with the updated calc engine and dynamic arrays currently in Office 365 preview and due for release soon...
I have a scenario in which an external agent generates a ranking function dynamically, and I want to pass it as a query argument instead of statically defining it in the search definition file, something like
http://localhost:8080/search/?query=honda car&rankfeature.rankingExpression="query(title_match_weight)*matches(title)+query(tags_match_weight)*matches(tags)"&rankfeature.query(title_match_weight)=10&rankfeature.query(tags_match_weight)=20
which I am not able to do at the moment. Is there a way to achieve this in Vespa?
I have tried using foreach in the rank expression for this purpose, but it does not give the flexibility of supplying an arbitrary function dynamically.
http://docs.vespa.ai/documentation/ranking.html#using-query-variables
explains macros, and I gather that a macro is treated as a rank feature and that a rank feature can be passed in the query. That should mean a macro can be passed in the query and used in the expression, but this does not work.
It's not possible to send ranking expressions with the query (it wouldn't be efficient as they are (often) compiled with LLVM etc).
Couldn't you use a fixed ranking expression and use query features to weight/or turn on or off different parts of it? You can also configure many different ranking expressions and choose between them at query time using ranking.profile=profileName.
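As a rough illustration of that suggestion, the external agent could keep the expression fixed in a rank profile and only vary the weights per query. The Python sketch below just builds such a request URL; the profile name "weighted" and the helper name are hypothetical, while the rankfeature.query(...) and ranking.profile parameter names are the ones used in this thread.

```python
from urllib.parse import urlencode


def build_search_url(base, query, profile, weights):
    """Build a search URL that selects a rank profile and sets query features."""
    params = {"query": query, "ranking.profile": profile}
    for name, value in weights.items():
        # Each weight becomes a query feature readable as query(name) in ranking.
        params[f"rankfeature.query({name})"] = value
    return base + "?" + urlencode(params)


url = build_search_url(
    "http://localhost:8080/search/",
    "honda car",
    "weighted",  # hypothetical profile name
    {"title_match_weight": 10, "tags_match_weight": 20},
)
```

Setting a weight to 0 would then switch the corresponding term of the fixed expression off for that query.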
In FMI 2.0, array parameters are serialized to scalar variables.
Importing tools can display them as arrays, but their size is fixed and their handling is inefficient.
Better array support is currently in development by a working group of the FMI project, but I would like to know about workarounds for handling array parameters in the meantime.
Ideas are to
hard code them (disadvantage: they are no longer parameters ...)
put them in a CSV file in the resources folder and read them at the start of the simulation (disadvantage: no parameter mask support, complicated)
put them in a string parameter and parse it at simulation start (disadvantage: limited length of strings, complicated)
Are there other ideas / workarounds? Thanks in advance.
Combinations of the ideas outlined in your question are also possible.
Hard code with selector parameter
Here the idea is to hard code several variants of your array and allow the user to select one with a parameter.
I did this in a recent project where a user needed to choose between different spatially resolved initial conditions (e.g. temperature profiles). We used a model to generate more than 100 different sets of spatially resolved initial conditions (each representing a different "history" of the modeled object), hard coded them as FORTRAN arrays (the inner core of the FMU was in FORTRAN), and used a single integer parameter to select which profile the user wanted.
It worked very well and the user has no way of breaking it.
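The idea can be sketched in a few lines of Python (purely illustrative; the real implementation was FORTRAN inside the FMU, and the profile values and names below are placeholders I made up):

```python
# Hard-coded variants of the array; the numbers are placeholder values.
PROFILES = {
    1: (293.15, 293.15, 293.15, 293.15),
    2: (293.15, 300.00, 310.00, 320.00),
    3: (320.00, 310.00, 300.00, 293.15),
}


def select_profile(profile_id):
    """Return the hard-coded array chosen by a single integer parameter."""
    try:
        return PROFILES[profile_id]
    except KeyError:
        valid = ", ".join(str(k) for k in sorted(PROFILES))
        raise ValueError(f"profile_id must be one of: {valid}") from None
```

The only thing exposed as an FMU parameter is then the integer, and an invalid value can be rejected with a clear message.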
Shorten the array and interpolate
If the data in your array is smooth, you might be able to dramatically reduce the number of values you actually need to pass to your simulation - which would make serialization into scalar parameters less painful.
Within the FMU, interpolate to get the resolution you need.
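A minimal sketch of that interpolation step, written in Python for illustration and assuming piecewise-linear interpolation is good enough for your data (the function name and the choice of linear interpolation are my assumptions, not something FMI prescribes):

```python
def interpolate(xs, ys, x):
    """Linearly interpolate y at x from support points (xs ascending)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two matching support points")
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the range of the support points")
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1 = xs[i - 1], xs[i]
            t = (x - x0) / (x1 - x0)
            return ys[i - 1] + t * (ys[i] - ys[i - 1])


# Three scalar parameters stand in for a much longer array.
support_x = [0.0, 1.0, 2.0]
support_y = [0.0, 10.0, 40.0]
fine_grid = [interpolate(support_x, support_y, k / 4) for k in range(9)]
```

Only the few support values need to be exposed as scalar parameters; the finer grid is rebuilt inside the model.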
String parameter to select csv file
You can use a string parameter to provide the path to a user-provided csv-file. I would not recommend this, because the user will most likely break it.
I have a vector that contains Time values in ascending order. With relational expressions I can obtain a subset of values from that array; I then need the first value of that subset without creating new variables.
For example:
Time is a column vector, so I can use Time(something==X) to get a subset of the values of Time, but then I need the first value of Time(something==X). I can't write Time(something==X)(1) as in some other programming languages.
Unfortunately with MATLAB you need to use temporary variables. It doesn't support this kind of indexing, though it is quite natural and I would love if they supported it.
You would have to do this:
x = Time(something==X);
y = x(1);
Octave does support this kind of indexing, though. The only way I can think of to avoid the temporary variable in MATLAB is to use cell arrays. If you want to use a normal vector, however, you're out of luck.
EDIT: May 13th, 2014 - Referencing David's comment, it is possible to do this without a temporary variable, but readability is very poor. In the end, a temporary variable is still the better way for readability and reproducibility. Check the following SO post that he has referenced:
How can I index a MATLAB array returned by a function without first assigning it to a local variable?