Passing Ranking function in query text - vespa

I have a scenario in which an external agent generates a ranking function dynamically. I want to pass it as a query argument instead of defining it statically in the search definition file, something like
http://localhost:8080/search/?query=honda car&rankfeature.rankingExpression="query(title_match_weight)*matches(title)+query(tags_match_weight)*matches(tags)"&rankfeature.query(title_match_weight)=10&rankfeature.query(tags_match_weight)=20
which currently does not work. Is there a way to achieve this in Vespa?
I have tried foreach in the rank expression for this purpose, but it does not give me the flexibility of supplying an arbitrary function dynamically.
http://docs.vespa.ai/documentation/ranking.html#using-query-variables
explains macros. As far as I can tell, a macro is treated as a rank feature, and rank features can be passed in the query. That should mean a macro can be passed in the query and used in the expression, but this does not work.

It's not possible to send ranking expressions with the query. It wouldn't be efficient, as they are (often) compiled with LLVM etc.
Couldn't you use a fixed ranking expression and use query features to weight, or turn on or off, different parts of it? You can also configure many different ranking expressions and choose between them at query time using ranking.profile=profileName.
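As a rough sketch of the client side of that suggestion, the external agent could keep the expression fixed and only send the weights and the profile name. The profile name weighted_match and the helper build_search_url are hypothetical; the rankfeature.query(...) and ranking.profile parameter names are taken from the question and the answer above, and the fixed expression itself is assumed to be configured in the search definition.

```python
from urllib.parse import urlencode


def build_search_url(base, query, profile, weights):
    """Build a search URL that selects a rank profile and passes query features.

    `profile` is a hypothetical profile name assumed to exist in the
    search definition; `weights` maps query feature names to values.
    """
    params = {"query": query, "ranking.profile": profile}
    for name, value in weights.items():
        # Same parameter shape as in the question: rankfeature.query(<name>)
        params[f"rankfeature.query({name})"] = value
    return f"{base}?{urlencode(params)}"


url = build_search_url(
    "http://localhost:8080/search/",
    "honda car",
    "weighted_match",  # hypothetical profile name
    {"title_match_weight": 10, "tags_match_weight": 20},
)
print(url)
```

Setting a weight to 0 would then switch the corresponding part of the fixed expression off.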

Related

Array parameters in FMI 2.0

In FMI 2.0, array parameters are serialized to scalar variables.
Importing tools can display them as arrays, but their size is fixed and their handling is inefficient.
Better array support is currently in development by a working group of the FMI project, but I would like to know about workarounds for handling array parameters in the meantime.
Ideas are to
hard code them (disadvantage: they are no parameters any more ...)
put them in a CSV file in the resources folder and read them at the start of the simulation (disadvantage: no parameter mask support, complicated)
put them in a string parameter and parse it at simulation start (disadvantage: limited length of strings, complicated)
Are there other ideas / workarounds? Thanks in advance.
Combinations of the ideas outlined in your question are also possible.
Hard code with selector parameter
Here the idea is to hard code several variants of your array and allow the user to select one with a parameter.
I did this in a recent project where a user needed to choose between different spatially resolved initial conditions (e.g. temperature profiles). We used a model to generate more than 100 different sets of spatially resolved initial conditions (each representing a different "history" of the modeled object), hard coded them as FORTRAN arrays (the inner core of the FMU was in FORTRAN), and used a single integer parameter to let the user select which profile to use.
It worked very well and the user has no way of breaking it.
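The selection logic can be sketched as follows. This is only an illustration in Python (the project described above used FORTRAN), and the profile values and the name select_profile are made up for the example.

```python
# Hard-coded variants of the array; the values are invented for illustration.
PROFILES = {
    1: [20.0, 20.0, 20.0, 20.0],  # uniform temperature
    2: [20.0, 25.0, 30.0, 35.0],  # linear gradient
    3: [35.0, 30.0, 25.0, 20.0],  # reversed gradient
}


def select_profile(index):
    """Return the hard-coded array chosen by a single integer parameter."""
    if index not in PROFILES:
        raise ValueError(f"unknown profile index {index}")
    # Return a copy so the caller cannot modify the hard-coded data.
    return list(PROFILES[index])
```

Because the only exposed parameter is an integer with a small set of valid values, invalid input can be rejected in one place.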
Shorten the array and interpolate
If the data in your array is smooth, you might be able to dramatically reduce the number of values you actually need to pass to your simulation - which would make serialization into scalar parameters less painful.
Within the FMU, interpolate to get the resolution you need.
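A minimal sketch of the interpolation step, assuming piecewise-linear interpolation is good enough for the data and that the support points are sorted. The function name and the sample values are illustrative only.

```python
import bisect


def interpolate(coarse_x, coarse_y, x):
    """Piecewise-linear interpolation over sorted support points.

    Values outside the support range are clamped to the end values.
    """
    if x <= coarse_x[0]:
        return coarse_y[0]
    if x >= coarse_x[-1]:
        return coarse_y[-1]
    # Index of the support point at or just below x.
    i = bisect.bisect_right(coarse_x, x) - 1
    x0, x1 = coarse_x[i], coarse_x[i + 1]
    t = (x - x0) / (x1 - x0)
    return coarse_y[i] + t * (coarse_y[i + 1] - coarse_y[i])


# Three scalar parameters instead of a long array (illustrative values).
support_x = [0.0, 10.0, 20.0]
support_y = [100.0, 200.0, 400.0]
fine = [interpolate(support_x, support_y, x) for x in range(0, 21, 5)]
```

Only the support points need to be exposed as scalar parameters; the resolution of the reconstructed array is chosen inside the FMU.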
String parameter to select csv file
You can use a string parameter to provide the path to a user-provided csv-file. I would not recommend this, because the user will most likely break it.

Simple loop in datastage

I have a basic understanding problem with DataStage; I am new to this field. It is about implementing loops. First I get several rows from a select query using the connector stage. Now I would like to perform several more steps for each row, and the result for each row should be used as a variable in further stages. How can I do that? I know about the loop facility in the Transformer stage, but it does not seem to solve my problem.
Should I work with the loop stage in the job sequence? If yes, how?
The problem:
foreach ($selectQueryResults as $row) {
// do something with the row value
}
Thanks!
I have to admit, I no longer have access to DataStage or to the code I once had.
This is to propagate stage variables at the job level.
However, the routine activity to accomplish your task would be as follows.
Propagate the SQL as a variable
Use DSExecute to execute a command line function (to call SQLPlus, nzsql or whichever your command line is) and pass through your SQL variable
Return its results into another variable
With that variable you can split the contents, first by line and then by delimiter, using a loop / for statement.
Use DSSetParam to map the key value pairs to the parameters of a specific job attached with the DSAttachJob function, or just propagate them as outputs from the routine activity
BASIC Language Reference
DSSetParam
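The split step (first by line, then by delimiter) can be sketched like this. It is written in Python purely to show the logic; inside DataStage the same loop would be written in a BASIC routine. The function name, the "|" delimiter and the sample output are assumptions.

```python
def parse_rows(output, delimiter="|"):
    """Split command-line query output into rows, then into fields."""
    rows = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines in the command output
        rows.append([field.strip() for field in line.split(delimiter)])
    return rows


# Illustrative output of a command-line SQL client.
sample = "1|alpha\n2|beta\n\n"
for row in parse_rows(sample):
    # Each row would be mapped to the parameters of one job run here.
    print(row)
```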
Remember that error handling and commenting are important within the BASIC routines.
Subsequently, this is to propagate stage variables within a transformer and is a very powerful tool once it's been mastered.
Here is the documentation and examples for defining stage variables within a Transformer model and aggregating the output. Please note, the order of the stage variables is extremely important.
TransformerLoops
Within a transformer you can define incoming columns as stage variables, use those stage variables to aggregate, concatenate (strings), split, subtract... basically you can do a ridiculous amount once you get your head around it.
I would suggest going through the Transformer Examples first as I suspect that this may be what you're looking for.
Remember, it doesn't all have to be completed in a single Transformer stage, you can get the initial cleansing done in the first transformer and then do the complex loop in the second, break the steps down for what works for you.

splitting JSON string using regex

I want to split a JSON document that has a pattern like [[[1,2],[3,4][5,6]]] using regex. The pairs represent x and y. What I want to do is take this string and produce a list {"1,2", "3,4", "5,6"}. Eventually I want to split the pairs. I was thinking I could build the list {"1,2", "3,4", "5,6"} and use a for loop to split the pairs. Is this the right approach to get x and y separately?
JSON is not a regular language but a context-free language, and as such it cannot be matched by a regular expression. You need a full JSON parser like the ones referenced in the comments to your question.
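For illustration, here is what the parser route might look like in Python with the standard json module. The question does not say which language is being used, so this is only a sketch. It also assumes the input is valid JSON, i.e. that the missing comma in [3,4][5,6] is a typo.

```python
import json


def extract_pairs(text):
    """Parse a three-level nested JSON array and return the (x, y) pairs."""
    data = json.loads(text)
    # The outer list holds groups; each group is a list of [x, y] pairs.
    return [(x, y) for group in data for x, y in group]


pairs = extract_pairs("[[[1,2],[3,4],[5,6]]]")
xs = [x for x, _ in pairs]
ys = [y for _, y in pairs]
```

With a parser, the x and y values arrive as numbers directly, so no string splitting is needed.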
However, if you are going to have a fixed structure, with exactly three levels of square brackets as in the example you posted, then there is a regexp that can parse it. (It matches a subset of the JSON grammar and is not general enough to parse other JSON content.)
You'll have numbers: ([+-]?[0-9]+)
Then you'll have brackets and separators: \[\[\[, ,, \],\[ and \]\]\]
and finally, put all this together:
\[\[\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\],\[([+-]?[0-9]+),([+-]?[0-9]+)\]\]\]
and if you want to permit spaces between symbols, then you need:
\s*\[\s*\[\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*,\s*\[\s*([+-]?\d+)\s*,\s*([+-]?\d+)\s*\]\s*\]\s*\]\s*
This regexp has six capture groups that match the corresponding integers in the input string, as the demo shows.
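The same pattern can be tried out in Python's re module. This sketch assembles the whitespace-tolerant regexp from smaller pieces so it is easier to read; the names NUM, PAIR and parse_three_pairs are illustrative.

```python
import re

NUM = r"([+-]?\d+)"
PAIR = rf"\[\s*{NUM}\s*,\s*{NUM}\s*\]"
# Exactly three pairs inside exactly three levels of brackets.
PATTERN = re.compile(
    rf"\s*\[\s*\[\s*{PAIR}\s*,\s*{PAIR}\s*,\s*{PAIR}\s*\]\s*\]\s*"
)


def parse_three_pairs(text):
    """Return the three (x, y) pairs, or None if the text does not match."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    values = [int(group) for group in match.groups()]
    # Groups come out flat (x1, y1, x2, y2, x3, y3); regroup them in twos.
    return list(zip(values[0::2], values[1::2]))
```

Note that the pattern requires a comma between every pair, so the string from the question with [3,4][5,6] would not match as written.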
Clarification
Regular languages, regular grammars, and regular expressions form a class with many practical properties, for example:
You can parse them efficiently in one pass with what is called a finite automaton.
You can define the automaton that accepts the sentences of the language simply with a regular expression.
You can combine regexps (or automata) to build acceptors for more complex languages (the union, intersection, symmetric difference, concatenation, etc. of other regular languages).
You can decide whether the language defined by one regular expression is a subset of, a superset of, or neither with respect to the language of another.
On the other hand, regularity limits the power of the languages you can define:
you cannot define languages that allow nesting of subexpressions (like the bracketing you allow in JSON expressions or the tag nesting allowed in XML documents)
you cannot define languages which collect context and use it in another place of the sentence (for example, sentences that identify a number and have to match that same number in another place of the sentence)
The point of my answer, however, is that if you bound the depth of nesting (say, to three levels of brackets, as in the example you posted) you can make your language regular and then parse it with a regular expression. It is not easy, because this often leads to complex expressions (as you have seen in my answer), but it is not impossible, and you gain the ability to identify parts of the sentence as submatches of the regular subexpressions embedded in the global one.
If you want to allow unbounded nesting, you need to switch to context-free languages, which are defined with context-free grammars and are accepted by a more complex, stack-based automaton. Then you lose part of the set of operations you had:
In general, you can no longer decide whether one language is included in another.
You can no longer freely construct a language from the intersection or difference of other context-free languages, because the result is not guaranteed to be context free (union still works).
But you will be able to match sentences with unbounded nesting. Normally, programming languages are defined with a context-free grammar plus a little more work for context checking (for example, to check whether an identifier being used is actually defined in the declaration section, or to match the starting and ending tag identifiers at matching levels in an XML document).
For context free languages, see this.
For regular languages, see this.
Second clarification
Since your question did not say that you wanted to match real, decimal numbers, I have modified the demo to allow fixed-point numbers (not general floating point with exponential notation; you'll need to work that out yourself, as an exercise). Just run some tests and modify the regexp to adapt it to your needs.
(well, if you want to see the solution, look at it)
Yeah, I tried using the regex in my code but it is not working, so I am trying a different approach now. I have an idea of how to approach it, but it is not really working. First off, let me be clearer about the question. What I am trying to do is parse a JSON document like the one in the image below. The file has strings with the pattern [[[1,2],[3,4][5,6]]]. What I want to get out of this is each pair as a list, so that the list holds x-y pairs.
the string structure
My approach: first remove the "[[" and "]]" at the beginning and at the end, so that I have a string with the same pattern throughout, which gives me the string "[1,2],[3,4][5,6]". This is my code, but it is not working. How do I fix it? The other thing I thought could be an issue is that the strings are not all the same length. So how do I replace just the beginning and the ending?
my code
Then I can use a regex split method to get a list of the form {"1,2", "3,4", "5,6"}. I am not really sure how to do this, though.
Then I take the x and the y of each pair and add them to the list, so that I get a list of x-y pairs. I would appreciate it if you could show me how to do this.
This is the approach I am working on, but if there is a better way of doing it I will be glad to see it.

Distinguish between function calls and indexed arrays using ANTLR4

The syntax of a language is ambiguous in the sense that function calls and indexed identifiers are written the same way:
var = function(5) => function call where 5 is a parameter
var = array(5) => element 5 of the array
To be able to make the distinction I need to make a first pass and create a symbol table. After that I want to use predicates to do something like:
reference
: {isFunction(getCurrentToken().getText())}? ident (argumentList?)
| {!isFunction(getCurrentToken().getText())}? ident (subscriptionList)?
;
But several questions remain:
Do I have to "extend/inherit" the parser to add the code of "isFunction"? Or do I have to put it in the .g4 file itself?
Are predicates the best way here, or is there a better way to achieve all this?
How to run the parser twice? How to handle the "first" run? (in that case isFunction will always return false as the symbol table is not yet constructed)
Somehow I feel there must be an easy, clean way to handle the above issue...
This may not be the direct answer you were looking for, but I recommend doing it all in code after parsing, rather than parsing the file twice or making the parsing dependent on the symbol table.
This can be done by allowing both function calls and array accesses to appear wherever either of them would be allowed.
When you transform the parsed rules into an internal representation later on, you can distinguish the two based on the symbol table.
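A minimal sketch of that post-parse step, independent of ANTLR itself. It assumes the parser produces one generic node for both forms; the class names Reference, FunctionCall and ArrayAccess and the function resolve are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Reference:
    """Generic node emitted by the parser for both name(...) forms."""
    name: str
    args: list


@dataclass
class FunctionCall:
    name: str
    args: list


@dataclass
class ArrayAccess:
    name: str
    indices: list


def resolve(ref, function_names):
    """Turn a generic reference into a call or an array access.

    `function_names` stands in for the symbol table, which is built
    from the declarations once parsing has finished.
    """
    if ref.name in function_names:
        return FunctionCall(ref.name, ref.args)
    return ArrayAccess(ref.name, ref.args)
```

The grammar then needs only a single rule for the shared syntax, with no predicates and a single parse.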

Solr store expression as variable to avoid recomputing it

I'd like to know if there is a way to store a Solr function expression in an intermediate variable so it doesn't get recomputed each time.
Take my concrete need as an example. I need to sort by distance, but with each distance grouped into a range from a range set, as would be the case with distance faceting using frange. These ranges are user definable, and the range set can be of any length.
For example, if a user defines the range set 1,2,3, the expression sent to Solr would be:
max(map(geodist(),0,1,1),max(map(geodist(),1,2,2),map(geodist(),2,3,3))) asc
Can that geodist() call be stored/memoized, or does Solr internally optimize such expressions?
I am not sure about the following, but it might be worth a try:
{!func}max(map($v3,0,1,1),$v2)&v2={!func}max($v4,$v5)&v4=map($v3,1,2,2)&v5=map($v3,2,3,3)&v3=geodist()
The above is called parameter dereferencing.
http://wiki.apache.org/solr/LocalParams
Here we assign the function geodist() to a parameter v3. This parameter is substituted into the parameters v4 and v5, which are in turn substituted into v2 and the main function.
Note, however, that this feature was only introduced in Solr 4.0.
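Since the range sets are user defined, the parameters have to be generated on the client. A rough sketch of how that might be done follows; the helper build_range_sort and the parameter name d are made up, and whether dereferencing actually avoids recomputing geodist() is not confirmed here.

```python
from urllib.parse import urlencode


def build_range_sort(bounds, dist_param="d"):
    """Build the nested max(map(...)) expression for a user-defined range set.

    The distance is written as a parameter reference ($d) rather than as a
    repeated geodist() call.
    """
    ref = f"${dist_param}"
    lower_bounds = [0] + bounds[:-1]
    terms = [f"map({ref},{lo},{hi},{hi})" for lo, hi in zip(lower_bounds, bounds)]
    expr = terms[-1]
    # Wrap the remaining terms from the inside out.
    for term in reversed(terms[:-1]):
        expr = f"max({term},{expr})"
    return expr


expr = build_range_sort([1, 2, 3])
params = {"sort": f"{expr} asc", "d": "geodist()"}
query_string = urlencode(params)
```

For the range set 1,2,3 this produces the same nesting as the expression in the question, with $d in place of each geodist() call.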

Resources