Sort or filter results by function query defined in a field - solr

I have a Solr 6.2 instance running, and I'm exploring its advantages and limitations. One limitation I've run into seems to be that you can't sort or filter the data based off of a field function query.
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq=total:[10 TO *]
Solr responds with an error stating that the total field does not exist. Indeed, the field is not defined in my schema because it's not a stored part of the dataset - it's calculated at query time. They call it a pseudo field. I haven't been able to find an example in the documentation or a solution online. So, is there a way around this?

.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq={!frange l=10} sum(v1,v2)
I have very same problem as you.
I want to query particular division value of two fields.
I tried to used [0.3 TO *] like you.
You can also use upper bound for your range if you need.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf
"l" is for lower bound.
"u" is for upper bound.
fq={!frange l=0 u=2.2} sum(user_ranking,editor_ranking)
Maybe this works for you?

you can do this. instead of total try sum.
you can find more info here. https://wiki.apache.org/solr/FunctionQuery#What_is_a_Function.3F
an example from the sole wiki.
Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes with arbitrary names stored in field boxname. Suppose we want to search for box matching name findbox but ranked according to volumes of boxes. The query parameters would be:
q=boxname:findbox val:"product(x,y,z)"
This query will rank the results based on volumes. In order to get the computed volume, you will need to request the score, which will contain the resultant volume:
&fl=*, score
Suppose that you also have a field storing the weight of the box as weight. To sort by the density of the box and return the value of the density in score, you would submit the following query:
http://localhost:8983/solr/collection_name/select?q=boxname:findbox val:"div(weight,product(x,y,z))"&fl=boxname x y z weight score`
you can read more about it here. https://cwiki.apache.org/confluence/display/solr/Function+Queries
Try this
solr/collection/select?q=*:* _val_:"sum(v1,v2)"&fl=* score&fq={!frange l=10 }sum(v1,v2)

Related

Apachesolr search result sorting

I'm trying to change sorting from apache Solr query.
for example bundle type: story, videogallery and category_management are indexed.
I wanted to show all results related to bundle type: category_management on top.
See attached screenshot:
Please help me to user Solr filter to sort my result.
The easiest way is to apply a boost to any entry that matches your requirement:
&bq=bundle:category_management^50
The weight - 50 - may have to be adjusted to get the result you want. This is faster than sorting by a function.
This will also still keep relevancy score inside each set of documents, compared to sorting by a function or adding a priority field for sorting.
If you want to actually apply a sort on multiple fields instead, you can first sort by a function that returns 1 for values that match and 0 for values that doesn't. Something like:
sort=if(termfreq(bundle,'category_management'),1,0),ds_changed desc

Display Solr results based on custom selection

I have a filed 'qualification' which is having multiple values (something like MCA, MBA, MSC, PhD, ...).
My requirement is to display results in the order MSC, MCA, PhD, MBA. So, I am using the below query to boost the field values.
&bq=(qualification: "MSC"^5 "MCA"^4 "PhD"^3 "MBA"^2)
The above query is working only when I use q=*:*
But when search with any text like q=course, I am not getting the results with specified order.
Please help what I did wrong.
Thanks & Regards
Venu
You're probably not doing anything "wrong", but when you actually search for something, the score isn't flat (i.e. it's no longer just 1) any more.
If you don't want your query to affect the score, use a filter query (fq instead). This does however not give you any actual relevance inside the results - if you still want that, you'll probably have to adjust your boosts to be far higher, so that the actual scores are only used internally within each boost level.
&bq=qualification:"MSC"^50000
&bq=qualification:"MCA"^40000
&bq=qualification:"PhD"^30000
&bq=qualification:"MBA"^20000
If you append debugQuery=true to your query string, you can see how the score is calculated for each document, and adjust your boosts accordingly.

SOLR index time boost depending on the field value

Is it possible to boost a document on the indexing stage depending on the field value?
I'm indexing a text field pulled from the database. I would like to boost results that are shorter over the longer ones. So the value of boost should depend on the length of the text field.
This is needed to alter the standard SOLR behavior that in my case tends to return documents with multiple matches first.
Considering I have a field that stores the length of the document, the equivalent in the query of what I need at indexing would be:
q={!boost b=sqrt(length)}text:abcd
Example:
I have two items in the DB:
ABCDEBCE
ABCD
I always want to get ABCD first for the 'BC' query even though the other item contains the search query twice.
The other solution to the problem would be ability to 'switch off' the feature that scores multiple matches higher at query time. Don't know if that is possible either...
Doing this at index time is important as the hardware I run the SOLR on is not too powerful and trying to boost on query time returns with OutOfMemory Exception. (Even If I could work around that increasing memory for java I prefer to be on the safe side and implement the index the most efficient way possible.)
Yes and no - but how you do it depends on how you're indexing your documents.
As far as I know there's no way of resolving this only on the solr server side at the moment.
If you're using the regular XML based interface to submit documents, let the code that generates the submitted XML add boost=".." values to the field or to the document depending on the length of the text field.
You can check upon DIH Special Commands which has a $docBoost command
$docBoost : Boost the current doc. The value can be a number or the
toString of a number
However, there seems no $fieldBoost Command.
For you case though, if you are using DefaultSimilarity, shorter fields are boosted higher then longer fields in the Score calculation.
You can surely implement your own Simiarity class with a changed TF (Term Frequency) and LengthNorm Calculation as your needs.

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you would want both parts to be boosted by the same scale (I am not sure if this isn't the default behaviour of Solr) you would want to use dismax or edismax and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be a OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of pseudo-field calculated using the query() function to report how any set of results score on some other query/queries. So if for example the original document set was the top-N from query_A, and then those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting-field …&fl=bscore:query({!dismax v="query_B"})&…. Then the document's scores against query_B would be included in the output (as bscore).
Finally, the result-grouping functionality can be used both collect the top-N for one query and scores for lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)

Solr Distance Filtering

I am trying to do distance range search using Solr.
I know its very easy to do a search for filtering within the 5km range
&q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
What I am after is how to do the same thing if I am looking in a range of say 5 to 10 km ??
Thanks
Here are a couple ways to approach this. So clearly the query goes into a filter query ("fq" param) since the intention is not to modify the score. And lets assume the these parameters are set in the request URL (although they don't have to be placed there):
pt=45.15,-93.85&sfield=store
Here is one approach:
_query_:"{!geofilt d=10}" -_query_:"{!geofilt d=5}"
I used the _query_ Solr syntax hack to enter a sub-query which offers the opportunity to switch the query parser from the Lucene one to a geo one.
Here's another approach that is probably the fastest:
{!frange l=5 u=10}geodist()
This one is a function query returning the distance that is then limited to the desired range. It is probably faster since it will evaluate each document once each instead of twice like the previous will.
You may want to mark this as not cacheable and add a bbox filter so that far fewer then every document is examined. Here is the final result (not url escaped):
pt=45.15,-93.85&sfield=store&fq={!frange l=5 u=10 cache=false cost=100}geodist()&fq={!bbox d=10}

Resources