SOLR - Limiting Search Results

Is there a way to restrict the number of search results returned from SOLR? I am working for a client who would like to restrict the search results to 100 (based on search score). I can use rows, but that only restricts the results per page, not the total. The problem is that if SOLR's sort function is used, it sorts all the results, so the product ranked 105th by score might come out on top because of its low price. I want the sort to happen only on the top 100 results. Is there a way to do that?
Thanks for your help!
Supreet

You can use the Sort By Function.
You will have to query the normal way with rows=100 and also add the &sort=<query>.
I could not try it as I do not have a Solr instance right now. Please let me know if it works or not.
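If a server-side solution turns out not to work, one fallback is to do the second sort in the client: request rows=100 with the default score ordering, then re-sort only those documents by price. Below is a minimal sketch of that idea in Python. It is not the sort-by-function approach suggested above. The helper name top_n_then_sort and the score and price keys are assumptions, and the documents are made-up stand-ins for a parsed Solr response.

```python
from typing import Dict, List


def top_n_then_sort(docs: List[Dict], n: int, sort_key: str) -> List[Dict]:
    """Keep the n highest-scoring docs, then order only those by sort_key."""
    # Step 1: rank by relevancy score, best first, and cut off at n.
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    # Step 2: re-sort the survivors by the business field (ascending).
    return sorted(top, key=lambda d: d[sort_key])


# Hypothetical documents, shaped like a response requested with fl=id,price,score.
docs = [
    {"id": "a", "score": 9.1, "price": 30.0},
    {"id": "b", "score": 7.4, "price": 10.0},
    {"id": "c", "score": 1.2, "price": 1.0},  # cheapest, but outside the top 2
]
result = top_n_then_sort(docs, n=2, sort_key="price")
print([d["id"] for d in result])  # ['b', 'a']
```

The cheap but low-ranked document never reaches the price sort, which is the behavior asked for. This only stays practical while the cutoff is small enough to fetch in one request.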

Related

Apachesolr search result sorting

I'm trying to change the sort order of an Apache Solr query.
For example, the bundle types story, videogallery and category_management are indexed.
I want to show all results with bundle type category_management on top.
Please help me use a Solr filter to sort my results.
The easiest way is to apply a boost to any entry that matches your requirement:
&bq=bundle:category_management^50
The weight - 50 - may have to be adjusted to get the result you want. This is faster than sorting by a function.
This will also still keep relevancy score inside each set of documents, compared to sorting by a function or adding a priority field for sorting.
If you want to actually apply a sort on multiple fields instead, you can first sort by a function that returns 1 for values that match and 0 for values that don't. Each sort clause needs its own direction, so something like:
sort=if(termfreq(bundle,'category_management'),1,0) desc,ds_changed desc
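Here is a small Python sketch that builds both variants as URL-encoded query strings, which helps because the quotes, commas and ^ need escaping on the wire. The helper names and the search term "news" are made up for illustration. It also assumes the edismax parser is selected, since as far as I know bq is only read by the dismax and edismax parsers. Every sort clause carries an explicit direction.

```python
from urllib.parse import parse_qs, urlencode


def boost_params(term: str, field: str, value: str, weight: int) -> str:
    """Query string that boosts docs where field matches value via bq."""
    return urlencode({
        "q": term,
        "defType": "edismax",  # assumption: bq needs the (e)dismax parser
        "bq": f"{field}:{value}^{weight}",
    })


def function_sort_params(term: str, field: str, value: str, tiebreak: str) -> str:
    """Query string that sorts matching docs first, then by a second field."""
    first = f"if(termfreq({field},'{value}'),1,0) desc"
    return urlencode({"q": term, "sort": f"{first},{tiebreak} desc"})


boost_qs = boost_params("news", "bundle", "category_management", 50)
sort_qs = function_sort_params("news", "bundle", "category_management", "ds_changed")

# Decode again to show what Solr would receive after URL-decoding.
print(parse_qs(boost_qs)["bq"][0])
print(parse_qs(sort_qs)["sort"][0])
```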

Solr: How to resort top documents from query?

I am running Solr 4.10.3 and trying to re-sort the top 10 documents that come back from Solr. How can I do that? I am thinking of a sub-query but don't know how to write one, so I need help.
Example:
Suppose that for the query "car" Solr returns 250 documents ranked by relevancy score. Now take the top 10 of those 250 documents and re-sort them by a custom field.
I can't do this:
select?q=car&sort=score desc, pr desc
because it sorts the entire 250 documents. So is there any solution?
I think you mean query reranking? This is what is used when you want to:
use a first, more lightweight query to get a result based on the score
get a fair top x number of matches, like 2000 for example
use a heavier query to re-sort only those, to get the final score
That is what you need if you want to do it in two steps, as you state. However, I am not sure you need query reranking for your use case; just boosting by that field, or sorting on a function, may be enough.
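The two-step flow above can be sketched with Solr's rerank query parser (the rq parameter), which to my knowledge is available from Solr 4.9 onward. The helper name, the weight of 1000 and the use of {!func}pr as the rerank query are assumptions for illustration. The rerank score is added to the original score after being multiplied by reRankWeight, so a large weight makes the custom field dominate within the top 10, but it is not a pure sort.

```python
from urllib.parse import parse_qs, urlencode


def rerank_params(query: str, rerank_query: str, docs: int, weight: int) -> str:
    """Build a request that re-scores only the top `docs` hits of `query`."""
    return urlencode({
        "q": query,
        # $rqq dereferences the separate rqq parameter below.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={docs} reRankWeight={weight}}}",
        "rqq": rerank_query,
    })


# Assumption: pr is a numeric field, so it can be used as a function query.
qs = rerank_params("car", "{!func}pr", docs=10, weight=1000)
print(parse_qs(qs)["rq"][0])
print(parse_qs(qs)["rqq"][0])
```

Documents outside the top 10 keep their original relative order, so the remaining 240 results are not reshuffled.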

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you would want both parts to be boosted by the same scale (I am not sure if this isn't the default behaviour of Solr) you would want to use dismax or edismax and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of a pseudo-field calculated using the query() function to report how any set of results scores on some other query or queries. So if, for example, the original document set was the top-N from query_A, and those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. Then the documents' scores against query_B would be included in the output (as bscore).
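For illustration, here is a Python sketch that builds such a request. It passes query_B through a separate parameter (named qb here, an arbitrary choice) and dereferences it with $qb, which sidesteps quoting query_B inside the fl value. The helper name is made up, and the sketch assumes the dismax parser has its qf fields configured in the request handler defaults.

```python
from urllib.parse import parse_qs, urlencode


def score_against_other(query_a: str, query_b: str, rows: int = 10) -> str:
    """Run query_a, and report each hit's score on query_b as bscore."""
    return urlencode({
        "q": query_a,
        "rows": rows,
        # bscore is a pseudo-field computed per returned document.
        "fl": "id,score,bscore:query($qb)",
        # Assumption: qf for dismax comes from the handler configuration.
        "qb": f"{{!dismax}}{query_b}",
    })


qs = score_against_other("query_A", "query_B")
print(parse_qs(qs)["fl"][0])
print(parse_qs(qs)["qb"][0])
```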
Finally, the result-grouping functionality can be used to collect both the top-N for one query and the scores for lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)

solr pagination and grouping

I am using LucidWorks and Solr to implement search in a large and diverse web app which has many different types of pages. The spec calls for a single search results page grouped by page type with pagination of search results in each group.
I can group easily enough with something like this
q=[searchterm]&group=true&group.field=[pagetypefield]
which returns nicely grouped results.
I can also do:
q=[searchterm]&group=true&group.field=[pagetypefield]&group.offset=[x]&group.limit=[y]
which will get me y results per group starting at result x.
However, what I want to be able to do is supply an offset and limit per group, because I might want to get results 0-4 for group 1 and results 5-9 for group 2.
The values for [pagetypefield] are a list of known values, so I can do multiple queries like:
q=[searchterm]&group=true&group.query=[pagetypefield]:[value]&group.offset=[x]&group.limit=[y]
for each known value of [pagetypefield],
or not use group.offset at all and, in my example, get results 0-9 for both groups and just discard the results I don't need.
I don't really like either option, but I can't find a way in the documentation to specify offset and limit on a per-group basis.
Any advice would be most appreciated.
I have confirmation from LucidWorks that what I want to do is not possible. They recommended the multiple-search solution, as the first search will be cached, so subsequent searches will be really fast.
What I think I'm going to end up doing is to group the search results, taking the first n results for each group, then use ajax to paginate each group.
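For anyone going the multiple-search route, here is a rough Python sketch that generates one request per known group value, each with its own offset and limit. The helper name, the field name pagetype and the group values article and product are placeholders, not names from my app.

```python
from typing import Dict, List, Tuple
from urllib.parse import parse_qs, urlencode


def per_group_queries(term: str, field: str,
                      pages: Dict[str, Tuple[int, int]]) -> List[str]:
    """One query string per group value, each with its own (offset, limit)."""
    queries = []
    for value, (offset, limit) in pages.items():
        queries.append(urlencode({
            "q": term,
            "group": "true",
            "group.query": f"{field}:{value}",
            "group.offset": offset,
            "group.limit": limit,
        }))
    return queries


# Results 0-4 for the first group and 5-9 for the second, as in the question.
group_requests = per_group_queries(
    "searchterm", "pagetype", {"article": (0, 5), "product": (5, 5)}
)
for qs in group_requests:
    print(qs)
```

Each query string is sent as its own request; the ajax pagination then only needs to re-issue the one request for the group whose page changed.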

Solr Distance Filtering

I am trying to do distance range search using Solr.
I know it's very easy to filter within a 5 km range:
&q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
What I am after is how to do the same thing if I am looking in a range of, say, 5 to 10 km?
Thanks
Here are a couple of ways to approach this. Clearly the query goes into a filter query ("fq" param), since the intention is not to modify the score. And let's assume these parameters are set in the request URL (although they don't have to be placed there):
pt=45.15,-93.85&sfield=store
Here is one approach:
_query_:"{!geofilt d=10}" -_query_:"{!geofilt d=5}"
I used the _query_ Solr syntax hack to enter a sub-query which offers the opportunity to switch the query parser from the Lucene one to a geo one.
Here's another approach that is probably the fastest:
{!frange l=5 u=10}geodist()
This one is a function query returning the distance, which is then limited to the desired range. It is probably faster since it evaluates each document once instead of twice, as the previous approach does.
You may want to mark this as not cacheable and add a bbox filter so that far fewer than every document is examined. Here is the final result (not URL-escaped):
pt=45.15,-93.85&sfield=store&fq={!frange l=5 u=10 cache=false cost=100}geodist()&fq={!bbox d=10}
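Since that final string is not URL-escaped, here is a small Python sketch that builds the escaped form, keeping the two fq parameters separate. The helper name distance_ring_params is made up; the parameter values are the ones from the example above.

```python
from urllib.parse import parse_qs, urlencode


def distance_ring_params(lat: float, lon: float, sfield: str,
                         inner_km: int, outer_km: int) -> str:
    """Filter to documents between inner_km and outer_km from the point."""
    params = [
        ("q", "*:*"),
        ("pt", f"{lat},{lon}"),
        ("sfield", sfield),
        # Exact ring test; uncached and run last because of its high cost.
        ("fq", f"{{!frange l={inner_km} u={outer_km} cache=false cost=100}}geodist()"),
        # Cheap bounding box that prunes most documents before the ring test.
        ("fq", f"{{!bbox d={outer_km}}}"),
    ]
    return urlencode(params)


qs = distance_ring_params(45.15, -93.85, "store", 5, 10)
print(qs)
print(parse_qs(qs)["fq"])
```

A list of tuples is used instead of a dict because fq appears twice and a dict would keep only one of them.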
