Solr: How to resort top documents from query?


I am running Solr 4.10.3 and trying to re-sort the top 10 documents returned by Solr. How can I do that? I am thinking of a sub-query but don't know how to write one; I need help.
Example:
Suppose that for the query "car" Solr returns 250 documents, ranked by relevancy score. Now I want to take the top 10 of those 250 documents and re-sort them by a custom field.
I can't simply do this:
select?q=car&sort=score desc, pr desc
because it sorts all 250 documents. So is there any solution?

I think you mean query re-ranking? It is what you use when you want to:
use a first, lighter-weight query to get results based on the score
take a fair top X number of matches, like 2000 for example
use a heavier query to re-sort only those, to get the final score
That is what you need if you want to do it in two steps, as you describe. That said, I am not sure you need query re-ranking for your use case; maybe just boosting by that field, or sorting on a function, would be enough for you.
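As a sketch, assuming Solr 4.9 or later (where the rerank query parser and the rq parameter exist, so 4.10.3 qualifies), the request below re-scores only the top 10 hits for "car" with a function query on the asker's pr field, assumed to be numeric. Re-ranking adds reRankWeight times the re-rank score to the original score rather than doing a strict sort, so the large weight is an assumption meant to let pr dominate within the top 10; the helper name build_rerank_params is hypothetical.

```python
from urllib.parse import urlencode


def build_rerank_params(user_query, field, rerank_docs=10, weight=1000.0):
    """Hypothetical helper: re-rank only the top `rerank_docs` hits of
    `user_query` using the numeric value of `field`.

    The rerank parser adds reRankWeight * (re-rank query score) to the
    original score of the top reRankDocs documents; documents below that
    cut keep their original order. It is a re-scoring, not a strict sort.
    """
    return {
        "q": user_query,
        # $rqq dereferences the separate rqq request parameter below
        "rq": "{!rerank reRankQuery=$rqq reRankDocs=%d reRankWeight=%s}"
        % (rerank_docs, weight),
        # {!func} makes the field value the score of the re-rank query
        "rqq": "{!func}" + field,
        "rows": rerank_docs,
        "fl": "id,score," + field,
    }


params = build_rerank_params("car", "pr")
print("select?" + urlencode(params))
```

If pr spans a small range compared with the score differences among the top 10, the weight has to be raised accordingly; otherwise the original relevance still decides part of the order.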

Related

How to boost a solr document at query time based on attribute value

I want to boost, at query time, all documents that have user_id=2. Basically, I want all the documents belonging to a specific user at the top of my results.
After looking at some Solr resources I ended up writing a query like the following, but it is not working properly:
/solr/public-main/select?q={!boost b=if(div(155623,user_id),2,1)}sometext&wt=json&indent=true&debugQuery=true
Any hints?
Thanks

You don't need a dynamic, function-based boost here. Apply a boost query, which boosts all the documents that match it: bq=user_id:2^4 (bq is a dismax/edismax parameter). Adjust 4 to a suitable boost value depending on the rest of your boosts (if any, in q or qf).
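A minimal sketch of the request parameters for the boost-query approach, assuming the edismax parser is selected with defType (bq is only read by dismax/edismax). The helper name is hypothetical; the core name public-main and the field user_id come from the question.

```python
from urllib.parse import urlencode


def build_user_boost_params(text, user_id, boost=4):
    """Hypothetical helper: rank one user's documents higher with bq."""
    return {
        "q": text,
        # bq is read by the dismax/edismax parsers, so select edismax
        "defType": "edismax",
        # optional extra clause: documents owned by user_id score higher;
        # tune `boost` against any other boosts in q or qf
        "bq": "user_id:%s^%s" % (user_id, boost),
        "wt": "json",
    }


params = build_user_boost_params("sometext", 2)
print("/solr/public-main/select?" + urlencode(params))
```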

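One option is to add a function query as a pseudo-field, fl=x,y,userexists:exists(query({!v='user_id:2'})), and then sort on that function first and on score second. Note that the sort parameter needs the function expression itself; it cannot refer to the userexists alias.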
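A sketch of the sort-based variant, assuming the function is wrapped in if(..., 1, 0) so the sort key is plainly numeric (that wrapping is an assumption, not part of the answer). The same expression is used for the pseudo-field and for the sort, since Solr sorts on functions rather than on fl aliases; the helper name is hypothetical.

```python
def build_user_first_params(user_id):
    """Hypothetical helper: list one user's documents first, then by score."""
    # 1 if the document matches user_id:<id>, else 0 (numeric variant
    # of the answer's exists() pseudo-field)
    flag = "if(exists(query({!v='user_id:%s'})),1,0)" % user_id
    return {
        # report the flag as a pseudo-field
        "fl": "x,y,score,userexists:" + flag,
        # sort on the function expression itself, then on relevance
        "sort": flag + " desc,score desc",
    }


params = build_user_first_params(2)
print(params["sort"])
```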

SOLR - Limiting Search Results

Is there a way to restrict the number of search results returned from Solr? I am working for a client who would like to restrict the search results to 100 (based on search score). I can use rows, but that only limits the results per page, not the total results. The problem is that if Solr's sort is used, it sorts all the results, so the product ranked 105th by score might come out on top because of its low price. I want the sort to happen only on the top 100 results. Is there a way to do that?
Thanks for your help!
Supreet

You can use Sort By Function.
Query the normal way with rows=100 and also add &sort=<function>.
I could not try it as I do not have a Solr instance right now. Please let me know whether it works.

How to perform these SOLR queries?

I have indexed data (indexed from an RDBMS using SolrJ) with banking-related fields such as (sample): customerid, cust_name, accountno, amount, positions, pos_value, EOD_value, etc.
Now I want to run the following searches on the data:
top 10 stocks/positions (based on stock value)
top 5 customers in decreasing order of amount in the bank
which stock gained the most in a day (and the stock details)
lowest value of a stock in a particular time frame
How can I query for the above in Solr?
I read up on function queries and Solr plugins but could not find much useful information.
Can we perform faceting on fields (amount, stock value, etc.) using math operations like average, sum, etc.?
I want to use the Velocity UI for these searches; what customization of its search box would be required?
Any ideas?

Solr is a high-performance text search engine based on Lucene, an excellent token matching and scoring library. That said, the kind of queries you want to run will certainly work one way or another with Solr, but you will have to provide Solr with all the data you want to search on. Solr will not compute derived values such as daily gains for you (the StatsComponent, enabled with stats=true&stats.field=<field>, can return basic min, max, sum and mean over the matching documents, but that is about it). Its job is to find, rank and sort, as fast as possible, on previously computed values.
The fields you have listed might not give you all the details you are looking for. You will need to index some more.
If you have the data you are looking for in your index, the following queries might get the answer you are looking for or should give you a hint on how to state them.
top 10 stocks/positions
q=*:*&sort=stock_value desc&rows=10
This requires that stock_value is a numeric field holding the latest stock price.
top 5 customers
This is pretty similar.
q=*:*&sort=account_value desc&rows=5
which stock gained the max in a day
You will need to index the gain per day.
q=date:"1995-12-31T23:59:59.999Z"&sort=stock_gain desc&rows=1
lowest value of a stock in a particular time frame
q=symbol:abc123&fq=date:[1995-12-31T23:59:59.999Z TO 2007-03-06T00:00:00Z]&sort=stock_value asc&rows=1
See the Solr query syntax documentation for details on date queries.
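Since Solr will not derive the daily gain itself, it has to be computed before indexing. A minimal sketch, assuming closing prices are available per symbol and day; the field names symbol, date, stock_value and stock_gain follow the answer's queries, while the helper and the sample data are hypothetical. Sending the documents to Solr (e.g. via SolrJ) is not shown.

```python
from datetime import date


def daily_gain_docs(symbol, closes):
    """Hypothetical pre-indexing step: turn (day, closing_price) pairs into
    Solr documents carrying a precomputed stock_gain field."""
    closes = sorted(closes)  # chronological order
    docs = []
    # pair each day with the previous one; the first day has no gain
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        docs.append({
            "symbol": symbol,
            # Solr date fields take ISO-8601 UTC timestamps
            "date": day.isoformat() + "T00:00:00Z",
            "stock_value": close,
            "stock_gain": close - prev_close,
        })
    return docs


docs = daily_gain_docs("abc123", [
    (date(2007, 3, 6), 12.5),
    (date(2007, 3, 5), 10.0),
    (date(2007, 3, 7), 11.0),
])
```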

We have implemented the same thing in one of our application.
In Browse.vm, under the navigators div, we created our custom facet; when we click that facet, it rebuilds the URL with the parameters mentioned by "phisch" in his answer.
Example:
We created a link called "Top 10 Stocks" in the facet section of the UI, and when it is clicked we build a URL adding these parameters:
q=*:*&sort=stock_value desc&rows=10
Please try this out on your end; it is working fine on mine. Sorry, I cannot share the code as it is client-confidential.
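The URL behind such a "Top N" link is just the query parameters from the earlier answer, URL-encoded. A minimal sketch; the helper name and the base path /solr/browse are assumptions, not taken from the answer (whose code was not shared).

```python
from urllib.parse import urlencode


def top_n_link(base_path, field, n):
    """Hypothetical helper: URL behind a 'Top N' facet link in the browse UI."""
    params = {"q": "*:*", "sort": field + " desc", "rows": n}
    return base_path + "?" + urlencode(params)


print(top_n_link("/solr/browse", "stock_value", 10))
```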

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!

I am not sure if I understand properly what you want to achieve, but wouldn't a simple
q=(somequery) OR id:(1 OR 2 OR 4)
be enough?
If you want both parts to be boosted on the same scale (I am not sure whether this is already Solr's default behaviour), you would want to use dismax or edismax, and your query would change to something like:
q=(somequery)^10 OR id:(1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
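A sketch of building that combined query string from a list of IDs, assuming the unique key field is called id (as in the answer) and that the IDs need no query-syntax escaping; the helper name is hypothetical.

```python
def query_plus_ids(user_query, ids, boost=None):
    """Hypothetical helper: match user_query OR any of the listed ids,
    optionally giving both clauses the same boost."""
    # ids are assumed to need no escaping (plain numbers or simple strings)
    query_clause = "(%s)" % user_query
    id_clause = "id:(%s)" % " OR ".join(str(i) for i in ids)
    if boost is not None:
        query_clause += "^%s" % boost
        id_clause += "^%s" % boost
    return query_clause + " OR " + id_clause


print(query_plus_ids("somequery", [1, 2, 4], boost=10))
```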

To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents that match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option is a pseudo-field calculated with the query() function, to report how any set of results scores on some other query or queries. So if, for example, the original document set was the top-N from query_A, and those are exactly the documents you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. The documents' scores against query_B would then be included in the output (as bscore).
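A minimal sketch of the parameters for that re-run, with query_A and query_B as the placeholders used above; the helper name is hypothetical. It assumes the index has not changed between requests (so the same top-N comes back) and that query_B contains no double quotes, since it is embedded in v="...".

```python
def score_against_other_query(query_a, query_b, rows=10):
    """Hypothetical helper: re-run query_a and report each hit's score
    against query_b as the pseudo-field bscore."""
    if '"' in query_b:
        raise ValueError("query_b is embedded in double quotes")
    return {
        "q": query_a,  # same query on an unchanged index: same top-N
        "rows": rows,
        # query() yields the document's score for the subquery,
        # or 0 when the document does not match it
        "fl": 'id,score,bscore:query({!dismax v="%s"})' % query_b,
    }


params = score_against_other_query("query_A", "query_B")
print(params["fl"])
```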
Finally, the result-grouping functionality can be used to both collect the top-N for one query and the scores of lesser documents intersecting with other queries, in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the function field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)
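The grouped request described above, sketched as parameter pairs (a list rather than a dict, because group.query is repeated). The optional ascore pseudo-field is the query() trick applied to query_A; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def grouped_params(query_a, query_b):
    """Hypothetical helper: one request returning query_B's top hits and
    the subset that also matches query_A, both ranked by query_B."""
    return [
        ("q", query_b),            # master query: defines the ranking
        ("group", "true"),
        ("group.query", query_b),  # group 1: all matches of query_B
        ("group.query", query_a),  # group 2: matches of both queries
        # optional pseudo-field with each document's score against query_A
        ("fl", 'id,score,ascore:query({!dismax v="%s"})' % query_a),
    ]


params = grouped_params("query_A", "query_B")
print(urlencode(params))
```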

how can I limit by score before sorting in a solr query

I am searching "product documents"; in other words, my Solr documents are product records. I want to get, say, the top 50 matching products for a query, and then be able to sort those top 50 scoring documents by name or price. I'm not finding much on how to do this: sorting by score, then by name or price, won't really help, since scores are floats and rarely tie.
I wouldn't mind doing something like mapping the scores to ranges (e.g. a score of 8.0-8.99 would go into the 8 bucket), then sorting by range, then by name, but since there is basically no normalization of scores, this would still make things harder.
TL;DR: How do I exclude low-scoring documents from the Solr result set before sorting?
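The bucketing idea from the question can be tried client-side on the returned rows. A minimal sketch, assuming the response was requested with fl=name,score; the helper name and sample data are hypothetical, and, as the question notes, the bucket boundaries mean different things for different queries because scores are not normalized.

```python
import math


def bucket_then_name(docs):
    """Sort docs by integer score bucket (highest first), then by name."""
    return sorted(docs, key=lambda d: (-math.floor(d["score"]), d["name"]))


docs = [
    {"name": "b", "score": 8.7},
    {"name": "a", "score": 8.1},
    {"name": "c", "score": 9.2},
]
print([d["name"] for d in bucket_then_name(docs)])  # ['c', 'a', 'b']
```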

You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side).
Your query would be something along the lines of:
q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc
Set the l argument of the frange in q to the lower bound you want to filter the score on, and replace the qq parameter with your user query.
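A sketch of the frange request as parameters, with price as the sort field from the question and a hypothetical helper name. frange's lower bound is inclusive by default, so documents scoring exactly min_score are kept; a suitable threshold still depends on the query.

```python
from urllib.parse import urlencode


def min_score_params(user_query, min_score, sort="price asc"):
    """Hypothetical helper: drop hits scoring below min_score, then sort."""
    return {
        # frange keeps documents whose value of query($qq) is >= l
        "q": "{!frange l=%s}query($qq)" % min_score,
        "qq": user_query,
        "sort": sort,
    }


params = min_score_params("awesome product", 5)
print("select?" + urlencode(params))
```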

As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS for example.
The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc. -- and with only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.

I don't think you can simply "exclude low scoring documents from the solr result set before sorting", because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search, and you cannot set a single threshold for all searches.
If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language but I don't think you can do it with just SOLR.
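A minimal client-side sketch of that re-sort (in Python rather than Java or PHP), assuming the rows come from a Solr response requested with sort=score desc, rows=50 and fl including score; the field names in the sample data and the helper name are hypothetical.

```python
def resort_top_n(docs, n, key, reverse=False):
    """Keep the n best-scoring docs, then re-sort only those by `key`."""
    # re-apply the score cut so the result does not depend on input order
    top = sorted(docs, key=lambda d: d["score"], reverse=True)[:n]
    return sorted(top, key=lambda d: d[key], reverse=reverse)


docs = [
    {"id": "a", "score": 3.0, "price": 30},
    {"id": "b", "score": 2.0, "price": 10},
    {"id": "c", "score": 1.0, "price": 5},
]
print([d["id"] for d in resort_top_n(docs, 2, "price")])  # ['b', 'a']
```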
Anyway, I would recommend you don't go down this route of re-sorting the results from Solr, as it will simply confuse the user. People expect search results to behave like Google's (and most other search engines'), where results come back in some form of TF-IDF ranking.
Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.
I'd suggest you use Solr to its strengths and use facets. Provide a price range facet on the left (like eBay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.
[EDIT] this question might also be useful:
Digg-like search result ranking with Lucene / Solr?