How to make SOLR manipulate the results? - solr

For an E-commerce website, we have stored the products as SOLR documents with the following fields and weights:
Title:5
Description:4
For some products, we need to ensure that they appear in the top ten results even if their relevance in the above two fields does not qualify them for being in top 10. For example:
P1, P2, ... P10 are the legitimate products for a given search keyword "iPhone". I have S1 ... S100 as sponsored products that want to appear in the top 10. My policy is that only 2 of these 100 sponsored products will be randomly chosen and shown in the top 10, so that the results will be: S5, S31, P1, P2, ... P8. In the next request, the sponsored products that get slipped in may be S4, S99.
The QueryElevationComponent lets us specify the docIDs for keywords, but it does not let us randomize the results so that only 2 of the complete set of sponsored docIDs are returned.
Any suggestions for implementing this would be appreciated.
Thanks,
Yash

This sounds like a case where you will need to issue two separate queries to Solr, one for the legitimate products and another one for the sponsored products. Then you will need to manually manipulate/construct the results based off the two Solr results that are returned so that they meet the expected behavior. I do not know of any way to accomplish this directly in Solr.
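The client-side merge described above could be sketched roughly as follows. This is only an illustration in plain Python: the function name `merge_with_sponsored` and the sample ID lists are hypothetical, and the two lists stand in for the results of the two separate Solr queries.

```python
import random

def merge_with_sponsored(organic, sponsored, slots=2, page_size=10, rng=random):
    """Place `slots` randomly chosen sponsored docs ahead of the organic hits."""
    picked = rng.sample(sponsored, min(slots, len(sponsored)))
    picked_set = set(picked)
    # Skip organic hits that were also picked as sponsored, to avoid duplicates.
    rest = [doc for doc in organic if doc not in picked_set]
    return (picked + rest)[:page_size]

# Stand-ins for the two Solr result lists (hypothetical IDs).
organic = [f"P{i}" for i in range(1, 11)]
sponsored = [f"S{i}" for i in range(1, 101)]
page = merge_with_sponsored(organic, sponsored)
print(page)  # two sponsored IDs followed by P1 ... P8
```

Each call draws a fresh random pair, so consecutive requests will usually show different sponsored products.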

Just an idea that might lead you in the right direction:
You could use a FunctionQuery for sorting. Within this FunctionQuery you could check whether a result is a sponsored result. If it is, then depending on the index (0-99) of the sponsored result and two two-digit parts of the current time obtained from ms(), you can decide whether to lift the result by returning either the score of the initial query or a modified one.
A result is lifted if its index is identical to one of the two two-digit parts.
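To make the selection rule concrete, here is a small plain-Python simulation of the logic. It is not a Solr function query; the names `lifted_indexes` and `sort_score`, the choice of which digits to use, and the boost value are all assumptions made for illustration.

```python
def lifted_indexes(now_ms):
    """Return the two two-digit parts of the timestamp used to pick indexes."""
    first = now_ms % 100             # last two digits of the millisecond value
    second = (now_ms // 100) % 100   # the two digits before those
    return {first, second}

def sort_score(base_score, sponsored_index, now_ms, boost=1000.0):
    """Return a lifted score for a chosen sponsored doc, else the base score."""
    if sponsored_index is not None and sponsored_index in lifted_indexes(now_ms):
        return base_score + boost
    return base_score

print(lifted_indexes(1234567))          # {67, 45} (set order may vary)
print(sort_score(1.0, 67, 1234567))     # 1001.0
print(sort_score(1.0, 3, 1234567))      # 1.0
```

Note that the two parts can coincide (for example when the value ends in 4545), in which case only one sponsored result would be lifted.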

This has been solved; please use the Query Elevation Component.
https://cwiki.apache.org/confluence/display/solr/The+Query+Elevation+Component

Related

(Lucene/SOLR) Can I annotate the results of a query based on grouping by subqueries?

I would like to group the results of any query in terms of "categories".
"Categories" are keyword queries, they cannot be pre-defined at index time, since they evolve and change over time.
More specifically:
I have a set of categories defined by queries q1,q2,...qN.
Given a user query (q), I need to return the top resulting docs (d1,...d10) as usual,
but I need to know if they belong or not to each of the groups q1,...qN.
As I understand it I could use grouping with queries, but this has two drawbacks:
I will change the results, since instead of d1,...d10 I will get top docs for each query
I will lose the original ordering of the results
The only solution I can think of right now is to issue first q to get the results and ordering, then each of q AND q1, q AND q2, etc. to get the grouping, then parse all the results and group outside the query... expensive!
Any ideas how I can get what I need?
You can run the query the normal way, and then add pseudo-fields in the fl param that use function queries to score each document against your categories.
http://solr.pl/en/2011/11/22/solr-4-0-new-fl-parameter-functionalities-first-look/
https://cwiki.apache.org/confluence/display/solr/Function+Queries
Example:
fl=category1:sum(0.0, query($q1))
q1={!dismax}your query 1
fl=category2:sum(0.0, query($q2))
q2={!dismax}your query 2
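As a rough sketch, the parameters above could be generated for any number of categories like this. The helper name `category_params` and the `id,score` field list are assumptions; the `categoryN` and `qN` names follow the example above.

```python
from urllib.parse import urlencode

def category_params(user_query, category_queries):
    """Build request params with one pseudo-field per category query."""
    params = [("q", user_query), ("fl", "id,score")]
    for i, cat_query in enumerate(category_queries, start=1):
        # Pseudo-field categoryN reports the doc's score against query qN.
        params.append(("fl", f"category{i}:sum(0.0, query($q{i}))"))
        params.append((f"q{i}", f"{{!dismax}}{cat_query}"))
    return params

params = category_params("tablet", ["your query 1", "your query 2"])
print(urlencode(params))
```

Going by the example above, a document that does not match a category query should come back with 0.0 in the corresponding pseudo-field, so a non-zero value can be read as membership in that category.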

Solr dynamic sorting

We have a website on which you can search through a large amount of products from different shops. Say we have 5 products per result page and the 10 best matches for a search have all the same score. 8 of the products are of one shop (A), and the two others by two other shops (B,C).
What we often get is (letter indicating a product of this shop)
A
A
A
A
A
---- second result page ----
A
B
A
C
A
but what we want to get is something like this:
A
C
B
A
A
---- second result page ----
A
A
A
A
A
Writing a function query seems to be one option:
http://www.solrtutorial.com/custom-solr-functionquery.html
What is the best way to achieve this?
You could group the results by shop using Field Collapsing and display the result either as groups or as a flattened list (depending on how you want it).
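One possible way to flatten the grouped response is a round-robin over the groups, so that every shop shows up early in the list. This is a sketch, not something Solr does for you: the function name `interleave_groups` is hypothetical, and the input is assumed to be the `groups` list of a grouped Solr response (each entry having `groupValue` and `doclist.docs`).

```python
from itertools import zip_longest

def interleave_groups(groups):
    """Flatten grouped docs round-robin: first doc of each group, then second, ..."""
    columns = [g["doclist"]["docs"] for g in groups]
    missing = object()  # sentinel for groups that have run out of docs
    return [doc
            for row in zip_longest(*columns, fillvalue=missing)
            for doc in row
            if doc is not missing]

groups = [
    {"groupValue": "A", "doclist": {"docs": ["a1", "a2", "a3"]}},
    {"groupValue": "B", "doclist": {"docs": ["b1"]}},
    {"groupValue": "C", "doclist": {"docs": ["c1"]}},
]
print(interleave_groups(groups))  # ['a1', 'b1', 'c1', 'a2', 'a3']
```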
Another trick that I've seen in use to help users see results from multiple groups is to use Facets. You could have a sidebar (or something similar) that does two things:
By default it lets the user know that there are other filter criteria (e.g. shops) in the result. This helps a lot when the result is paginated.
With facets present, it is up to the user to choose whatever criteria she/he wishes to apply, thus relieving you of implementing heavy scenario-based logic.
Read more about faceting here.
Edit:
If you have to use custom sort logic, you could write it using Functions and use it in the sort when querying Solr. Here is the reference from the docs.

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you want both parts to be boosted on the same scale (I am not sure whether this is already the default behaviour of Solr), you could use dismax or edismax and change your query to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of a pseudo-field calculated using the query() function to report how any set of results scores on some other query/queries. So if, for example, the original document set was the top-N from query_A, and those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. Then the documents' scores against query_B would be included in the output (as bscore).
Finally, the result-grouping functionality can be used to both collect the top-N for one query and the scores for lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)
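Combining the grouping parameters with a score pseudo-field might look like the sketch below. The helper name `grouped_score_params`, the pseudo-field name `ascore`, and the `$qa` parameter reference are assumptions for illustration; the group parameters are the ones quoted above.

```python
from urllib.parse import urlencode

def grouped_score_params(main_query, other_query):
    """Rank by main_query, group by both queries, report other_query's score."""
    return [
        ("q", main_query),
        ("group", "true"),
        ("group.query", main_query),
        ("group.query", other_query),
        ("fl", "id,score"),
        ("fl", "ascore:query($qa)"),  # score against other_query (assumed name)
        ("qa", other_query),
    ]

print(urlencode(grouped_score_params("query_B", "query_A")))
```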

solr pagination and grouping

I am using LucidWorks and Solr to implement search in a large and diverse web app which has many different types of pages. The spec calls for a single search results page grouped by page type with pagination of search results in each group.
I can group easily enough with something like this
q=[searchterm]&group=true&group.field=[pagetypefield]
which returns nicely grouped results.
I can also do:
q=[searchterm]&group=true&group.field=[pagetypefield]&group.offset=[x]&group.limit=[y]
which will get me y results per group starting at result x
However, what I want to be able to do is supply an offset and limit per group, because I might want to get results 0-4 for group 1 and results 5-9 for group 2.
The values for [pagetypefield] are a list of known values, so I can do multiple queries like:
q=[searchterm]&group=true&group.query=[pagetypefield]:[value]&group.offset=[x]&group.limit=[y]
for each known value of [pagetypefield].
The alternative is to not use group.offset at all and, in my example, get results 0-9 for both groups and just discard the results I don't need.
I don't really like either option, but I can't find a way in the documentation to specify offset and limit on a per-group basis.
Any advice would be most appreciated.
I have confirmation from LucidWorks that what I want to do is not possible, and they recommended the multiple-search solution, as the first search will be cached so subsequent searches will be really fast.
What I think I'm going to end up doing is to group the search results taking first n results for each group, then use ajax to paginate each group.
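The multiple-search approach can be sketched as one parameter set per known page type, each with its own offset and limit. The helper name `per_group_requests` and the sample field and values are hypothetical; the parameters are the ones used in the queries above.

```python
def per_group_requests(search_term, field, windows):
    """Build one request per group value; windows maps value -> (offset, limit)."""
    requests = []
    for value, (offset, limit) in windows.items():
        requests.append({
            "q": search_term,
            "group": "true",
            "group.query": f"{field}:{value}",
            "group.offset": offset,
            "group.limit": limit,
        })
    return requests

# Results 0-4 for the first page type and 5-9 for the second (hypothetical values).
for params in per_group_requests("phone", "pagetype", {"article": (0, 5), "faq": (5, 5)}):
    print(params)
```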

Solr complicated faceting

I have problems with faceting. Imagine this situation. A product can be in more than one category. This is common behavior for faceting:
Category
Android (25)
iPhone (55)
other (25)
Now when I select "Android", I make a new query with "fq" => "category:Android", and I will get:
Category
Android
iPhone (15)
other (2)
But this means that there are 15 products that are in categories "Android" AND "iPhone". I would like something like this: ("Android" OR "iPhone")
Category
Android
iPhone (+5)
other (+1)
Meaning I will get 25 results by selecting "Android (25)" and another 5 by selecting "iPhone (+5)", so finally I will get 30 search results.
Does anyone know if this is possible with SOLR's faceting? Or perhaps with more than one query and calculate it manually?
Thanks for advice!
Try a new query with the negative of the selections, like "fq" => "-category:Android" - you should then get the facet counts you are looking for.
Depending on all the permutations you need, you probably want to look into query facets, which let you get counts for arbitrary queries. For instance, you can do facet.query=category:("Android" OR "iPhone") and get a count keyed on category:("Android" OR "iPhone"). You can do this for any number of queries you want counts for. So, in your case, you can probably get to a final solution with some combination of straight field facets and query facets.
Edit: Re-reading your question, you may also want to look into tagging and excluding parts of an extra fq, depending on how you are allowing your users to "select into" the choices. (The example in the docs is fairly close to your original setup, although I'm not sure the end behavior is exactly as you desire).
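With query facets, the "(+N)" numbers could be derived on the client by subtracting the count for the selected category from the count for each OR query. A minimal sketch, assuming hypothetical helper names and hypothetical counts chosen to reproduce the (+5) and (+1) display from the question:

```python
def union_facet_queries(field, selected, others):
    """Build one facet.query per remaining category: field:("selected" OR "other")."""
    return {c: f'{field}:("{selected}" OR "{c}")' for c in others}

def additional_counts(selected_count, union_counts):
    """How many extra results each category would add to the current selection."""
    return {c: n - selected_count for c, n in union_counts.items()}

print(union_facet_queries("category", "Android", ["iPhone", "other"]))
# Hypothetical counts returned for the two facet queries above.
print(additional_counts(25, {"iPhone": 30, "other": 26}))  # {'iPhone': 5, 'other': 1}
```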
