Solr pagination and grouping

I am using LucidWorks and Solr to implement search in a large and diverse web app which has many different types of pages. The spec calls for a single search results page grouped by page type with pagination of search results in each group.
I can group easily enough with something like this
q=[searchterm]&group=true&group.field=[pagetypefield]
which returns nicely grouped results.
I can also do:
q=[searchterm]&group=true&group.field=[pagetypefield]&group.offset=[x]&group.limit=[y]
which will get me y results per group, starting at result x.
However, what I want to be able to do is supply an offset and limit per group, because I might want to get results 0-4 for group 1 and results 5-9 for group 2.
The values for [pagetypefield] are a list of known values, so I can do multiple queries like:
q=[searchterm]&group=true&group.query=[pagetypefield]:[value]&group.offset=[x]&group.limit=[y]
for each known value of [pagetypefield],
or skip group.offset and, in my example, fetch results 0-9 for both groups and just discard the results I don't need.
I don't really like either option, but I can't find a way in the documentation to specify offset and limit on a per-group basis.
Any advice would be most appreciated.

I have confirmation from LucidWorks that what I want to do is not possible. They recommended the multiple-search solution, as the first search will be cached, so subsequent searches will be really fast.
What I think I'm going to end up doing is to group the search results, taking the first n results for each group, and then use AJAX to paginate each group.
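The multiple-query approach can be sketched roughly as follows: one request per known group value, each with its own group.offset and group.limit. This is only a sketch of how the query strings might be assembled; the function name, the search term "laptop", the field name "pagetype" and its values are made up for illustration, and sending the requests is left out.

```python
from urllib.parse import urlencode


def group_page_params(term, field, pages):
    """Build one Solr query string per group value.

    `pages` maps each known group value to an (offset, limit) pair,
    so every group can be paginated independently.
    """
    queries = {}
    for value, (offset, limit) in pages.items():
        params = [
            ("q", term),
            ("group", "true"),
            ("group.query", f"{field}:{value}"),
            ("group.offset", offset),
            ("group.limit", limit),
        ]
        queries[value] = urlencode(params)
    return queries


# Hypothetical field and values: results 0-4 for "article", 5-9 for "product".
queries = group_page_params(
    "laptop", "pagetype", {"article": (0, 5), "product": (5, 5)}
)
for value, query_string in queries.items():
    print(value, query_string)
```

Each query string would be appended to the select handler URL of the core in question; an AJAX pager for a single group only needs to re-issue that group's request with a new offset.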

Related

Getting same record on multiple pages, when implemented pagination in vespa

I am getting the same record on different pages when implementing pagination using group by.
I am using the query mentioned below:
http://<hostname>:<port>/search/?yql=select * from sources document_name where sddocname contains 'document_name' | all(group(key) max(2) each(each(output(summary()))));
Are you looking at the grouping results or the normal hits structure? Please note that the grouping expression will not in any way affect the normal hits returned.
You will probably want to add LIMIT 0 / hits=0 and only look at the results from the grouping expression.
You also need a (stable) ordering of the hits for pagination by continuations to work well. This is usually the case as in most use cases there will be a ranking expression in place.
The default ordering in grouping expressions is by rank - in grouping expression syntax this would be order(max(relevance())).
The query above only limits on document type. All documents of that document type will match this query equally well. I tested this using the "album-recommendation-selfhosted" sample app, and relevance was 0 for all documents. When the relevance is the same for all documents, the order will essentially be random. The same thing may occur when doing e.g. order(-count()) if count() is the same for several groups.
I was able to achieve the expected results by adding and using a ranking profile using the random.match rank feature: https://docs.vespa.ai/documentation/reference/rank-features.html#random
I believe this should ensure a stable ordering of hits, although this may still produce different results if the query is dispatched to different (groups of) content hosts. If you need a stable global ordering, consider storing a random float/double in each document to rank/order by - this can also be used as a "tie breaker" to help ensure a stable order from ranking expressions.
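Why a stored tie-breaker helps can be illustrated outside Vespa with a small simulation. This is plain Python, not Vespa code: the document dicts, the "tiebreak" field name and the page() helper are all hypothetical. Every document has the same relevance, the input order is shuffled between requests to mimic an arbitrary backend order, and the stored random value still yields one total order, so pages do not overlap.

```python
import random


def page(docs, offset, limit):
    """Sort by relevance (descending), then by the stored tie-breaker."""
    ordered = sorted(docs, key=lambda d: (-d["relevance"], d["tiebreak"]))
    return ordered[offset:offset + limit]


rng = random.Random(42)
# All documents match equally well, as in the query above.
docs = [{"id": i, "relevance": 0.0, "tiebreak": rng.random()} for i in range(10)]

# Shuffle before each request to mimic equally ranked hits arriving in any order.
rng.shuffle(docs)
first = page(docs, 0, 5)
rng.shuffle(docs)
second = page(docs, 5, 5)

seen = [d["id"] for d in first + second]
print(seen)
```

Without the tie-breaker in the sort key, the two pages could share records whenever the underlying order changes between requests.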

how to get resultcounts for each word if multiword-search was without results

On our webshop I want to implement a feature which should do the following:
If a user e.g. searches for "phone magnum", there will be no results.
If there were no results I want to give him the possibility to see
that search for "phone" will give him 139 results
and search for "magnum" will get 12 results.
I don't want to start several queries only to get those counts, but at the moment I have no idea how to do that.
I read the Solr wiki on faceting, but didn't find anything useful for my problem. Maybe I missed something ...
Not sure why you want to avoid multiple queries. If your first search on the phrase "phone magnum" does not return any results, you could issue one query per search keyword with rows=0 which will give you only the counts. This should be efficient, since you are not building any result documents and only getting the result counts.
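A minimal sketch of that per-keyword follow-up, assuming the terms are searched in a field called name and that the response is requested as JSON (wt=json); the helper names are made up, and the HTTP call itself is omitted.

```python
from urllib.parse import urlencode


def count_queries(phrase, field="name"):
    """Return {term: query_string}; rows=0 means only the count is returned."""
    return {
        term: urlencode({"q": f"{field}:{term}", "rows": 0, "wt": "json"})
        for term in phrase.split()
    }


def num_found(response):
    """Read the hit count from a parsed Solr JSON response."""
    return response["response"]["numFound"]


queries = count_queries("phone magnum")
for term, query_string in queries.items():
    print(term, query_string)

# Shape of a parsed rows=0 response, with the count from the question.
sample = {"response": {"numFound": 139, "start": 0, "docs": []}}
print(num_found(sample))
```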
However, if you really want to avoid the subsequent queries, here is one approach: have a field in your index which does not take IDF into account. (See this on how to do that.) Once that field is available (call it, say, name_no_idf), issue a query against this field: name_no_idf:(phone magnum). Notice that this is not a phrase search.
The documents which contain both phone and magnum in the name_no_idf field will get a score of 2, while the docs matching only one word will get a score of 1. To this query you add facet=true&facet.field=name. Then the facet counts you get for these two words will be the counts you are looking for. But a few warnings:
if one of the words is very infrequent, you may need to increase facet.limit
facet queries are expensive

solr facet counts not correct with stats and group option

I am using Solr for product search on our web page. Until now, everything has worked fine.
But while implementing a price slider to filter the current results by price range, I got stuck on the following issue:
There is no way to exclude filters for the stats option in the same way as is possible for facets. I use stats to get the overall min and max price, no matter what price range is selected (on the slider) and which category is selected in the current search.
So the best way to get these values is to exclude the range filter from the stats request; otherwise the max and min price will cover only the current (ranged) result.
Excluding a filter on facets (works on Solr 4.4):
...&fq={!tag=cat}categories:Electronics/Computers&facet=true&facet.field={!ex=cat}categories&...
But using this for stats is not possible (see https://issues.apache.org/jira/browse/SOLR-3177).
So I tried using a group query, as suggested on that page.
My Solr call looks like this:
fq={!tag=cat}categories:Electronics/Computers&facet=true&
facet.field={!ex=cat}categories_raw&
facet.prefix=Electronics&stats=true&stats.field=minPrice&
stats.field=maxPrice&stats.field=vat&group=true&group.query=minPrice:[* TO 20]
maxPrice:[0 TO *]&group.main=true
All fine. I get the correct stats result and the correct result count with the price range applied. EXCEPT for one problem: the facet counts are now wrong, because the price range is not applied as a filter.
I know there is a group.facet option, which I also tried. But with group.facet I need a group.field on which the results are based. In my opinion, I would need to use the price field as the group.field (group.field=price).
But we do have two price fields on our products (min and max-price). I tried to set them both as group.field parameter, but still get the wrong facet-counts.
It looks like I am just a small step away from the correct solution, but I don't get it.

Solr dynamic sorting

We have a website on which you can search through a large number of products from different shops. Say we have 5 products per result page and the 10 best matches for a search all have the same score. 8 of the products are from one shop (A), and the other two are from two other shops (B, C).
What we often get is (each letter indicates a product of that shop):
A
A
A
A
A
---- second result page ----
A
B
A
C
A
but what we want to get is something like this:
A
C
B
A
A
---- second result page ----
A
A
A
A
A
Writing a function query seems to be one option:
http://www.solrtutorial.com/custom-solr-functionquery.html
What is the best way to achieve this?
You could group the results by shop using Field Collapsing and display the result either as a group or flattened list (depending on how you want it).
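One way to turn grouped results into a flattened list that shows every shop early is to interleave the groups round-robin on the client. The sketch below is an assumption about how that flattening could be done, not something Solr does for you: it assumes the groups (one list of products per shop, each already in score order) have been read from a grouped response, and the interleave() helper is hypothetical.

```python
from itertools import zip_longest


def interleave(groups):
    """Flatten grouped results round-robin so each group appears early."""
    skip = object()  # sentinel marking groups that have run out of items
    rounds = zip_longest(*groups.values(), fillvalue=skip)
    return [doc for round_ in rounds for doc in round_ if doc is not skip]


# The example from the question: 8 products from shop A, one each from B and C.
groups = {
    "A": [f"A{i}" for i in range(1, 9)],
    "B": ["B1"],
    "C": ["C1"],
}
flat = interleave(groups)
first_page, second_page = flat[:5], flat[5:10]
print(first_page)
print(second_page)
```

This only makes sense among results that score equally (or nearly so), since it deliberately ignores the order between groups.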
Another trick that I've seen in use to help users see results from multiple groups is to use facets. You could have a sidebar (or something similar) that does two things:
By default it lets the user know that there are other filter criteria (e.g. shops) in the result. This helps a lot when the result is paginated.
With facets present, it is up to the user to choose whichever criteria she or he wishes to apply, relieving you of implementing heavy scenario-based logic.
Read more about faceting here.
Edit:
If you have to use custom sort logic, you could write it down using Functions and use it in the sort when querying Solr. Here is the reference from the docs.

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you would want both parts to be boosted by the same scale (I am not sure if this isn't the default behaviour of Solr) you would want to use dismax or edismax and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of a pseudo-field calculated using the query() function to report how any set of results scores on some other query/queries. So if, for example, the original document set was the top-N from query_A, and those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. Then each document's score against query_B would be included in the output (as bscore).
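A rough sketch of assembling such a request. It uses a variant of the form above: the second query is passed in a separate parameter (named qb here, an arbitrary choice) and referenced as query($qb), which avoids nesting quotes inside fl. The helper name, the id field and the example queries are assumptions, and dismax would still need its query fields configured on the Solr side.

```python
from urllib.parse import parse_qs, urlencode


def score_against_other(query_a, query_b, rows=10):
    """Params that rank by query_a and report each hit's score on query_b."""
    return urlencode([
        ("q", query_a),
        ("rows", rows),
        # qb holds the second query; query($qb) evaluates it per document.
        ("qb", "{!dismax}" + query_b),
        ("fl", "id,score,bscore:query($qb)"),
    ])


params = score_against_other("title:solr", "grouping pagination")
decoded = parse_qs(params)  # decode again just to show what Solr will receive
print(params)
print(decoded["fl"][0])
```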
Finally, the result-grouping functionality can be used both to collect the top-N for one query and to collect scores for lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)
