Solr facet offset by term/prefix rather than index - solr

I'm generating facet counts for a multivalued field and sorting them by index in order to see them in alphabetical order. Given a particular facet prefix, I would like to jump to its place in the facet count list and show the facet counts surrounding it. For example, if my prefix is "wha" then I would want the following returned (four before and four after):
weld 1
welsh 5
west 4
wetland 1
whale 99
wheat 123
wheel 1
whey 9
There are millions of values in the field and so I can't just ask for them all. I need to be able to jump to that location or use some kind of filter on the facet counts themselves. I've tried using facet.offset, but I have to basically do a binary search in order to find the appropriate offset which is too slow.
I could probably get close enough if I could put in a range for a facet prefix. For example facet.prefix=[we TO wk] or even multiple prefixes like facet.prefix=we,wf,wg,wh,wi,wj,wk.
I'm currently using other non-Solr solutions to accomplish this, but I would like to use Solr 6.6 in order to take advantage of filter queries.
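To make the desired result concrete, here is a minimal Python sketch that works on an in-memory sorted list rather than on Solr itself. It assumes the facet terms are already available in index order. The function name `window_around_prefix` and the sample data are illustrative only and are not part of any Solr API.

```python
from bisect import bisect_left


def window_around_prefix(sorted_terms, prefix, before=4, after=4):
    """Return the terms surrounding the position where `prefix` would sort."""
    # bisect_left finds the first index whose term is >= prefix
    i = bisect_left(sorted_terms, prefix)
    return sorted_terms[max(0, i - before): i + after]


# Sample terms taken from the example in the question (counts omitted)
terms = ["weld", "welsh", "west", "wetland", "whale", "wheat", "wheel", "whey"]
window = window_around_prefix(terms, "wha")
```

With the sample list above, "wha" sorts just before "whale", so the window covers the four terms before it and the four from "whale" onward.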

Related

Solr Boost-Function on Sales

I am using Apache Solr 8 with products as documents. Each document includes sales within the last X days that I want to boost, as well as a title and other fields.
Say productA has been sold 5 times, I want to boost it with score+10; a productB has been sold 50 times, I want to boost the score by 30.
I tried to use a boostFunction that looks like (edismax query parser)
q=Coffee&qf=title&bf=if(lt(sales,5),10,if(lt(sales,50),30))
Solr now returns documents that have nothing to do with my "Coffee"-Query but just match the boostfunction. There are even results with score "0".
E.g.
Rank;Score;Sales;Title
1;58.53;55;Coffee big
2;38.11;50;Coffee
3;30;55;Tea
Any idea to get rid of those "only boost function"-matches?
Found the answer!
My Query-Fields actually included boostings like
&qf=title^2 longDescription^0 whatever^0...
Instead of excluding the results found in those 0-boosted fields, Solr still includes them as matches, just with a score of 0.
When I remove the 0-boostings, everything works as intended.
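The fix described above can be sketched as a small client-side helper that strips zero-boosted fields from the qf string before the request is sent. This is only a sketch: `drop_zero_boosts` is a hypothetical name, and it assumes qf is a whitespace-separated list of `field` or `field^boost` entries.

```python
def drop_zero_boosts(qf):
    """Remove fields whose boost is exactly zero from an edismax qf string."""
    kept = []
    for spec in qf.split():
        field, sep, boost = spec.partition("^")
        # Keep fields with no explicit boost or with a non-zero boost
        if not sep or float(boost) != 0.0:
            kept.append(spec)
    return " ".join(kept)


qf = drop_zero_boosts("title^2 longDescription^0 whatever^0")
```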

SOLR - Different score to different words in a multi word search query

I am using SOLR with mongoDB in one of my projects for search. I must say, SOLR is very powerful.
Currently, I am looking for a way to give different keywords different weights when the query contains multiple words.
e.g. if a user searches for black doll house,
the weight of black should be greater than that of doll, and the weight of doll should be greater than that of house.
black > doll > house
Is it possible to implement this in SOLR. If yes, how?
You can give a separate weight to each term in the standard lucene query syntax (searching in a field named text):
text:black^10 text:doll^5 text:house
This will give black ten times as much weight as house, and doll five times as much weight as house, but only half the weight of black. You'll have to tweak the weights to get the results you're looking for. If you want to use the regular text in the q= field with (e)dismax as the query parser, you can use bq to apply these boosts separately from the query itself.
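A query string like the one above can be assembled on the client. The following Python sketch shows one way to do it. The helper name `boosted_query` is hypothetical, and it assumes the terms are plain words that need no escaping for the Lucene query syntax.

```python
def boosted_query(field, terms, boosts):
    """Build a Lucene-syntax query with one boost per term."""
    parts = []
    for term, boost in zip(terms, boosts):
        # A boost of 1 is the default, so the ^ suffix is omitted for it
        if boost == 1:
            parts.append(f"{field}:{term}")
        else:
            parts.append(f"{field}:{term}^{boost}")
    return " ".join(parts)


q = boosted_query("text", ["black", "doll", "house"], [10, 5, 1])
```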
Did you try boosting the terms in the query? You can specify a different boost value for each term in the query.
Example: if you transform your query to:
textfeild:black^6 textfeild:doll^5 textfeild:house^2
you get results where the top documents match black, the next ones doll, and the next ones house.
It multiplies each term's weight by its boost value: here black by 6, doll by 5 and house by 2.

How to apply boosting in solr

I am new to Solr; please help me with boosting fields.
I have a query like this:
q=name:test* OR description:test*
I want to apply a boost (weight) of 500 to name and 50 to description.
For example:
Let's say the term "test" appears once in the name field of one record and 20 times in the description field of another record. The boosting calculation should then work as follows:
for name: 1 X 500 = 500
for Description: 20 X 50 = 1000.
As a result, the records with the highest boosted value should come first.
So, based on the calculation above, the record with 20 matches in the description field should come first, followed by the record with 1 match in the name field.
If anyone has a solution for this, please share it.
Thanks in advance.
You can boost a field at index time with the boost attribute, or you can apply a boost in the query, such as q=name:test*^50 OR description:test* (and there are some more advanced features here as well).
It bears noting, though, that Lucene by default applies a length normalization that effectively weighs matches on shorter fields more heavily than matches on longer fields. It sounds a bit like that is what you are trying to recreate.
If you need the scoring calculation to be as simple as what you have provided, you would need to write your own Similarity class, I believe.
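For reference, the calculation the question asks for can be written out as follows. This is only an illustration of the desired arithmetic (match count times field weight) in Python, not what Lucene computes by default. The names `naive_score` and `weights` are hypothetical.

```python
def naive_score(match_counts, weights):
    """Sum of (number of matches in a field) x (weight of that field)."""
    return sum(count * weights[field] for field, count in match_counts.items())


weights = {"name": 500, "description": 50}

record_a = naive_score({"name": 1}, weights)          # 1 x 500
record_b = naive_score({"description": 20}, weights)  # 20 x 50
```

Under this scheme the record with 20 matches in description outranks the record with 1 match in name, as the question describes.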

Solr: Searching a term in multiple, indexed fields and returning top 'N' hits from each search field

I have two indexed fields in my Solr schema
Employee Name
Manager Name
Which are plain strings.
My question is: given a search term, I want to display the top 5 suggested completions from Manager Names and the next 5 from Employee Names.
I can use copy fields, but sometimes all of the top 10 results come from Employee Names.
I have a hunch that boosting can help me, but I could not figure out how.
Boost can't help you control the results and distribute 5 each in the top 10 results.
You could look at Field Collapsing (result grouping) instead, where you can group by role (Manager and Employee) and limit each group to 5 results.
So you would have 2 groups returned back to you with 5 results each.
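A grouped request along these lines might be built as in the Python sketch below. It uses Solr's result grouping parameters (`group`, `group.field`, `group.limit`). The `role` field is an assumption: the schema in the question does not have one, so each name would need to be indexed as its own document with a field marking it as Manager or Employee. The helper name `grouped_suggest_params` is hypothetical.

```python
from urllib.parse import urlencode


def grouped_suggest_params(term, group_field="role", per_group=5):
    """Build a query string asking Solr for up to `per_group` docs per group."""
    params = {
        "q": term,
        "group": "true",             # turn on result grouping
        "group.field": group_field,  # one group per distinct value (assumed field)
        "group.limit": per_group,    # documents returned inside each group
    }
    return urlencode(params)


query_string = grouped_suggest_params("jo")
```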

SOLR faceting slower than manual count?

I'm trying to get a SOLR range query working. I have a database with over 12 million documents, and I am filtering by a few parameters, for example:
product_category:"category1" AND product_group:"group1" AND product_manu:"manufacturer1"
The query itself returns about 700 documents and executes in two-three seconds on average.
But when I add a date range facet to that query (I want to see how many products were added each day for the past x years), it takes 50 seconds or more to execute. So it seems that it would be faster to just retrieve all matching documents and count them manually in Java.
So I guess I must be doing something wrong with faceting?
Here is an example faceted query:
start=0&rows=0&facet.query=productDate%3A[0999-12-26T23%3A36%3A00.000Z+TO+2012-05-22T15%3A58%3A05.232Z]&q=source%3A%22source1%22+AND+productCategory%3A%22category1%22+AND+type%3A%22type1%22&facet=true&facet.limit=-1&facet.sort=count&facet.range=productDate&facet.range.start=NOW%2FDAY-5000DAYS&facet.range.end=NOW%2FDAY%2B1DAY&facet.range.gap=%2B1DAY
My only explanation is that SOLR is counting fields over some larger document pool than the 700 documents resulting from the "q=" parameter. Or maybe I should filter documents in another way?
I have tried changing the filterCache size and it works, but it seems to be a waste of memory for queries like these. After all, aggregating over 700 documents should be very fast, shouldn't it?
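For comparison, the manual counting mentioned above is simple once the matching documents have been retrieved. The sketch below is in Python rather than Java and assumes the productDate values arrive as ISO-8601 UTC strings like those in the query. The function name `count_per_day` and the sample data are illustrative only.

```python
from collections import Counter


def count_per_day(timestamps):
    """Count documents per calendar day from ISO-8601 UTC timestamps."""
    # The first 10 characters of e.g. 2012-05-22T15:58:05.232Z are the date
    return Counter(ts[:10] for ts in timestamps)


sample = [
    "2012-05-22T15:58:05.232Z",
    "2012-05-22T09:00:00.000Z",
    "2012-05-21T23:36:00.000Z",
]
counts = count_per_day(sample)
```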
