Setting up facets in SOLR - solr

I am using Solr and I want to set up two different types of facets on my data: one on date and one on distance. I would like them to look like this:
Posted:
Today
3 days
7 days
All time
Distance:
5 miles
10 miles
30 miles
100 miles
How should I set up faceting? It looks like I need to edit my solr.xml or my schema.xml, but it is all very confusing and the help documents boggle my mind.
Can someone who has done this before give me a bit of guidance please?

facet.query: This parameter can be specified multiple times to indicate that multiple queries should be used as separate facet constraints. So you can pass facet.query=distance:[* TO 5] and facet.query=distance:[5 TO 10], and you will get an individual count for each.
There are also facet.date and facet.range, which may suit you better.
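As a rough sketch of the facet.query approach, the request parameters for the buckets in the question could be assembled like this. The field names `posted_date` and `distance` are hypothetical, and the sketch assumes the distance is already indexed as a plain numeric field in miles. The "All time" count needs no facet of its own, since it is simply the total number of hits.

```python
from urllib.parse import urlencode

DATE_FIELD = "posted_date"   # hypothetical field name, not from the question
DISTANCE_FIELD = "distance"  # hypothetical numeric field holding miles


def build_facet_params(user_query):
    """Return request parameters with one facet.query per bucket."""
    params = [("q", user_query), ("rows", "0"), ("facet", "true")]
    # Date buckets: today, last 3 days, last 7 days (Solr date math).
    for start in ("NOW/DAY", "NOW/DAY-3DAYS", "NOW/DAY-7DAYS"):
        params.append(("facet.query", f"{DATE_FIELD}:[{start} TO NOW]"))
    # Distance buckets: within 5, 10, 30 and 100 miles.
    for miles in (5, 10, 30, 100):
        params.append(("facet.query", f"{DISTANCE_FIELD}:[* TO {miles}]"))
    return params


params = build_facet_params("*:*")
query_string = urlencode(params)  # append this to the select handler URL
print(query_string)
```

Each facet.query comes back with its own count in the facet section of the response, so the seven counts map directly onto the seven labels.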

Related

Solr Boost-Function on Sales

I am using Apache Solr 8 with products as documents. Each document includes sales within the last X days that I want to boost, as well as a title and other fields.
Say productA has been sold 5 times, I want to boost it with score+10; a productB has been sold 50 times, I want to boost the score by 30.
I tried to use a boostFunction that looks like (edismax query parser)
q=Coffee&qf=title&bf=if(lt(sales,5),10,if(lt(sales,50),30))
Solr now returns documents that have nothing to do with my "Coffee"-Query but just match the boostfunction. There are even results with score "0".
E.g.
Rank;Score;Sales;Title
1;58.53;55;Coffee big
2;38.11;50;Coffee
3;30;55;Tea
Any idea how to get rid of those "only boost function" matches?
Found the answer!
My Query-Fields actually included boostings like
&qf=title^2 longDescription^0 whatever^0...
Instead of excluding the results found in those 0-boosted fields, Solr still matches them and returns them with a score of 0.
When I remove the 0-boostings, everything works as intended.
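A minimal sketch of that fix, assuming the qf value is assembled as a string in client code: the hypothetical helper `drop_zero_boosts` removes every field whose boost is 0 before the query is sent.

```python
def drop_zero_boosts(qf):
    """Remove entries such as 'field^0' from a space-separated qf string."""
    kept = []
    for entry in qf.split():
        _field, _, boost = entry.partition("^")
        # An entry without '^' has no explicit boost and is kept as-is.
        if boost and float(boost) == 0:
            continue
        kept.append(entry)
    return " ".join(kept)


print(drop_zero_boosts("title^2 longDescription^0 whatever^0"))
```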

Ranged Solr Queries on timestamp field

I'm trying to build a solr query for timecode values with the following format:
run_time:
00:25:00
00:30:00
01:00:00
I'd like to filter a range of timecodes. Everything 30 minutes or under, for example.
I've tried a couple of queries with no success.
run_time:[* TO 00:30:00]
run_time:[* TO 00\:30\:00]
run_time:[* TO *00:31:00Z]
Any thoughts?
Read here for more details about solr date operations:
run_time:[ NOW-30MINUTES TO NOW ]
Your use case appears to be searching for processes which ran for 30 minutes or less. A date field is not the right fit here.
A better approach would be to store the run time in a numeric field, as a number of minutes:
run_time:
25
30
60
Then you can use a range query on this field to get the desired results.
run_time:[0 TO 30]
This will fetch records with a run time between 0 and 30 minutes, inclusive of both 0 and 30.
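The conversion from timecode to minutes would happen before indexing. A small sketch, assuming the timecodes always have the HH:MM:SS form shown in the question (the function name is illustrative only):

```python
def timecode_to_minutes(timecode):
    """Convert an 'HH:MM:SS' timecode into a run time in minutes."""
    hours, minutes, seconds = (int(part) for part in timecode.split(":"))
    return hours * 60 + minutes + seconds / 60


for timecode in ("00:25:00", "00:30:00", "01:00:00"):
    print(timecode, timecode_to_minutes(timecode))
```

If sub-minute precision matters, storing total seconds as an integer and querying with run_time:[0 TO 1800] would work the same way.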

What is the optimized way for queries on partial dates in GAE Text Search?

I need to retrieve entities filtered by month rather than by complete date values (e.g. birthdays) using Google App Engine Text Search. After checking the GAE docs, I think it is not possible to query date fields by month directly.
So, in order to filter by month or day, we are considering saving each date component, Day (DD), Month (MM) and Year (YYYY), as a separate NUMBER field alongside the complete date field.
I verified locally that this works. But is splitting the date into separate fields the correct approach when we want to query on date components?
Is there any known or unknown limit on the number of fields per document, apart from the 10GB size limit in GAE Text Search?
Any suggestions would be appreciated.
Thanks,
Naresh
The only time NUMBER or DATE fields make sense is if you need to query on ranges of values. In other cases they are wasteful.
I can't tell from your question exactly what queries you want to run. Are you looking for a (single) specific day of the month (e.g., January 6 -- of any year)? Or just "anything in June (again, without regard to year)"? Or is it a date range: something like January 20 through February 19? Or July 1 through September 30?
If it's a range then NUMBER values may make sense. But if it's just a single specific month, or a single month and day-of-month combination, then you're better off storing month and day as separate ATOM fields.
Anything that looks like a number, but isn't really going to be searched via a numerical range or used in arithmetic, isn't really a number, and is probably best stored as an ATOM. Examples are phone numbers and zip codes (unless you're terribly clever and want to do something like "all zip codes in San Francisco look like 941xx", but even then you're probably better off just storing the "941" prefix as an ATOM).
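A sketch of the splitting step in plain Python, independent of the Search API itself: the field names `birth_month` and `birth_day` are hypothetical, and the zero-padded strings are meant to be stored as ATOM values when the document is built.

```python
from datetime import date


def birthday_atoms(birthday):
    """Split a date into string values suitable for exact-match ATOM fields."""
    return {
        "birth_month": f"{birthday.month:02d}",  # e.g. "06" for June
        "birth_day": f"{birthday.day:02d}",      # e.g. "01"
    }


print(birthday_atoms(date(1990, 1, 6)))
```

Keeping the values as fixed-width strings makes "January 6 of any year" an exact match on two fields rather than a numeric comparison.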

How to apply boosting in solr

I am new to Solr; please help me with boosting fields.
I have a query like this:
q=name:test* OR description:test*
I want to apply a boost (weight) of 500 to name and 50 to description.
For example:
Suppose the term "test" appears once in the name field of one record and 20 times in the description field of another record. The boost calculation should then work like this:
for name: 1 x 500 = 500
for description: 20 x 50 = 1000
As a result, the records with the higher boosted value should come first.
So, based on the calculation above, the record with 20 matches in the description field should come first, followed by the record with 1 match in the name field.
If anyone has a solution for this, please share it.
Thanks in advance.
You can boost a field at index time with the boost attribute, or you can apply a boost in the query, such as q=name:test*^50 OR description:test* (and there are some more advanced features here as well).
It bears noting, though, that Lucene by default applies a length normalization that effectively weighs matches on shorter fields more heavily than matches on longer fields. It sounds a bit like that is what you are trying to recreate.
If you need the scoring calculation to be as simple as what you have provided, you would need to write your own Similarity class, I believe.
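For reference, the arithmetic the question asks for can be written out as below. This only restates the desired calculation (occurrence count times field weight); it is not how Lucene scores documents by default, and the record structure here is made up for illustration.

```python
# Field weights taken from the question.
WEIGHTS = {"name": 500, "description": 50}


def simple_score(term_counts):
    """Sum of (occurrences of the term in a field) x (that field's weight)."""
    return sum(WEIGHTS[field] * count for field, count in term_counts.items())


# Hypothetical records: occurrences of "test" per field.
records = {
    "record_a": {"name": 1},
    "record_b": {"description": 20},
}

ranked = sorted(records, key=lambda rid: simple_score(records[rid]), reverse=True)
for record_id in ranked:
    print(record_id, simple_score(records[record_id]))
```

Under this scheme record_b (1000) ranks above record_a (500), which is the ordering the question expects.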

SOLR faceting slower than manual count?

I'm trying to get a SOLR range query working. I have a database with over 12 million documents, and I am filtering by a few parameters, for example:
product_category:"category1" AND product_group:"group1" AND product_manu:"manufacturer1"
The query itself returns about 700 documents and executes in two-three seconds on average.
But when I add a date range facet to that query (I want to see how many products were added each day for the past x years), it executes in 50 seconds or more. So it seems that it would be faster to just retrieve all matching documents and perform manual counting in Java.
So I guess I must be doing something wrong with faceting?
Here is an example faceted query:
start=0&rows=0&facet.query=productDate%3A[0999-12-26T23%3A36%3A00.000Z+TO+2012-05-22T15%3A58%3A05.232Z]&q=source%3A%22source1%22+AND+productCategory%3A%22category1%22+AND+type%3A%22type1%22&facet=true&facet.limit=-1&facet.sort=count&facet.range=productDate&facet.range.start=NOW%2FDAY-5000DAYS&facet.range.end=NOW%2FDAY%2B1DAY&facet.range.gap=%2B1DAY
My only explanation is that SOLR is counting fields on some larger document pool than the 700 documents resulting from the "q=" parameter. Or maybe I should filter documents in another way?
I have tried changing the filterCache size and it works, but it seems to be a waste of memory for queries like these. After all, aggregating over 700 documents should be very fast, shouldn't it?
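To make the faceted query easier to read, it can be URL-decoded with the standard library. The sketch below only decodes the parameters quoted above and works out how many range buckets they request; it does not diagnose the slowdown.

```python
from urllib.parse import parse_qs

# The query string from the question, split across lines for readability.
RAW_QUERY = (
    "start=0&rows=0"
    "&facet.query=productDate%3A[0999-12-26T23%3A36%3A00.000Z"
    "+TO+2012-05-22T15%3A58%3A05.232Z]"
    "&q=source%3A%22source1%22+AND+productCategory%3A%22category1%22"
    "+AND+type%3A%22type1%22"
    "&facet=true&facet.limit=-1&facet.sort=count"
    "&facet.range=productDate"
    "&facet.range.start=NOW%2FDAY-5000DAYS"
    "&facet.range.end=NOW%2FDAY%2B1DAY"
    "&facet.range.gap=%2B1DAY"
)

params = parse_qs(RAW_QUERY)
for name, values in params.items():
    for value in values:
        print(f"{name} = {value}")

# The range runs from NOW/DAY-5000DAYS to NOW/DAY+1DAY in +1DAY steps,
# so the request asks for 5000 + 1 daily buckets.
bucket_count = 5000 + 1
print("daily buckets requested:", bucket_count)
```

Decoded, the range facet spans roughly 13.7 years in one-day steps, in addition to the separate facet.query on productDate.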
