My goal is to get a document count for each month over the past year. Here is the faceted query I am using against Solr 1.4:
q=*:*
rows=0
facet=on
facet.date=myDateField
facet.date.start=NOW-11MONTH/MONTH
facet.date.end=NOW+1MONTH/MONTH
facet.date.gap=+1MONTH
The ranges this query produces look like 2013-01-01T00:00:00Z to 2013-02-01T00:00:00Z, and the upper bound is inclusive, so a document stamped T00:00:00Z on the first of a month is counted in two different ranges.
Solr 3.1 introduced the facet.date.include parameter, which would solve my problem, except that upgrading right now is not an option. Is there a workaround that achieves the same functionality? I tried facet.date.gap=+1MONTH-1SECOND, which is close but not close enough. It produces range boundaries like the following, where the end dates drift:
2012-09-01T00:00:00Z
2012-09-30T23:59:59Z
2012-10-30T23:59:58Z
2012-11-30T23:59:57Z
2012-12-30T23:59:56Z
2013-01-30T23:59:55Z
2013-02-28T23:59:54Z
2013-03-28T23:59:53Z
2013-04-28T23:59:52Z
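The drift can be reproduced with a short Python sketch. It assumes, as the listed values suggest, that each boundary is computed by applying the whole gap to the previous boundary, and that adding a month keeps the day-of-month and clamps it to the target month's length (the way java.util.Calendar behaves). Under those assumptions the one-second offsets accumulate, and once a boundary lands on the 30th, later months end on the 30th (or the 28th after February) instead of the last day.

```python
import calendar
from datetime import datetime, timedelta


def add_months(dt: datetime, months: int) -> datetime:
    """Add calendar months, clamping the day to the target month's length (assumed Calendar-like behavior)."""
    month_index = dt.month - 1 + months
    year = dt.year + month_index // 12
    month = month_index % 12 + 1
    day = min(dt.day, calendar.monthrange(year, month)[1])
    return dt.replace(year=year, month=month, day=day)


def gap_boundaries(start: datetime, count: int) -> list:
    """Apply the gap '+1MONTH-1SECOND' repeatedly, each time to the previous boundary."""
    boundaries = [start]
    for _ in range(count):
        boundaries.append(add_months(boundaries[-1], 1) - timedelta(seconds=1))
    return boundaries


for b in gap_boundaries(datetime(2012, 9, 1), 8):
    print(b.strftime("%Y-%m-%dT%H:%M:%SZ"))
```

Computing every boundary from the fixed start instead (start plus k months) would avoid the accumulation entirely.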
What you are asking can be done with facet queries instead of range (date) faceting.
Try something like this:
facet.query=myDateField:[NOW-11MONTH/MONTH TO NOW-10MONTH/MONTH]
facet.query=myDateField:[NOW-10MONTH/MONTH TO NOW-9MONTH/MONTH]
facet.query=myDateField:[NOW-9MONTH/MONTH TO NOW-8MONTH/MONTH] ...
and so on.
Now you have full control over every single facet, so you can, for example, apply -1DAY to the last facet if you need to.
Have a look at the reference for date math syntax:
http://lucene.apache.org/solr/4_4_0/solr-core/org/apache/solr/util/DateMathParser.html
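Writing twelve facet.query parameters by hand is error-prone, so here is a small Python sketch that generates them. Because Solr's square-bracket range bounds are inclusive, the sketch subtracts one millisecond (-1MILLI) from each upper bound so that a document stamped exactly at midnight on the 1st is counted in only one month; that adjustment is an assumption layered on the facet.query approach, not something the answer prescribes. The field name myDateField comes from the question.

```python
from urllib.parse import urlencode


def month_expr(offset: int) -> str:
    """Solr date-math expression for the start of the month `offset` months from now."""
    if offset == 0:
        return "NOW/MONTH"
    return f"NOW{offset:+d}MONTH/MONTH"  # e.g. NOW-11MONTH/MONTH, NOW+1MONTH/MONTH


def monthly_facet_queries(field: str, months: int = 12) -> list:
    """One facet.query per month; each upper bound is pulled back 1 ms so ranges don't overlap."""
    return [
        f"{field}:[{month_expr(start)} TO {month_expr(start + 1)}-1MILLI]"
        for start in range(-(months - 1), 1)
    ]


params = [("q", "*:*"), ("rows", "0"), ("facet", "on")]
params += [("facet.query", fq) for fq in monthly_facet_queries("myDateField")]
print(urlencode(params))
```

urlencode escapes the + in NOW+1MONTH as %2B; left unescaped in a URL, that + would be decoded as a space.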
I'm facing an issue: I have a huge amount of data in Solr, and as a result, searching for a multi-token query produces a big recall set. For example, if I search for "apple watch series 4 42mm", I get back 4 million results. My parser is edismax, the minimum match setting is currently 2, and I am using a whitespace tokenizer with a bunch of filters. The goal is to reduce this recall set so that more relevant results are displayed.
Things I have explored:
MinimumMatch - I am trying mm set to 2>2 4>3 to see how it affects the results. I also tried to find out whether I could apply mm to individual fields, and found that this used to be possible with local params in Solr but has been discontinued since Solr 7.2. I do not want to write a custom parser or tweak Solr's code, since that could lead to other problems, nor do I want to change the default parser to Lucene. Is there any other way to apply mm separately to category_name, product_name, product_description, brand_name, etc.?
Query slop - I am not using qs right now. I tried a few examples that convert my query into a phrase query and apply qs. It does reduce recall, but there is a problem: suppose a product has "apple" in brand_name and "watch series 4 42mm" in product_name. That is a relevant result, but it will not be returned, because the phrase query requires all tokens to be in the same field. Is there a way to apply qs that suits my purpose?
ShingleFilterFactory - I'm trying this filter with outputUnigrams=true because I still want the individual terms to be indexed. But then the index size would explode, and the result set would not be much better either. Can I use other levers, such as mm, together with this to make it work? Also, is there a way to make outputUnigrams a query parameter?
I also explored pf2, pf3, and ps, but those are used for boosting. Right now, my aim is to filter down to the most relevant results.
Can someone please help me with the above? Thanks
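To see what a given mm value actually demands for a query such as "apple watch series 4 42mm" (five whitespace-separated terms, assuming each becomes one optional clause), here is a simplified Python re-implementation of the documented mm rules. It is an illustration, not Solr's code, and the real clause count depends on analysis and query-parser settings. Note that Solr's documented conditional syntax uses `<`, for example `2<2 4<3` (more than 2 clauses: 2 must match; more than 4 clauses: 3 must match).

```python
import re


def min_should_match(clause_count: int, spec: str) -> int:
    """Simplified sketch of mm semantics: how many optional clauses must match."""
    spec = spec.strip()
    if "<" in spec:
        # Conditional form "N<S ...": when there are more than N clauses, S applies.
        spec = re.sub(r"\s*<\s*", "<", spec)
        result = clause_count  # at or below the first threshold, all clauses are required
        for part in spec.split():
            upper, sub_spec = part.split("<", 1)
            if clause_count <= int(upper):
                return result
            result = min_should_match(clause_count, sub_spec)
        return result
    if spec.endswith("%"):
        calc = clause_count * int(spec[:-1]) / 100
        # Negative percentages mean "all but this many"; int() truncates toward zero.
        result = clause_count + int(calc) if calc < 0 else int(calc)
    else:
        calc = int(spec)
        result = clause_count + calc if calc < 0 else calc
    return max(0, min(result, clause_count))


for n in range(1, 6):
    print(n, "clauses ->", min_should_match(n, "2<2 4<3"), "must match")
```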
I am running Solr 4.10.3 and trying to re-sort the top 10 documents returned by Solr. How can I do that? I am thinking of a sub-query but don't know how to do it, and need help.
Example:
Suppose that for the query "car" Solr returns 250 documents ranked by relevancy score. Now, from those 250 documents, I want to take the top 10 and re-sort them by a custom field.
I can't do this:
select?q=car&sort=score desc, pr desc
because it sorts the entire set of 250 documents. So is there any solution?
I think you mean query reranking? That is what you use when you want to:
use a first, more lightweight query to get a result based on the score
take a reasonable top N of the matches, for example 2000
use a heavier query to re-score only those, to get the final score
That is what you need if you want to do it in two steps, as you describe. That said, I am not sure you need query reranking for your use case; simply boosting by that field, or sorting on a function, may be enough for you.
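A minimal sketch of the reranking route, assuming the ReRank query parser (added in Solr 4.9, so present in 4.10.3) and the question's custom field pr. The rerank query is a function query on pr, and a large reRankWeight (1000 is an arbitrary choice) lets pr dominate the order within the top reRankDocs documents while the rest keep their original score order. Note that this adds a weighted pr value to the original score rather than performing a strict sort; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def rerank_params(user_query: str, field: str, top_n: int = 10, weight: int = 1000) -> dict:
    """Hypothetical helper: request params that re-score only the top `top_n` hits by `field`."""
    return {
        "q": user_query,
        # Re-score only the top `top_n` hits of the main query with the query in $rqq.
        "rq": f"{{!rerank reRankQuery=$rqq reRankDocs={top_n} reRankWeight={weight}}}",
        # The rerank query: the value of `field` as a function query (assumed numeric field).
        "rqq": f"{{!func}}{field}",
        "rows": str(top_n),
        "fl": f"id,score,{field}",
    }


print("select?" + urlencode(rerank_params("car", "pr")))
```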
I am trying to do a distance range search using Solr.
I know it's very easy to filter to within a 5 km range:
&q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
What I am after is how to do the same thing for a range of, say, 5 to 10 km.
Thanks
Here are a couple of ways to approach this. The query clearly belongs in a filter query (the "fq" param), since the intention is not to modify the score. Let's also assume these parameters are set in the request URL (although they don't have to be placed there):
pt=45.15,-93.85&sfield=store
Here is one approach:
_query_:"{!geofilt d=10}" -_query_:"{!geofilt d=5}"
I used Solr's _query_ syntax hack to embed a sub-query, which makes it possible to switch the query parser from the Lucene one to a geo one.
Here's another approach that is probably the fastest:
{!frange l=5 u=10}geodist()
This one is a function query that returns the distance, which is then limited to the desired range. It is probably faster because it evaluates each document once instead of twice, as the previous approach does.
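What the frange filter does per document can be mimicked client-side: compute the great-circle distance from pt to the document's location and keep the document only if l <= distance <= u (frange's bounds are inclusive by default). A Python sketch, assuming the haversine formula and a mean Earth radius of 6371 km; Solr's own constant may differ in later decimals, and the function names are illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def in_ring(point: tuple, center: tuple, lower_km: float, upper_km: float) -> bool:
    """Equivalent of {!frange l=lower u=upper}geodist(): inclusive distance band."""
    d = haversine_km(*center, *point)
    return lower_km <= d <= upper_km


center = (45.15, -93.85)
print(in_ring((45.20, -93.85), center, 5, 10))
```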
You may want to mark this filter as not cacheable and add a bbox filter, so that far fewer documents than the whole index are examined. Here is the final result (not URL-escaped):
pt=45.15,-93.85&sfield=store&fq={!frange l=5 u=10 cache=false cost=100}geodist()&fq={!bbox d=10}
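The same parameters can be URL-escaped programmatically. A Python sketch with urllib.parse.urlencode; q=*:* is taken from the question, and the function name is made up for illustration. The comments reflect the answer's reasoning: cache=false with cost of 100 or more lets frange run late, after the cheaper bbox filter has narrowed the candidates.

```python
from urllib.parse import urlencode


def ring_filter_query_string(lat: float, lon: float, sfield: str, lower_km: float, upper_km: float) -> str:
    """Hypothetical helper: URL-escaped params for a lower_km..upper_km distance band."""
    params = [
        ("q", "*:*"),
        ("pt", f"{lat},{lon}"),
        ("sfield", sfield),
        # Function-range filter on the computed distance; uncached and evaluated late.
        ("fq", f"{{!frange l={lower_km} u={upper_km} cache=false cost=100}}geodist()"),
        # Cheap bounding box around the outer radius limits which docs frange has to check.
        ("fq", f"{{!bbox d={upper_km}}}"),
    ]
    return urlencode(params)


print(ring_filter_query_string(45.15, -93.85, "store", 5, 10))
```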
I want to have search results from SOLR ordered like this:
All the documents that have the same score will be ordered descending by date added.
So when I query Solr I will get n documents. In this result set there will be groups of documents with the same score, and I want each of these groups to be ordered descending by date added.
I discovered I can accomplish this using function queries, more precisely the rord function (http://wiki.apache.org/solr/FunctionQuery#rord), but as the documentation states
WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use
since they must use a FieldCache entry at the top level reader, while
sorting and function queries now use entries at the segment level.
Hence sorting or using a different function query, in addition to
ord()/rord() will double memory use.
it will cause excess memory use.
What other options do I have?
I was thinking of using recip(ms(NOW,startTime),1,1,0). Is this the best approach?
Is there any negative performance impact if I use recip and ms ?
You can use multiple SORT conditions:
Multiple sort orderings can be separated by a comma, i.e.: sort=<field name>+<direction>[,<field name>+<direction>]...
http://wiki.apache.org/solr/CommonQueryParameters
So, in your case it would be:
sort=score DESC, date_added DESC
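To illustrate that the secondary key only acts as a tie-breaker, here is the same ordering applied to an in-memory list in Python. The documents and their field values are hypothetical.

```python
from datetime import datetime

# Hypothetical result set: two documents share a score, one scores higher.
docs = [
    {"id": "a", "score": 2.5, "date_added": datetime(2013, 1, 5)},
    {"id": "b", "score": 2.5, "date_added": datetime(2013, 2, 1)},
    {"id": "c", "score": 3.0, "date_added": datetime(2013, 1, 1)},
]


def solr_like_sort(docs: list) -> list:
    """Mimic sort=score DESC, date_added DESC: date only decides between equal scores."""
    return sorted(docs, key=lambda d: (d["score"], d["date_added"]), reverse=True)


print([d["id"] for d in solr_like_sort(docs)])
```

Document c stays first despite being the oldest, because date_added is consulted only when scores are equal.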
Since your question says:
All the documents that have the same score will be ordered descending
by date added.
the other answer you got is perfect.
Anyway, I'd suggest making sure that you really want to sort by date only for documents with the same score. In my experience this has always been wrong. The Solr score is not absolute but only relative to other documents, and each document is different.
Therefore I wouldn't sort by score and then something else, because it's hard to predict when you'll have the same score for different documents.
I would personally sort only on score and use a function to boost recent documents. You can find a good example on the Solr wiki; the function used there is recip(ms(NOW,date_field),3.16e-11,1,1).
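Solr's recip(x,m,a,b) is defined as a/(m*x+b). With x = ms(NOW,date_field) in milliseconds, m = 3.16e-11 is roughly 1 divided by the number of milliseconds in a year, so the boost starts at 1 for a brand-new document and drops to about 0.5 after one year. A quick Python check (the helper names are illustrative):

```python
MS_PER_YEAR = 365.25 * 24 * 3600 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms: float) -> float:
    """recip(ms(NOW,date_field),3.16e-11,1,1) for a document that is age_ms old."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 0.5, 1, 2):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```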
If you're worried about performance, you can try index-time boosting, which should be faster than query-time boosting. Have a look here.
I am implementing Solr dismax search and also using the function recip(ms(NOW,PubDate),3.16e-11,1000,1000) as a date boost. Everything is working fine except for one problem:
if the search keywords are repeated in the Title, those documents get a higher score than recent results.
e.g.
1) Title = solr lucene
Date = 1 day old
2) Title = solr lucene is best, love solr lucene
Date = 15 days old
If a user searches for 'solr lucene', #2 comes first only because the keywords are repeated in its Title.
I have many records that are 1, 2 or 3 days old, some even with the exact same title "SOLR LUCENE", but those records don't appear on the first page only because older records have the keywords repeated in the Title.
I don't want to sort the results entirely by date. Currently I am sorting like this: sort=score desc, date asc
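One likely reason the date boost has so little effect: recip(x,m,a,b) = a/(m*x+b), so with a = b = 1000 the function equals 1/(3.16e-14*x + 1), which takes about a thousand years to fall to 0.5. A small Python comparison with the wiki-style a = b = 1 (the day counts are just sample values):

```python
DAY_MS = 24 * 60 * 60 * 1000


def recip(x: float, m: float, a: float, b: float) -> float:
    """Solr's recip(x, m, a, b) = a / (m*x + b)."""
    return a / (m * x + b)


for days in (1, 15, 365):
    age = days * DAY_MS
    question_boost = recip(age, 3.16e-11, 1000, 1000)  # the question's parameters
    wiki_boost = recip(age, 3.16e-11, 1, 1)  # the commonly cited parameters
    print(days, round(question_boost, 6), round(wiki_boost, 6))
```

With a = b = 1000, a 1-day-old and a 15-day-old document get practically the same boost, so repeated title terms easily outweigh it.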
You shouldn't use a sort clause if you are using a boost.
If you want to give the date more relevance, strengthen your boost function. It's up to you how big an influence the date has on the order of the search results.
It also depends on the dismax handler you are using:
{!edismax boost=recip(pow(ms(NOW,PubDate),<val>),3.16e-11,1,1)}
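Put a value between 0 and 2 in place of the <val> placeholder. With 0, pow(...) is always 1, so the boost is constant and the order is by relevance alone; the closer the value gets to 2, the more the date dominates, until the order is nearly "order by date".
I'm not sure whether this works for dismax, but it works for the standard Solr search handler (with a different syntax from the example above) and for edismax.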
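A quick numeric check of how the exponent in recip(pow(ms(NOW,PubDate),<val>),3.16e-11,1,1) changes the boost, sketched in Python with the question's 1-day and 15-day documents. With exponent 0 both get the same boost; with 1 the difference is small; with 2 the boost falls off so steeply with age that the date dominates a multiplicative boost.

```python
DAY_MS = 24 * 60 * 60 * 1000


def pow_boost(age_ms: float, exponent: float) -> float:
    """recip(pow(ms(NOW,PubDate),exponent),3.16e-11,1,1) = 1 / (3.16e-11 * age^exponent + 1)."""
    return 1 / (3.16e-11 * age_ms ** exponent + 1)


for exponent in (0, 1, 2):
    ratio = pow_boost(DAY_MS, exponent) / pow_boost(15 * DAY_MS, exponent)
    print(f"exponent {exponent}: 1-day boost / 15-day boost = {ratio:.3f}")
```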