I am using SOLR 6.5.1 with facet filters.
My query has:
facet.limit=-1 --> to return all possible facet values
facet.sort=index --> to order facet values by the value itself rather than by number of occurrences
For instance, one facet has integer values (the field contains years). The values are (occurrences in brackets):
2010 (438)
2011 (547)
...
2017 (367)
The facet is correctly ordered by value, but in ascending order (2010-->2017). How can I obtain the reverse order (2017-->2010)?
Thanks
UMG
You won't be able to specify the sort direction with the simple facet API (the old one used directly in the URL). But since you're retrieving all the possible facets, you can reverse the direction in your client side controller before outputting the values. Exactly how you do that depends on which language you're using.
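As a rough sketch of the client-side reversal in Python: with wt=json and the default json.nl=flat, Solr returns each facet as one flat list that alternates value and count. The field name year and the sample response below are made up for illustration (the counts are the ones from the question), so adapt them to your own schema.

```python
# Hypothetical slice of a Solr JSON response for facet.field=year with
# facet.sort=index; values and counts alternate in one flat list.
response = {
    "facet_counts": {
        "facet_fields": {
            "year": ["2010", 438, "2011", 547, "2017", 367],
        }
    }
}


def reversed_facet(flat):
    """Pair up [value, count, value, count, ...] and reverse the order."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    return pairs[::-1]


years_desc = reversed_facet(response["facet_counts"]["facet_fields"]["year"])
for value, count in years_desc:
    print(f"{value} ({count})")
```

Because facet.sort=index already returns the values in ascending order, reversing the list is enough and no re-sorting is needed.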
If you switch over to the more modern JSON-based facet API, however, you can specify the sort order directly on each level of the facet:
"sort":"index desc"
Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”
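For illustration, a request using that sort might be assembled like this in Python. This is only a sketch: the field name year and the facet label years are assumptions, and the resulting parameters would still have to be sent to your own Solr select handler.

```python
import json

# JSON Facet API definition: a terms facet over an assumed "year" field,
# with buckets ordered by value in descending order.
json_facet = {
    "years": {                # arbitrary label for this facet in the response
        "type": "terms",
        "field": "year",      # assumed field name
        "limit": -1,          # return all buckets, like facet.limit=-1
        "sort": "index desc",
    }
}

# Request parameters; rows=0 because only the facet buckets are of interest.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(json_facet)}
print(params["json.facet"])
```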
Related
I have documents with a contents TextField and a date DateField. I am trying to fetch the most recent documents.
A regular search with 'date > epoch' seems to already sort the results by date. Is that an expected behavior?
When I try to explicitly sort with SortExpression the results are not even sorted in any particular order (except on the dev server where it seems to work as well).
I am using the following code:
index.search(search.Query(
    query_string='date > epoch',
    options=search.QueryOptions(
        sort_options=search.SortOptions(
            expressions=[search.SortExpression(
                expression='date',
                direction=search.SortExpression.DESCENDING,
                default_value='1970-01-01')]))))
What is the right way to do that?
According to the documentation, all documents are sorted by their rank unless you specify a different sorting option. And a document rank is set to the time when it was added to the index, again, unless you specify a different rank.
If this is your desired behavior, there is no need to add a date field and sort by it.
When you filter on a field, you force the index for that field to be used, and as a side effect the output is sorted by that field.
https://cloud.google.com/appengine/docs/python/search/options
When you call the search() method using a query string alone, the
results are returned according to the default query options:
Documents are returned sorted in order of descending rank
Documents are returned in groups of 20 at a time
Retrieved documents contain all of their original fields
Don't know why it does not work with explicit sorting options.
In Solr, I am fetching results grouped on "hash" (my custom field).
As we know, each group contains a set of documents.
My requirement is:
Solr should first sort by score, which it already does.
If the scores of any two groups are the same, the group with more documents should come first.
If even the number of documents is the same, there should be some tie-breaker.
I need guidance on points 2 and 3. I cannot work out how to do this using the 'sort' parameter.
Thanks
Amit Aggarwal
2)group with more number of documents should come up
There is no way to do it directly. Alternatively, you can use two queries to achieve this with facets: facet on the grouping field first (facets are, by default, sorted by count, i.e. by numFound), then loop through the facet results to get the results per facet value.
I am using SOLR and storing an array of dates on which a salesperson is available to visit clients (trips can last anywhere from a day upwards, depending on the client request). For each salesperson I have a list of the dates on which they are available in a given month. There are other fields, including salesperson data, geolocation information, etc.
I am familiar with range queries, but it seems that SOLR's range searches on arrays work differently than I would like (as long as any item in the array is a match, the range is a match). I would like to send SOLR a query with a range and only get a match if all dates in that range are found in the array. For example:
<arr name="available_dates">
<date>2012-04-30T00:00:00Z</date>
<date>2012-05-01T00:00:00Z</date>
<date>2012-05-02T00:00:00Z</date>
</arr>
-- should match --
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
-- should not match as 2012-04-29 is not contained in available_dates --
available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
Is this possible or am I going about this all wrong?
You have the right idea, but your initial query is a search instead of a match. Intuitively, your search within available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z] should contain all of the elements of available_dates for it to have matched successfully.
You have two options to implement this logic efficiently and successfully. You can either manually or dynamically perform the range query for each element in your array, or you can set up an ancillary that attempts to perform the match after your search has been performed. For example:
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z](available_dates)
Which is saying, in left to right order: evaluate the range search, then check that all of the results from available_dates are contained in this evaluation (by way of a default AND query). If they are, return the element. If not, don't.
Syntactically, the above is untested and probably does not work. But procedurally, you should be able to draft the right query around this to fit your needs.
(Additional resource discussing the default AND behavior of composite search queries)
Instead of using a range query you should use multiple clauses, one for each date.
So instead of available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
You should use available_dates:"2012-04-29T00:00:00.000Z" AND available_dates:"2012-04-30T00:00:00Z" AND available_dates:"2012-05-01T00:00:00.000Z" AND available_dates:"2012-05-02T00:00:00.000Z"
Hope that answers your question!
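Writing one clause per date by hand gets tedious for longer trips, so the query string can be generated. The following is a minimal sketch in Python, assuming every stored date is a midnight-UTC timestamp as in the question; the helper name all_dates_query is made up for this example.

```python
from datetime import date, timedelta


def all_dates_query(field, start, end):
    """Build one quoted clause per day from start to end (inclusive),
    joined with AND so that every day must be present in the field."""
    days = (end - start).days
    clauses = [
        f'{field}:"{(start + timedelta(days=i)).isoformat()}T00:00:00Z"'
        for i in range(days + 1)
    ]
    return " AND ".join(clauses)


query = all_dates_query("available_dates", date(2012, 4, 29), date(2012, 5, 2))
print(query)
```

For the range in the question this yields four clauses, so a salesperson whose array lacks 2012-04-29 is not matched.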
This assumes you're importing the data from a database.
In your database or in your search index, create a new column that stores the max of your salesperson's dates (as in the latest date), as well as the min. Also, calculate and store the difference between the max and min dates.
Three criteria must all be met for a query to match (so use AND in the query):
the difference between the query's max and min can't be bigger than the difference stored in the index
you'd make sure {!frange l=0 u=difn_bet_query_max_and_min}sub(field_min,query_min)
formulate the same thing for your max values
For a reference on function ranges
http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/
I want to have search results from SOLR ordered like this:
All the documents that have the same score will be ordered descending by date added.
So when I query solr I will have n documents. In this result set there will be groups of documents with the same score. I want each of these groups of documents to be ordered descending by date added.
I discovered I can accomplish this using function queries, more precisely using the rord function http://wiki.apache.org/solr/FunctionQuery#rord, but as it is stated in the documentation
WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use
since they must use a FieldCache entry at the top level reader, while
sorting and function queries now use entries at the segment level.
Hence sorting or using a different function query, in addition to
ord()/rord() will double memory use.
it will cause excess memory use.
What other options do I have?
I was thinking of using recip(ms(NOW,startTime),1,1,0). Is this the best approach?
Is there any negative performance impact if I use recip and ms?
You can use multiple SORT conditions:
Multiple sort orderings can be separated by a comma, ie: sort=<field name>+<direction>[,<field name>+<direction>]...
http://wiki.apache.org/solr/CommonQueryParameters
So, in your case it would be:
sort=score desc, date_added desc
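When the request is built in code, the spaces and the comma in the sort value need URL-encoding. A small Python sketch, where the query text is a placeholder and date_added is the field name used in this answer rather than a known field in your schema:

```python
from urllib.parse import urlencode

# Primary ordering by score, ties broken by the date field (newest first).
params = {
    "q": "some query",                       # placeholder user query
    "sort": "score desc, date_added desc",   # date_added is an assumed field
}

query_string = urlencode(params)
print(query_string)
```

The direction keywords are written in lowercase here, which is the form the Solr documentation uses.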
Since your question says:
All the documents that have the same score will be ordered descending
by date added.
the other answer you got is perfect.
Anyway, I'd suggest you make sure that you really want to sort by date only for documents with the same score. In my experience this has always been wrong. In fact, the solr score is not absolute but just relative to other documents, and each document is different.
Therefore I wouldn't sort by score and then something else, because it's hard to predict when you'll have the same score for different documents.
I would personally sort only on score and use a function to boost recent documents. You can find a good example on the solr wiki, the function used there is recip(ms(NOW,date_field),3.16e-11,1,1).
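To get a feel for what that function does: Solr's recip(x,m,a,b) computes a/(m*x+b), and 3.16e-11 is roughly 1 divided by the number of milliseconds in a year. The Python sketch below only reproduces the arithmetic outside Solr (the helper names are made up for the example); it does not call Solr.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # milliseconds in a 365-day year


def recip(x, m, a, b):
    """Same formula as Solr's recip function: a / (m * x + b)."""
    return a / (m * x + b)


def recency_boost(age_ms):
    """Boost for a document that is age_ms milliseconds old,
    using the constants from recip(ms(NOW,date_field),3.16e-11,1,1)."""
    return recip(age_ms, 3.16e-11, 1, 1)


for years in (0, 1, 5):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

A brand-new document gets a boost of 1, a one-year-old document about half of that, and the value keeps decaying smoothly with age instead of dropping off at a hard cut-off.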
If you're worried about performance you can try index-time boosting, which should be faster than query-time boosting. Have a look here.
I am searching "product documents". In other words, my solr documents are product records. I want to get, say, the top 50 matching products for a query. Then I want to be able to sort the top 50 scoring documents by name or price. I'm not seeing much on how to do this, since sorting by score, then by name or price won't really help, since scores are floats.
I wouldn't mind if I could do something like map the scores to ranges (so that a score of 8.0-8.99 would go in the 8 score bucket), then sort by range, then by name, but since there is basically no normalization to scoring, this would still make things a bit harder.
Tl;dr How do I exclude low scoring documents from the solr result set before sorting?
You can use frange to achieve this, as long as you don't want to sort on score (in which case I guess you could just do the filtering on the client side).
Your query would be something along the lines of:
q={!frange l=5}query($qq)&qq=[awesome product]&sort=price asc
Set the l argument in the q-frange-parameter to the lower bound you want to filter score on, and replace the qq parameter with your user query.
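A sketch of how those parameters could be assembled in Python, keeping the user's text in qq so it never has to be spliced into the frange expression itself. The function name, the threshold of 5 and the price field are placeholders taken from the example above, not recommended values.

```python
from urllib.parse import urlencode


def min_score_params(user_query, min_score, sort="price asc"):
    """Parameters for a query that keeps only documents whose score for
    user_query is at least min_score, then sorts them by another field."""
    return {
        "q": "{!frange l=%s}query($qq)" % min_score,
        "qq": user_query,   # the real user query, referenced as $qq
        "sort": sort,
    }


params = min_score_params("awesome product", 5)
print(urlencode(params))
```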
As observed by Karl Johansson, you could do the filtering on the client side: load the first 50 rows of the response (sorted by score desc) and then manipulate them in JS for example.
The jQuery DataTables plugin works fantastically for that kind of thing: sorting, sorting on multiple columns, dynamic filtering, etc. -- and with only 50 rows it would be very fast too, so that users can "play" with the sorting and filtering until they find what they want.
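The same idea works outside the browser too. Here is a minimal Python sketch, assuming the documents have already been fetched with their scores included; the field names name, price and score and the sample products are invented for the example.

```python
from operator import itemgetter


def top_n_sorted(docs, n=50, key="price"):
    """Keep the n highest-scoring documents, then re-sort them by key."""
    top = sorted(docs, key=itemgetter("score"), reverse=True)[:n]
    return sorted(top, key=itemgetter(key))


# Invented sample documents standing in for a Solr response.
docs = [
    {"name": "Widget", "price": 9.99, "score": 8.2},
    {"name": "Gadget", "price": 4.50, "score": 7.9},
    {"name": "Doohickey", "price": 1.00, "score": 0.3},
]

result = top_n_sorted(docs, n=2)
for doc in result:
    print(doc["name"], doc["price"])
```

With n=2 the low-scoring Doohickey is dropped before the price sort, even though it is the cheapest item.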
I don't think you can simply "exclude low scoring documents from the solr result set before sorting", because the relevance score is only meaningful for a given combination of search query and resulting document list. I.e. scores are only meaningful within a given search and you cannot set some threshold for all searches.
If you were using Java (or PHP) you could get the top 50 documents and then re-sort this list in your programming language but I don't think you can do it with just SOLR.
Anyway, I would recommend you don't go down this route of re-sorting the results from SOLR, as it will simply confuse the user. People expect search results to be like Google (and most other search engines), where results come back in some form of TF-IDF ranking.
Having said that, you could use some other criteria to separate documents with the same relevance scores by adding an index-time boost factor based on a price range scale.
I'd suggest you use SOLR to its strengths and use facets. Provide a price range facet on the left (like eBay, Amazon, et al.) and/or a product category facet, etc. Also provide a "sort" widget to allow the results to be sorted by product name, if the user wants it.
[EDIT] this question might also be useful:
Digg-like search result ranking with Lucene / Solr?