I have documents with a contents TextField and a date DateField. I am trying to fetch the most recent documents.
A regular search with 'date > epoch' seems to already sort the results by date. Is that an expected behavior?
When I try to sort explicitly with SortExpression, the results are not sorted in any particular order (except on the dev server, where it seems to work as well).
I am using the following code:
from google.appengine.api import search

# `index` is an existing search.Index that holds the documents.
index.search(search.Query(
    query_string='date > epoch',
    options=search.QueryOptions(
        sort_options=search.SortOptions(
            expressions=[search.SortExpression(
                expression='date',
                direction=search.SortExpression.DESCENDING,
                default_value='1970-01-01')]))))
What is the right way to do that?
According to the documentation, all documents are sorted by their rank unless you specify a different sorting option. A document's rank, in turn, is set to the time when it was added to the index, unless you specify a different rank.
If this is your desired behavior, there is no need to add a date field and sort by it.
When you filter by a field, you are forcing the use of an index, and as a side effect the output will be sorted by that field.
https://cloud.google.com/appengine/docs/python/search/options
When you call the search() method using a query string alone, the results are returned according to the default query options:
Documents are returned sorted in order of descending rank
Documents are returned in groups of 20 at a time
Retrieved documents contain all of their original fields
I don't know why it does not work with explicit sorting options.
I'm trying to change the sort order of an Apache Solr query.
For example, documents of bundle type story, videogallery and category_management are indexed.
I want all results with bundle type category_management to be shown at the top.
See attached screenshot:
Please help me use a Solr filter to sort my results.
The easiest way is to apply a boost to any entry that matches your requirement:
&bq=bundle:category_management^50
The weight (50) may have to be adjusted to get the result you want. This is faster than sorting by a function.
This also preserves the relevancy ordering within each set of documents, unlike sorting by a function or adding a priority field for sorting.
If you actually want to sort on multiple fields instead, you can first sort by a function that returns 1 for values that match and 0 for values that don't, and then by your second field. Something like:
sort=if(termfreq(bundle,'category_management'),1,0) desc,ds_changed desc
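As a rough sketch, here is how the two alternatives above could be turned into request parameters from Python using only the standard library. The endpoint URL and core name `mycore` are hypothetical, and because `bq` is a parameter of the dismax/edismax query parsers, the sketch assumes `defType=edismax` for the boost variant.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust host and core name to your setup.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

def boost_params(query: str) -> dict:
    """Option 1: boost category_management docs but keep relevancy order."""
    return {
        "q": query,
        "defType": "edismax",  # assumption: bq is read by (e)dismax parsers
        "bq": "bundle:category_management^50",
    }

def sort_params(query: str) -> dict:
    """Option 2: hard sort, matching bundle first, then newest ds_changed."""
    return {
        "q": query,
        "sort": "if(termfreq(bundle,'category_management'),1,0) desc,ds_changed desc",
    }

def build_url(params: dict) -> str:
    # urlencode percent-escapes ':', '^', ',', quotes and parentheses.
    return SOLR_SELECT + "?" + urlencode(params)

print(build_url(boost_params("video")))
print(build_url(sort_params("video")))
```

The boost keeps a single relevancy-ranked list, while the sort produces two hard blocks (matches first), each ordered by `ds_changed`.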
I am using SOLR 6.5.1 with facet filters.
My query has:
facet.limit=-1 --> to generate all possible facet values
facet.sort=index --> to order facet values not by number of occurrences but by the value itself
For instance, one facet has integers as values (in particular, the field contains years). So the values are (occurrences in brackets):
2010 (438)
2011 (547)
...
2017 (367)
The facet is correctly ordered by value, but in ascending order (2010-->2017). How can I obtain the reverse order (2017-->2010)?
You won't be able to specify the sort direction with the simple facet API (the old one used directly in the URL). But since you're retrieving all the possible facet values, you can reverse the direction in your client-side controller before outputting the values. Exactly how you do that depends on which language you're using.
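In Python, for example, the client-side reversal could look like the sketch below. It assumes Solr's default JSON layout for `facet_fields` (`json.nl=flat`), where each field maps to a flat `[value, count, value, count, ...]` list; the field name `year` is hypothetical, and only the values shown in the question are used.

```python
def reversed_facet(flat_counts):
    """Turn Solr's flat [value, count, value, count, ...] list into
    (value, count) pairs in reverse order."""
    pairs = list(zip(flat_counts[0::2], flat_counts[1::2]))
    pairs.reverse()
    return pairs

# Shaped like a facet.sort=index response for a hypothetical "year" field.
facet_fields = {"year": ["2010", 438, "2011", 547, "2017", 367]}
for value, count in reversed_facet(facet_fields["year"]):
    print(f"{value} ({count})")
```

Because `facet.limit=-1` already returns every bucket, reversing on the client loses nothing.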
But if you switch over to the more modern JSON-based facet API, you can specify the sort order directly on each level of the facet:
"sort":"index desc"
Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”
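A minimal sketch of such a request, built in Python and sent as the `json.facet` parameter. The facet label `years` and the field name `year` are hypothetical; `"limit": -1` is used here to mirror `facet.limit=-1` from the question.

```python
import json
from urllib.parse import urlencode

# Hypothetical terms facet over a "year" field, newest bucket first.
facet_request = {
    "years": {
        "type": "terms",
        "field": "year",
        "limit": -1,           # return every bucket, like facet.limit=-1
        "sort": "index desc",  # natural order of the value, descending
    }
}

params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet_request)}
query_string = urlencode(params)
print(query_string)
```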
I am using SOLR and storing an array of dates on which a salesperson is available to visit clients (trips can last anywhere from a day upwards, depending on the client request). For each salesperson I have a list of the dates on which they are available in a given month. There are other fields, including salesperson data, geolocation information, etc.
I am familiar with range queries, but it seems that SOLR's range searches on arrays (multivalued fields) work differently than I would like: as long as any item in the array matches, the range matches. I would like to send SOLR a query with a range and only get a match if every date in that range is found in the array. For example:
<arr name="available_dates">
<date>2012-04-30T00:00:00Z</date>
<date>2012-05-01T00:00:00Z</date>
<date>2012-05-02T00:00:00Z</date>
</arr>
-- should match --
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
-- should not match as 2012-04-29 is not contained in available_dates --
available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
Is this possible or am I going about this all wrong?
You have the right idea, but your initial query is a search instead of a match. Intuitively, your search within available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z] should contain all of the elements of available_dates for it to have matched successfully.
You have two options to implement this logic efficiently and successfully. You can either manually or dynamically perform the range query for each element in your array, or you can set up an ancillary query that attempts to perform the match after your search has been performed. For example:
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z](available_dates)
Which is saying, in left to right order: evaluate the range search, then check that all of the results from available_dates are contained in this evaluation (by way of a default AND query). If they are, return the element. If not, don't.
Syntactically, the above is untested and probably does not work. But procedurally, you should be able to draft the right query around this to fit your needs.
(Additional resource discussing the default AND behavior of composite search queries)
Instead of using a range query you should use multiple clauses, one for each date.
So instead of available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
You should use available_dates:"2012-04-29T00:00:00.000Z" AND available_dates:"2012-04-30T00:00:00Z" AND available_dates:"2012-05-01T00:00:00.000Z" AND available_dates:"2012-05-02T00:00:00.000Z"
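Writing one clause per day by hand gets tedious for longer trips, so the query string could be generated. This is a sketch assuming every stored date is midnight UTC, as in the example document; the helper name `all_days_clause` is hypothetical.

```python
from datetime import date, timedelta

def all_days_clause(field: str, start: date, end: date) -> str:
    """Build one exact-match clause per day from start to end (inclusive),
    joined with AND, so a document matches only if every day is present."""
    if end < start:
        raise ValueError("end must not be before start")
    clauses = []
    day = start
    while day <= end:
        # Quoted because Solr date literals contain ':' characters.
        clauses.append(f'{field}:"{day.isoformat()}T00:00:00Z"')
        day += timedelta(days=1)
    return " AND ".join(clauses)

print(all_days_clause("available_dates", date(2012, 4, 29), date(2012, 5, 2)))
```

For the range in the question this yields four AND-ed clauses, so the example document (which lacks 2012-04-29) would not match.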
Hope that answers your question!
Assuming you're importing this data from a database:
In your database or in your search index, create a new column that stores the latest of each salesperson's dates (the max), as well as the earliest (the min). Also calculate and store the difference between the max and min dates.
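The precomputation described above could be done at import time with something like the sketch below. The output field names (`available_min`, `available_max`, `available_span_days`) are hypothetical, and the span is expressed in days as an assumption.

```python
from datetime import date

def availability_summary(available_dates):
    """Compute the extra per-salesperson values to index: earliest date,
    latest date, and the span between them in days."""
    if not available_dates:
        return None
    earliest = min(available_dates)
    latest = max(available_dates)
    return {
        "available_min": earliest.isoformat() + "T00:00:00Z",
        "available_max": latest.isoformat() + "T00:00:00Z",
        "available_span_days": (latest - earliest).days,
    }

print(availability_summary([date(2012, 4, 30), date(2012, 5, 1), date(2012, 5, 2)]))
```

Note that min, max and span summarize only the outer bounds, so this approach implicitly treats each salesperson's availability as one contiguous block.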
Three criteria must be matched for a query to match (so combine them with AND in the query):
the difference between the query's max and min can't be bigger than the difference stored in the index
make sure that {!frange l=0 u=difn_bet_query_max_and_min}sub(field_min,query_min)
formulate the same condition for your max values
For a reference on function range queries, see:
http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/
I want to have search results from SOLR ordered like this:
All the documents that have the same score will be ordered descending by date added.
So when I query Solr I will get n documents. In this result set there will be groups of documents with the same score, and I want each of these groups to be ordered descending by date added.
I discovered I can accomplish this using function queries, more exactly using rord function http://wiki.apache.org/solr/FunctionQuery#rord, but as it is stated in the documentation
WARNING: as of Solr 1.4, ord() and rord() can cause excess memory use since they must use a FieldCache entry at the top level reader, while sorting and function queries now use entries at the segment level. Hence sorting or using a different function query, in addition to ord()/rord() will double memory use.
it will cause excess memory use.
What other options do I have ?
I was thinking of using recip(ms(NOW,startTime),1,1,0). Is this the best approach?
Is there any negative performance impact if I use recip and ms ?
You can use multiple SORT conditions:
Multiple sort orderings can be separated by a comma, i.e.: sort=<field name>+<direction>[,<field name>+<direction>]...
http://wiki.apache.org/solr/CommonQueryParameters
So, in your case it would be:
sort=score DESC, date_added DESC
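To make the tie-breaking semantics concrete, here is the same two-key ordering reproduced in plain Python on a few hypothetical results; it does not talk to Solr, it only shows that `date_added` decides the order among documents with equal scores.

```python
from datetime import date

# Hypothetical search results: (doc id, score, date added).
docs = [
    ("a", 1.5, date(2012, 1, 10)),
    ("b", 2.0, date(2011, 6, 1)),
    ("c", 1.5, date(2012, 3, 5)),
    ("d", 1.5, date(2011, 12, 24)),
]

# Equivalent of sort=score desc, date_added desc: score is the primary key,
# date_added only breaks ties; reverse=True makes both keys descending.
ordered = sorted(docs, key=lambda d: (d[1], d[2]), reverse=True)
print([doc_id for doc_id, _, _ in ordered])
```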
Since your question says:
All the documents that have the same score will be ordered descending by date added.
the other answer you got is perfect.
Anyway, I'd suggest making sure that you really want to sort by date only for documents with the same score. In my experience this has always been wrong: the Solr score is not absolute but only relative to the other documents, and each document is different.
Therefore I wouldn't sort by score and then by something else, because it's hard to predict when different documents will have exactly the same score.
I would personally sort only on score and use a function to boost recent documents. You can find a good example on the Solr wiki; the function used there is recip(ms(NOW,date_field),3.16e-11,1,1).
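Solr defines recip(x,m,a,b) as a/(m*x+b), and ms(NOW,date_field) is the document's age in milliseconds. The sketch below only reproduces that arithmetic in Python to show how the boost decays; with m=3.16e-11 (roughly one over the number of milliseconds in a year) and a=b=1, a brand-new document gets 1.0 and a one-year-old document about 0.5. The helper names are hypothetical.

```python
MS_PER_YEAR = 365 * 24 * 60 * 60 * 1000  # ignores leap years

def recip(x: float, m: float, a: float, b: float) -> float:
    """Same formula as Solr's recip(x,m,a,b): a / (m*x + b)."""
    return a / (m * x + b)

def recency_boost(age_ms: float) -> float:
    # recip(ms(NOW,date_field),3.16e-11,1,1): 1.0 for a new document,
    # about 0.5 after one year, decaying more slowly afterwards.
    return recip(age_ms, 3.16e-11, 1.0, 1.0)

for years in (0, 1, 2, 5):
    print(years, round(recency_boost(years * MS_PER_YEAR), 3))
```

By contrast, recip(ms(NOW,startTime),1,1,0) from the question has m=1, so the value drops to almost nothing within a second of a document's date, and with b=0 it divides by zero at age 0.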
If you're worried about performance, you can try index-time boosting, which should be faster than query-time boosting. Have a look here.
I have a solr index with the unique field as "id".
I have an ordered set of ids that I would like to use to query Solr, and I want the results returned in the same order.
For example, if I have the ids [5,1,3,4], I want Solr to return the results in that same order.
I tried http://localhost:8983/solr/select/?q=id:(5 OR 1 OR 3 OR 4)&fl=id, but the results come back in ascending order.
Is there a way to query Solr and get the results in the order I described?
I don't think you can.
The results appear in the order they were indexed unless you specify a default sort field or an explicit sort field/order.
You can add another field to keep the initial sort order. You can then use sort=field asc to retrieve the data in the original order.
The simplest way is to query Solr and sort the results in your own code.
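A sketch of that client-side approach in Python: build the OR query from the wanted ids, then put the returned documents back into the caller's order. The response documents below are hypothetical, and `rows` is set to the number of ids because Solr returns only 10 rows by default.

```python
from urllib.parse import urlencode

def id_query_params(ids):
    """Build q=id:(5 OR 1 OR 3 OR 4) and ask for enough rows for all ids."""
    return {
        "q": "id:(" + " OR ".join(str(i) for i in ids) + ")",
        "fl": "id",
        "rows": len(ids),
        "wt": "json",
    }

def reorder(docs, ids):
    """Return Solr's docs in the caller's id order; ids Solr did not
    return are skipped."""
    by_id = {str(doc["id"]): doc for doc in docs}
    return [by_id[str(i)] for i in ids if str(i) in by_id]

wanted = [5, 1, 3, 4]
print(urlencode(id_query_params(wanted)))
# Hypothetical response docs, in whatever order Solr returned them.
solr_docs = [{"id": "1"}, {"id": "3"}, {"id": "4"}, {"id": "5"}]
print([d["id"] for d in reorder(solr_docs, wanted)])
```

The dictionary lookup keeps the reordering linear in the number of ids, so it stays cheap even for long id lists.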