SOLR - Match range query only if all dates in range are matched - solr

I am using SOLR and storing an array of dates on which a salesperson is available to visit clients (trips can last anywhere from a day upwards, depending on the client request). For each salesperson I have a list of the dates on which they are available in a given month. There are other fields, including salesperson data, geolocation information, etc.
I am familiar with range queries, but it seems that SOLR's range searches on arrays work differently than I would like: as long as any item in the array is a match, the range is a match. I would like to send SOLR a query with a range and only return a match if all dates in that range are found in the array. For example:
<arr name="available_dates">
<date>2012-04-30T00:00:00Z</date>
<date>2012-05-01T00:00:00Z</date>
<date>2012-05-02T00:00:00Z</date>
</arr>
-- should match --
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
-- should not match as 2012-04-29 is not contained in available_dates --
available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
Is this possible or am I going about this all wrong?

You have the right idea, but your initial query is a search instead of a match. Intuitively, your search within available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z] should contain all of the elements of available_dates for it to have matched successfully.
You have two options to implement this logic efficiently and successfully. You can either manually or dynamically perform the range query for each element in your array, or you can set up an ancillary that attempts to perform the match after your search has been performed. For example:
available_dates:[2012-04-30T00:00:00.000Z TO 2012-05-02T00:00:00.000Z](available_dates)
Which is saying, in left to right order: evaluate the range search, then check that all of the results from available_dates are contained in this evaluation (by way of a default AND query). If they are, return the element. If not, don't.
Syntactically, the above is untested and probably does not work. But procedurally, you should be able to draft the right query around this to fit your needs.
(Additional resource discussing the default AND behavior of composite search queries)

Instead of using a range query you should use multiple clauses, one for each date.
So instead of available_dates:[2012-04-29T00:00:00.000Z TO 2012-05-02T00:00:00.000Z]
You should use available_dates:"2012-04-29T00:00:00.000Z" AND available_dates:"2012-04-30T00:00:00Z" AND available_dates:"2012-05-01T00:00:00.000Z" AND available_dates:"2012-05-02T00:00:00.000Z"
Hope that answers your question!
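Writing one clause per day by hand gets tedious for longer trips, so the query can be generated. The following is a minimal sketch, assuming dates are indexed at midnight UTC as in the question; the helper name all_dates_query is made up for illustration.

```python
from datetime import date, timedelta


def all_dates_query(field, start, end):
    """Build one exact-match clause per day in [start, end], joined with AND."""
    if end < start:
        raise ValueError("end must not be before start")
    days = (end - start).days + 1
    clauses = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Assumes each date is stored as midnight UTC, as in the question.
        clauses.append('%s:"%sT00:00:00Z"' % (field, day.isoformat()))
    return " AND ".join(clauses)


query = all_dates_query("available_dates", date(2012, 4, 29), date(2012, 5, 2))
print(query)
```

A document then matches only if every generated clause matches, which is the "all dates in the range" behaviour asked for.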

Assuming you're importing this data from a database.
In your database or in your search index, create a new column that stores the max of each salesperson's dates (that is, the latest date), as well as the min. Also calculate and store the difference between the max and min dates.
Three criteria must all be met for a document to match (so combine them with AND in the query):
the difference between the query's max and min can't be bigger than the difference stored in the index
you'd make sure {!frange l=0 u=difn_bet_query_max_and_min}sub(field_min,query_min)
formulate the same thing for your max values
For a reference on function ranges
http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/

Related

Solr - execute different query based on condition

I have 50 value fields, 50 booleans and a date field. To be compliant to the new GDPR standard, I need to make certain fields unsearchable. The difficult part in my case is, the fields that need to be unsearchable differ per record. So in one case field 2 and 5 might be protected, while in the other field 3 and 7 are protected. This is known by the booleans: every value field also has a boolean that defines if that field is protected or not.
All this only applies when the date field is still in the future. When the date is in the past, or there is no date at all, all fields of that record are searchable anyway, regardless of the booleans.
What I had in mind is execute a different query per record, based on whether or not the date field of that record is in the future.
if (date > today) -> query1
else -> query2
Where query1 checks every field individually, taking into account the matching boolean. Is this possible, and how?
For the first condition, use separate fields for searching before and after the date has passed (if you can still store the value; I'm not too familiar with the detailed GDPR requirements).
I.e. have field_1, field_1_before_date - and only submit a value for field_1_before_date if your boolean value is true when indexing the document.
Issue two separate queries, one to get documents in the future and one to get documents in the past - in the first one you limit the fields you query to field_1_before_date, while in the second one you use field_1 instead.
You can combine these using _query_ - Using nested queries in Solr:
q=yourfirstquery OR _query_:"your second query"
.. should work, unless there is a limitation to combining those using OR.

SOLR facet fields order by descending value

I am using SOLR 6.5.1 with facet filters.
My query has:
facet.limit=-1 --> to generate all possible facets values
facet.sort=index --> to order facet values not by number of occurrences but by the value itself
For instance, one facet has integers as values (in particular, the field contains years). So the values are (occurrences in brackets):
2010 (438)
2011 (547)
...
2017 (367)
The facet is correctly ordered by value, but in ascending order (2010-->2017). How can I obtain the reverse order (2017-->2010)?
Thanks
UMG
You won't be able to specify the sort direction with the simple facet API (the old one used directly in the URL). But since you're retrieving all the possible facets, you can reverse the direction in your client side controller before outputting the values. Exactly how you do that depends on which language you're using.
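As an illustration of the client-side approach, here is a minimal Python sketch. It assumes the facet values arrive as a flat list that alternates value and count (the usual layout of facet_fields in Solr's JSON response); the function name is hypothetical.

```python
def reverse_facet_pairs(flat):
    """Turn [value, count, value, count, ...] into (value, count) pairs, last value first."""
    pairs = list(zip(flat[0::2], flat[1::2]))
    pairs.reverse()
    return pairs


# Sample data taken from the question (years with their counts).
years = ["2010", 438, "2011", 547, "2017", 367]
print(reverse_facet_pairs(years))
```

Because facet.sort=index already returns the values in ascending order, reversing the pairs is enough and no re-sorting is needed.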
But if you'd switch over to the more modern JSON-based facet API, you can specify the sort order directly on each level of the facet:
"sort":"index desc"
Specifies how to sort the buckets produced. “count” specifies document count, “index” sorts by the index (natural) order of the bucket value. One can also sort by any facet function / statistic that occurs in the bucket. The default is “count desc”. This parameter may also be specified in JSON like sort:{count:desc}. The sort order may either be “asc” or “desc”
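A request body for the JSON facet API could look roughly like the sketch below. The facet label "years" and the field name "year" are assumptions, since the question does not name the field; the body would be sent as JSON to the collection's query endpoint.

```python
import json

# Terms facet over a hypothetical "year" field, sorted by value in descending order.
facet_request = {
    "query": "*:*",
    "facet": {
        "years": {
            "type": "terms",
            "field": "year",
            "sort": "index desc",
            "limit": -1,  # return all buckets, like facet.limit=-1
        }
    },
}

body = json.dumps(facet_request)
print(body)
```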

Sort or filter results by function query defined in a field

I have a Solr 6.2 instance running, and I'm exploring its advantages and limitations. One limitation I've run into seems to be that you can't sort or filter the data based off of a field function query.
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq=total:[10 TO *]
Solr responds with an error stating that the total field does not exist. Indeed, the field is not defined in my schema because it's not a stored part of the dataset - it's calculated at query time. They call it a pseudo field. I haven't been able to find an example in the documentation or a solution online. So, is there a way around this?
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq={!frange l=10} sum(v1,v2)
I had the very same problem as you.
I wanted to filter on the quotient of two fields.
I tried to use [0.3 TO *] as you did.
You can also set an upper bound for the range if you need one.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf
"l" is for lower bound.
"u" is for upper bound.
fq={!frange l=0 u=2.2} sum(user_ranking,editor_ranking)
Maybe this works for you?
You can do this. Instead of total, try sum.
You can find more info here: https://wiki.apache.org/solr/FunctionQuery#What_is_a_Function.3F
An example from the Solr wiki:
Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes with arbitrary names stored in field boxname. Suppose we want to search for box matching name findbox but ranked according to volumes of boxes. The query parameters would be:
q=boxname:findbox _val_:"product(x,y,z)"
This query will rank the results based on volumes. In order to get the computed volume, you will need to request the score, which will contain the resultant volume:
&fl=*, score
Suppose that you also have a field storing the weight of the box as weight. To sort by the density of the box and return the value of the density in score, you would submit the following query:
http://localhost:8983/solr/collection_name/select?q=boxname:findbox _val_:"div(weight,product(x,y,z))"&fl=boxname x y z weight score
You can read more about it here: https://cwiki.apache.org/confluence/display/solr/Function+Queries
Try this
solr/collection/select?q=*:* _val_:"sum(v1,v2)"&fl=* score&fq={!frange l=10 }sum(v1,v2)
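When sending such a request from code, the local-params syntax ({!frange ...}) contains characters that need URL encoding. A minimal sketch using only the Python standard library, with the field names v1 and v2 taken from the question:

```python
from urllib.parse import parse_qs, urlencode

# The pseudo-field "total" is only for display; the filter repeats the function itself.
params = [
    ("q", "*:*"),
    ("fl", "*,total:sum(v1,v2)"),
    ("fq", "{!frange l=10}sum(v1,v2)"),
]

query_string = urlencode(params)
print(query_string)

# Decoding the string gives back the original filter, so nothing was lost in encoding.
print(parse_qs(query_string)["fq"])
```

The resulting string can be appended after .../select? on the collection URL.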

Sorting by date with AppEngine search API

I have documents with a contents TextField and a date DateField. I am trying to fetch the most recent documents.
A regular search with 'date > epoch' seems to already sort the results by date. Is that an expected behavior?
When I try to explicitly sort with SortExpression the results are not even sorted in any particular order (except on the dev server where it seems to work as well).
I am using the following code:
index.search(search.Query(
    query_string='date > epoch',
    options=search.QueryOptions(
        sort_options=search.SortOptions(
            expressions=[search.SortExpression(
                expression='date',
                direction=search.SortExpression.DESCENDING,
                default_value='1970-01-01')]))))
What is the right way to do that?
According to the documentation, all documents are sorted by their rank unless you specify a different sorting option. And a document rank is set to the time when it was added to the index, again, unless you specify a different rank.
If this is your desired behavior, there is no need to add a date field and sort by it.
When you filter by a field, you are forcing the use of its index, and as a side effect the output will be sorted by that field.
https://cloud.google.com/appengine/docs/python/search/options
When you call the search() method using a query string alone, the
results are returned according to the default query options:
Documents are returned sorted in order of descending rank
Documents are returned in groups of 20 at a time
Retrieved documents contain all of their original fields
Don't know why it does not work with explicit sorting options.

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you would want both parts to be boosted by the same scale (I am not sure if this isn't the default behaviour of Solr) you would want to use dismax or edismax and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be an OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has an effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of a pseudo-field calculated using the query() function to report how any set of results scores against some other query or queries. So if, for example, the original document set was the top-N from query_A, and those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting field …&fl=bscore:query({!dismax v="query_B"})&…. Then the documents' scores against query_B would be included in the output (as bscore).
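A rough sketch of building such a request in Python follows. Note that it swaps the inline v="query_B" form for parameter dereferencing (query($qb)) to avoid nested quoting; the parameter name qb and the helper name are made up, and dismax would still need its query fields configured on the server.

```python
from urllib.parse import urlencode


def score_against_other(query_a, query_b, rows=10):
    """Build params that rank by query_a and report each hit's score for query_b."""
    return urlencode([
        ("q", query_a),
        # "qb" is a hypothetical parameter name, referenced below as $qb.
        ("qb", "{!dismax}" + query_b),
        ("fl", "id,score,bscore:query($qb)"),
        ("rows", str(rows)),
    ])


request = score_against_other("query_A", "query_B")
print(request)
```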
Finally, the result-grouping functionality can be used both to collect the top-N for one query and to get scores for lesser documents intersecting with other queries in one go. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)
