Solr Distance Filtering - solr

I am trying to do a distance range search using Solr.
I know it's very easy to filter for results within a 5 km radius:
&q=*:*&fq={!geofilt pt=45.15,-93.85 sfield=store d=5}
What I am after is how to do the same thing for a range of, say, 5 to 10 km.
Thanks

Here are a couple of ways to approach this. The query clearly belongs in a filter query (the "fq" param), since the intention is not to modify the score. Let's assume that these parameters are set in the request URL (although they don't have to be placed there):
pt=45.15,-93.85&sfield=store
Here is one approach:
_query_:"{!geofilt d=10}" -_query_:"{!geofilt d=5}"
I used the _query_ Solr syntax hack to enter a sub-query which offers the opportunity to switch the query parser from the Lucene one to a geo one.
Here's another approach that is probably the fastest:
{!frange l=5 u=10}geodist()
This one is a function query returning the distance, which is then limited to the desired range. It is probably faster since it evaluates each document once instead of twice, as the previous approach does.
You may want to mark this as not cacheable and add a bbox filter so that far fewer than every document is examined. Here is the final result (not URL-escaped):
pt=45.15,-93.85&sfield=store&fq={!frange l=5 u=10 cache=false cost=100}geodist()&fq={!bbox d=10}
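Since the line above is not URL-escaped, here is a minimal sketch of how the same parameters could be assembled and escaped from client code. The helper name `distance_ring_params` is hypothetical, and the sketch only builds the query string; it does not contact a Solr server.

```python
from urllib.parse import urlencode

def distance_ring_params(lat, lon, field, inner_km, outer_km):
    """Build Solr params for documents between inner_km and outer_km from a point.

    Hypothetical helper for illustration; the parameter values mirror the answer above.
    """
    return [
        ("q", "*:*"),
        ("pt", f"{lat},{lon}"),
        ("sfield", field),
        # Exact ring test on the computed distance, marked non-cached with cost=100
        # as suggested in the answer.
        ("fq", f"{{!frange l={inner_km} u={outer_km} cache=false cost=100}}geodist()"),
        # Cheap bounding-box prefilter around the outer radius.
        ("fq", f"{{!bbox d={outer_km}}}"),
    ]

params = distance_ring_params(45.15, -93.85, "store", 5, 10)
query_string = urlencode(params)  # repeated "fq" keys are preserved in order
print(query_string)
```

Passing a list of tuples rather than a dict keeps both `fq` entries, which a dict would collapse into one.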

Related

Display Solr results based on custom selection

I have a field 'qualification' which has multiple values (something like MCA, MBA, MSC, PhD, ...).
My requirement is to display results in the order MSC, MCA, PhD, MBA. So, I am using the below query to boost the field values.
&bq=(qualification: "MSC"^5 "MCA"^4 "PhD"^3 "MBA"^2)
The above query works only when I use q=*:*
But when I search with any text like q=course, I do not get the results in the specified order.
Please help me find what I did wrong.
Thanks & Regards
Venu
You're probably not doing anything "wrong", but when you actually search for something, the score isn't flat (i.e. it's no longer just 1) any more.
If you don't want your query to affect the score, use a filter query (fq) instead. This does not, however, give you any relevance ranking inside the results. If you still want that, you'll probably have to make your boosts far higher, so that the actual scores only matter within each boost level.
&bq=qualification:"MSC"^50000
&bq=qualification:"MCA"^40000
&bq=qualification:"PhD"^30000
&bq=qualification:"MBA"^20000
If you append debugQuery=true to your query string, you can see how the score is calculated for each document, and adjust your boosts accordingly.
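As a rough sketch of the tiered boosts above, the `bq` parameters could be generated from the desired ordering rather than written by hand. The helper name `tiered_boost_params` and the `step` default are assumptions for illustration, as is `defType=edismax` (the question does not say which parser is in use, but `bq` is a dismax/edismax parameter).

```python
from urllib.parse import urlencode

def tiered_boost_params(field, values_in_order, step=10000):
    """Return one bq param per value; earlier values get larger boosts.

    Hypothetical helper: with four values and step=10000 it reproduces the
    50000/40000/30000/20000 boosts shown in the answer.
    """
    n = len(values_in_order)
    return [
        ("bq", f'{field}:"{value}"^{(n - i + 1) * step}')
        for i, value in enumerate(values_in_order)
    ]

boosts = tiered_boost_params("qualification", ["MSC", "MCA", "PhD", "MBA"])
# defType=edismax is an assumption; debugQuery=true exposes the score breakdown.
request = [("q", "course"), ("defType", "edismax"), ("debugQuery", "true")] + boosts
print(urlencode(request))
```

Whether a step of 10000 is large enough depends on the real relevance scores, which is what the `debugQuery=true` output helps to judge.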

Sort or filter results by function query defined in a field

I have a Solr 6.2 instance running, and I'm exploring its advantages and limitations. One limitation I've run into seems to be that you can't sort or filter the data based off of a field function query.
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq=total:[10 TO *]
Solr responds with an error stating that the total field does not exist. Indeed, the field is not defined in my schema because it's not a stored part of the dataset - it's calculated at query time. They call it a pseudo field. I haven't been able to find an example in the documentation or a solution online. So, is there a way around this?
.../solr/collection/select?q=*:*&fl=*,total:sum(v1,v2)&fq={!frange l=10} sum(v1,v2)
I have the very same problem as you.
I want to filter on the quotient of two fields.
I tried to use [0.3 TO *] like you.
You can also use an upper bound for your range if you need one.
http://archive.apache.org/dist/lucene/solr/ref-guide/apache-solr-ref-guide-4.6.pdf
"l" is for lower bound.
"u" is for upper bound.
fq={!frange l=0 u=2.2} sum(user_ranking,editor_ranking)
Maybe this works for you?
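To make the optional "l" and "u" bounds concrete, here is a small sketch that assembles an frange filter string with either bound left out. The function name `frange_filter` is hypothetical; it only formats the string and does not validate the function expression.

```python
def frange_filter(function, lower=None, upper=None):
    """Format a {!frange} filter over a function query.

    Hypothetical helper: a bound that is None is simply omitted,
    which leaves that side of the range open.
    """
    local_params = ["!frange"]
    if lower is not None:
        local_params.append(f"l={lower}")
    if upper is not None:
        local_params.append(f"u={upper}")
    return "{" + " ".join(local_params) + "}" + function

print(frange_filter("sum(v1,v2)", lower=10))
print(frange_filter("sum(user_ranking,editor_ranking)", lower=0, upper=2.2))
```

The resulting string is what would go into the `fq` parameter, in place of a range query on the `total` pseudo-field.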
You can do this. Instead of total, try sum.
You can find more info here: https://wiki.apache.org/solr/FunctionQuery#What_is_a_Function.3F
An example from the Solr wiki:
Example Function Queries
To give you a better understanding of how function queries can be used in Solr, suppose an index stores the dimensions in meters x,y,z of some hypothetical boxes with arbitrary names stored in field boxname. Suppose we want to search for box matching name findbox but ranked according to volumes of boxes. The query parameters would be:
q=boxname:findbox _val_:"product(x,y,z)"
This query will rank the results based on volumes. In order to get the computed volume, you will need to request the score, which will contain the resultant volume:
&fl=*, score
Suppose that you also have a field storing the weight of the box as weight. To sort by the density of the box and return the value of the density in score, you would submit the following query:
http://localhost:8983/solr/collection_name/select?q=boxname:findbox _val_:"div(weight,product(x,y,z))"&fl=boxname x y z weight score
You can read more about it here: https://cwiki.apache.org/confluence/display/solr/Function+Queries
Try this
solr/collection/select?q=*:* _val_:"sum(v1,v2)"&fl=* score&fq={!frange l=10 }sum(v1,v2)

Solr negative boost

I'm looking into the possibility of de-boosting a set of documents during
query time. In my application, when I search for e.g. "preferences", I want
to de-boost content tagged with ContentGroup:"Developer" or, in other words,
push that content back in the order. Here's the catch. I have the following
weights on query fields and boost query on source:
qf=text^6 title^15 IndexTerm^8
As you can see, title has a higher weight.
Now, a bunch of content tagged with ContentGroup:"Developer" consists of a
title like "Preferences.material" or "Preferences Property" or
"Preferences.graphics". The boost on title pushes these documents to the
top.
What I'm looking for is a way to deboost all documents that are
tagged with ContentGroup:"Developer", irrespective of whether the term occurs in
text or title. I tried something like this, but it didn't make any difference.
Source:simplecontent^10 Source:Help^20 (-ContentGroup-local:("Developer"))^99
I'm using edismax query parser.
Any pointers will be appreciated.
Thanks,
Shamik
You're onto something with your last attempt, but you have to start with *:*, so that you actually have something to subtract the documents from. The resulting set of documents (those not matching your query) can then be boosted.
From the Solr Relevancy FAQ
How do I give a negative (or very low) boost to documents that match a query?
True negative boosts are not supported, but you can use a very "low" numeric boost value on query clauses. In general the problem that confuses people is that a "low" boost is still a boost, it can only improve the score of documents that match. For example, if you want to find all docs matching "foo" or "bar" but penalize the scores of documents matching "xxx" you might be tempted to try...
q = foo^100 bar^100 xxx^0.00001 # NOT WHAT YOU WANT
...but this will still help a document matching all three clauses score higher than a document matching only the first two. One way to fake a "negative boost" is to give a large boost to everything that does not match. For example...
q = foo^100 bar^100 (*:* -xxx)^999
NOTE: When using (e)dismax, people sometimes expect that specifying a pure negative query with a large boost in the "bq" param will work (since Solr automatically makes top-level purely negative queries positive by adding an implicit "*:*"), but this doesn't work with "bq", because of how queries specified via "bq" are added directly to the main query. You need to be explicit...
?defType=dismax&q=foo bar&bq=(*:* -xxx)^999
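The reasoning in the FAQ excerpt can be illustrated with a toy model. This sketch assumes a document's score is simply the sum of the boosts of the clauses it matches, which is a deliberate simplification and not how Lucene actually scores; all names here are made up for the illustration.

```python
def toy_score(doc_terms, clauses):
    """Sum the boosts of every clause the document matches (simplified model)."""
    return sum(boost for matches, boost in clauses if matches(doc_terms))

# q = foo^100 bar^100 xxx^0.00001  (the "NOT WHAT YOU WANT" variant)
low_boost = [
    (lambda terms: "foo" in terms, 100),
    (lambda terms: "bar" in terms, 100),
    (lambda terms: "xxx" in terms, 0.00001),
]

# q = foo^100 bar^100 (*:* -xxx)^999
inverse_boost = [
    (lambda terms: "foo" in terms, 100),
    (lambda terms: "bar" in terms, 100),
    (lambda terms: "xxx" not in terms, 999),  # stands in for (*:* -xxx)
]

clean = {"foo", "bar"}
tagged = {"foo", "bar", "xxx"}

# A tiny positive boost still lifts the tagged document above the clean one.
print(toy_score(tagged, low_boost) > toy_score(clean, low_boost))
# Boosting everything that does NOT match pushes the tagged document down.
print(toy_score(clean, inverse_boost) > toy_score(tagged, inverse_boost))
```

Both comparisons print `True`, matching the FAQ's point that a "low" boost is still a boost.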

Can SOLR/Lucene report calculated score of extra named documents, even if they're not in top N results?

I'd like to submit a query to SOLR/Lucene, plus a list of document IDs. From the query, I'd like the usual top-N scored results, but I'd also like to get the scores for the named documents... no matter how low they are.
Can anyone think of an easy/supported way to do this in a single index scan, where the scores for the 'added' (non-ranking/pinned-for-inclusion) docs are comparable/same-scaled as those for the top-N results? (Patching SOLR with specialized classes would be OK; I figure that's what I may have to do if there's no existing support.)
Or failing that, could it be simulated with a followup query, ideally in a way that the named-document scores could be scaled to be roughly comparable to the top-N for the reference query?
Alternatively -- and perhaps as good or better for my intended use -- could I make a single request against a SOLR/Lucene index which includes M (with M=2 or more) distinct queries, and return the results that are in the top-N for any of the M queries, and for every result include its score against all M of the distinct queries?
(Even in my above formulation, the list of documents that I want scored along with a new query will typically have been the results from a prior query.)
Solutions or even just fragments of possible approaches appreciated!
I am not sure if I understand properly what you want to achieve but wouldn't a simple
q: (somequery) OR id: (1 OR 2 OR 4)
be enough?
If you would want both parts to be boosted by the same scale (I am not sure if this isn't the default behaviour of Solr) you would want to use dismax or edismax and your query would change to something like:
q: (somequery)^10 OR id: (1 OR 2 OR 4)^10
You would then have both the elements defined by the IDs and the query results scored the same way.
To self-answer, reporting what I've found since posting...
One clumsy option is the explainOther parameter, which takes another query. (This query could be a OR list of interesting document IDs.) The response will then include a full scoring explanation for documents which match this other query. explainOther only has effect when combined with the also-required debugQuery parameter.
All that debug/explain information is overkill for the need, but may be useful, or the code paths that implement it might provide a guide to making a hypothetical new more narrowly-focused 'scoreOther' option.
Another option would be to make use of pseudo-field calculated using the query() function to report how any set of results score on some other query/queries. So if for example the original document set was the top-N from query_A, and then those are the exact documents that you also want to score against query_B, you would execute query_A again with a reporting-field …&fl=bscore:query({!dismax v="query_B"})&…. Then the document's scores against query_B would be included in the output (as bscore).
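As a sketch of the pseudo-field approach just described, the request parameters could be put together like this. The helper name `rescoring_params` is hypothetical, `query_A` and `query_B` are the placeholders from the text, and the sketch assumes the second query contains no double quotes (which would need escaping).

```python
from urllib.parse import urlencode

def rescoring_params(main_query, other_query, alias="bscore"):
    """Build params that rank by main_query and report each hit's score on other_query.

    Hypothetical helper; assumes other_query contains no double quotes.
    """
    return [
        ("q", main_query),
        # The pseudo-field reports the score of each returned document on other_query.
        ("fl", f'*,score,{alias}:query({{!dismax v="{other_query}"}})'),
    ]

params = rescoring_params("query_A", "query_B")
print(params[1][1])
print(urlencode(params))
```

Each returned document would then carry both `score` (against query_A) and `bscore` (against query_B).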
Finally, the result-grouping functionality can be used to collect, in one go, both the top-N for one query and the scores for lesser documents intersecting with other queries. For example, if querying for query_B and adding …&group=true&group.query=query_B&group.query=query_A&…, you'll get back groups that satisfy query_B (ranked by query_B), and that satisfy both query_B and query_A (but again ranked by query_B). This could be mixed with the functional field above to get the scores by another query (like query_A) as well.
However, all groups will share the same sort order (from either the master query or something specified by a group.sort parameter), so it's not currently possible (SOLR-4.0.0-beta) to get several top-N results according to different scorings, just the top-Ns according to one scoring, limited by certain groups. (There's a comment in the source code suggesting alternate sorts per group may be envisioned as a future capability.)

Solr sort: I want to put a particular document first

I want to pin a particular document to the top of Solr's sorted results.
For example:
Results: 5,2,3,1
I want 2 first, with the others sorted according to the normal rules:
2,1,3,5
How can I do this?
I know of two ways you can try to tackle this using Solr.
The first is to use the QueryElevationComponent. This lets you define the top results at index time. As suggested in the documentation, this is good for placing sponsored results or popular documents at the top of the search results. The potential downside is that you have to be able to identify those documents at index time and not at query time.
The other approach is to boost the desired documents at query time using the bq parameter. To boost document 435, you would do something like this:
...&bq=id:435^10
Unfortunately, neither of these approaches gives you absolute control over the order of the results.
The solution provided by Riking would certainly do the job if you don't mind processing the results after performing the search. Another approach you could consider is to add a field to your Solr schema that defines a display order or priority. You can then sort on that field to get the desired sort order.
If you are using Solr 3.1 or later, you can sort by a function query. The map function is useful for this.
sort=map(field_name,5,5,0) asc
In the above, field_name is the name of the field you want to sort by, 5 is the value you want to push to the front and 0 must be replaced with some number that you know is less than all other numbers.
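To see why the mapped value floats to the top, here is a small client-side simulation of the same idea using the IDs from the question. `solr_map` is a hypothetical stand-in that mimics the behaviour of Solr's map(x,min,max,target) for this case; it is not a Solr API.

```python
def solr_map(value, range_min, range_max, target):
    """Mimic map(x,min,max,target): values inside [min,max] become target, others pass through."""
    return target if range_min <= value <= range_max else value

ids = [5, 2, 3, 1]
# Equivalent of sort=map(id,2,2,0) asc: id 2 is mapped to 0, which is
# smaller than every other id, so it sorts first.
ordered = sorted(ids, key=lambda v: solr_map(v, 2, 2, 0))
print(ordered)  # [2, 1, 3, 5]
```

The remaining documents keep their normal ascending order because their values pass through unchanged.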
Call the builtin sort() function, then shift the desired element to the front.
Pseudocode, in case you do not have a builtin method to shift it to the front:
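tmp = desired;
int dIndex = array.indexOf(desired);
// shift every element before the desired one a slot to the right
for (int i = dIndex - 1; i >= 0; i--)
{
    array[i + 1] = array[i];
}
array[0] = tmp; // place the desired element at the front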
In case you use the standard query parser (not dismax), add "OR id:2^1000" to your query, like this:
q=(text:lalala AND author:Bob) OR id:2^1000
That will place the document with ID=2 at the top of the results.
