Solr get calculated distance while using dismax - solr

I'm starting to think that what I want to do is not possible but thought I would give this a try.
I'm running Solr 3.5.
I currently have two types of search:
A basic spatial query which returns the calculated distance between two points in the score field.
Sample Query from my Solr logs:
?fl=*,score&sort=score+asc&start=0&q={!func}geodist()&sfield=coordinates&pt=59.2363514,18.092783&version=2
A dismax query which allows free text queries on a number of fields.
Sample Query from Solr log:
mm=1&d=100.0&sfield=coordinates&qf=field1^5.0+fields2^3.0&defType=edismax&version=2&fl=*,score&start=1&q=monkeyhopper&pt=59.2363514,18.0927830000&fq={!geofilt}
I want to replace my first query with the dismax query, but I really need the calculated distance in the response. Yes, I can calculate the distance programmatically, but I would prefer not to, as Solr has already done it for me.
I still want to be able to sort my dismax query "by relevance", by distance or by any other field, so the score given by my boosts could be interesting for sorting, but I don't need it to be returned.

If I understood correctly, you want the result of a function in your Solr response. I think SOLR-2444 is what you're looking for: it allows pseudo-fields, functions and so on in the fl parameter. The only problem is that it has been committed only on trunk, so it isn't available in the current Solr release, nor will it be in the upcoming 3.6 release. You will have to wait for the 4.0 release, which I don't think is far off. In the meantime you could start experimenting with a snapshot of the last successful Jenkins build.

Pseudo-fields are now available in Solr 4+ which allow you to do just this.
http://localhost:8983/solr/collection1/browse?q=*:*&rows=1000&wt=xml&pt=37.763649,-122.24313&sfield=store&fl=dist:geodist()
For instance, this request allows me to return a field "dist" which contains the distance of each entry to the stated point.
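As a rough sketch of how this could be combined with the edismax query from the question, the Python snippet below builds the request parameters. The host, the core name (collection1), the /select path and the build_params helper are assumptions for illustration; the field names and boosts are copied from the question. It assumes Solr 4 or later, where fl accepts function pseudo-fields.

```python
from urllib.parse import urlencode, parse_qs

def build_params(text, lat, lon, radius_km, sort_by_distance=False):
    """Build edismax request parameters that also return the distance.

    build_params is a hypothetical helper, not part of any Solr client.
    """
    params = {
        "defType": "edismax",
        "q": text,
        "qf": "field1^5.0 fields2^3.0",  # fields and boosts from the question
        "mm": "1",
        "sfield": "coordinates",          # spatial field used by geofilt/geodist
        "pt": f"{lat},{lon}",
        "d": str(radius_km),
        "fq": "{!geofilt}",               # keep only documents within d km of pt
        "fl": "*,dist:geodist()",         # pseudo-field holding the distance
    }
    if sort_by_distance:
        # Without an explicit sort, Solr orders by relevance score.
        params["sort"] = "geodist() asc"
    return params

params = build_params("monkeyhopper", 59.2363514, 18.092783, 100.0,
                      sort_by_distance=True)
# Assumed host and core name; adjust to your installation.
url = "http://localhost:8983/solr/collection1/select?" + urlencode(params)
print(url)
```

Leaving sort_by_distance off keeps the default relevance ordering while still returning dist for each document, so the score itself no longer needs to appear in fl.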

Related

Solr 8.8.2 reduce recall and improve precision for multi token queries - mm, qs, shingles

I'm facing an issue: I have a huge amount of data in Solr and, as a result, searching for a multi-token query generates a big recall set. For example, if I search for "apple watch series 4 42mm", I get back 4 million results. My parser is edismax, the minimum match setting is currently 2, and I am using the whitespace tokenizer with a bunch of filters. The goal is to reduce this recall set so that more relevant results are displayed.
Things I have explored:
MinimumMatch - I am trying mm set to 2>2 4>3 to see what results it gives. I also tried to find out whether I could apply mm to individual fields, and found that this used to be possible with local params but has been discontinued since Solr 7.2. I do not want to write a custom parser or tweak Solr's code, since that could lead to other problems, nor do I want to change the default parser to Lucene. Is there any other way to apply mm separately to category_name, product_name, product_description, brand_name, etc.?
Query slop - I am not using qs at the moment, but I tried a few examples, converting my query into a phrase query and applying qs. It does reduce recall, but there is a problem. Suppose a product has "apple" in brand_name and "watch series 4 42mm" in product name: that is a relevant result, but it will not be returned because the phrase query requires all tokens to be in the same field. Is there a way to apply qs to suit my purpose?
ShingleFilterFactory - I'm trying this filter with outputUnigrams set to true because I still want the individual terms to be indexed. But with that, the index size would explode and the result set wouldn't be that good either. Can I use other levers, such as mm, along with this to make it work? Also, is there a way to make outputUnigrams a query parameter?
I also explored pf2, pf3 and ps, but those are used for boosting. Right now, my aim is to filter down to the most relevant results.
Can someone please help me with the above? Thanks

Displaying information about SolR search result

I am struggling with a little problem where I have to display relevant information about the resultset returned from SolR but can't figure out how to calculate it without iterating the results (bad).
Basically I am storing my documents with a state field and while the search is supposed to return all documents, the UI has to show "Found 15 entities, 5 are in state A, 3 in state B and 7 in C".
At the moment I am using a rather brittle approach of running the query 3 times with additional scoping by type, but I'd rather get that information from the one query I am displaying. (There have been some edge cases where the numbers don't add up and since SolR can return facets I guess there has to be a way to use that functionality in this case)
I am using SolR 3.5 from Rails with the sunspot gem
As you mention yourself, you can use facets for this by setting
facet=true&facet.field=state
I'm not familiar with the sunspot gem, but according to its documentation you can use facets like this (assuming Entity is your searchable):
search = Entity.search do
  facet :state
end
This should return each state found among the entities matching your query, together with the number of entities in that state. According to the Sunspot documentation, you can read these facets as follows:
search.facet(:state).rows.each do |facet|
  puts "State #{facet.value} has #{facet.count} entities"
end
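For clients that talk to Solr directly rather than through Sunspot, the same counts come back in the facet_counts section of the response. The sketch below, in Python, assumes the default JSON layout in which each facet field is a flat list alternating value and count. The sample response is made up for illustration, and facet_summary is a hypothetical helper.

```python
import json

# Hypothetical, trimmed response for q=*:*&facet=true&facet.field=state&wt=json
sample = json.loads("""
{
  "response": {"numFound": 15, "start": 0, "docs": []},
  "facet_counts": {
    "facet_fields": {"state": ["C", 7, "A", 5, "B", 3]}
  }
}
""")

def facet_summary(response, field):
    """Return (total hits, {facet value: count}) for one facet field."""
    flat = response["facet_counts"]["facet_fields"][field]
    # Even positions hold the values, odd positions hold their counts.
    counts = dict(zip(flat[0::2], flat[1::2]))
    return response["response"]["numFound"], counts

total, counts = facet_summary(sample, "state")
parts = ", ".join(f"{n} in state {s}" for s, n in sorted(counts.items()))
print(f"Found {total} entities: {parts}")
```

Because the totals and the per-state counts come from the same request, they are computed against the same result set, which avoids the mismatches that can appear when running three separate queries.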
Essentially, there are three main sets of functions you can use to gather stats from Solr.
The first is faceting:
http://wiki.apache.org/solr/SimpleFacetParameters
There is also grouping (field collapsing):
https://wiki.apache.org/solr/FieldCollapsing
And the stats package:
https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
That said, stats, facets and grouping may eventually be replaced by the analytics package known as OLAP, which is targeted for Solr 5.0.0:
https://issues.apache.org/jira/browse/SOLR-5302
Good luck.

How do I override Solr's relevancy in a query

I am integrating a chemical structure search with Solr. To that end I am creating a Solr plugin.
The structure search returns the structure_id and its score. Scores are values between 100 and 0 (we would probably never see a 0).
I use this to create a Solr query to pull all documents that have the structure_ids. I want the results of the search to be ordered by the structure search score, not the Solr relevancy.
I generate a query that looks like this:
+structure_id:(28760263^95 OR 30392284^82 OR 47390042^70)
The problem is that in my trivial test case Solr is returning the records matching the structure_id 28760263 last. It has assigned it the lowest relevancy (4.6609402E-6)!
I wrote a function to amplify the score by a large factor, and that apparently does fix the problem; however, I don't think the amplification should be necessary.
I am using Solr 3.5.
Is there some configuration that I am missing? Currently I am using Solr pretty much out of the box. The only things I've changed are adding my plugin and editing the example docs to add structure_ids for my test case.
Is there a way to completely override the Lucene scoring with the score from the structure search? We have other reasons why we would like to take control of Solr's scoring, and knowing how to do that would be useful.

Different scores from Solr 1 vs Solr 4 Dismax Handler

I've migrated my Solr 1.4 index to Solr 4.0 using this method, and I've kept my solrconfig.xml and schema.xml as unchanged as possible while still being functional.
I'm using the DisjunctionMaxQuery (dismax / solr.DisMaxRequestHandler) requestHandler and comparing my search results between Solr 1.4 and Solr 4. Using ?debugQuery=on in the URL, I can see that the parsedQuery portion is virtually the same between Solr versions, yet the generated scores are different. (The explain portion is different, but the calculation is long and obtuse.)
Example query: q=foo
Example response:
Solr 1.4:
title: "foo (32-bit)"
score: 3.8850176
Solr 4.0:
title: "foo (32-bit)"
score: 2.1525226
Despite having the same request handler and identical indices, what would be causing this significant difference in scores?
If the explain portion is different, then the two versions are using different calculations, so the scores are going to be different. Scores are fairly arbitrary anyway and are only meaningful for comparison within the result set of a single query; in other words, it doesn't make sense to compare scores from one query with scores from another. The same probably applies across different versions of Solr, especially when the calculations themselves differ.

How to do a constant score query in Solr

I'm using SolrNet to access a Solr index where I have a multivalue field called "tags". I want to perform the following pseudo-code query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
where the term "stack" is being boosted by 10, "over" is being boosted by 5 and "flow" is being boosted by 2. The result I'm after is that results with "stack" will appear higher than those with "flow", etc.
The problem I'm having is that if, say, "flow" only appears in a couple of documents but "stack" appears in loads, then due to the high idf value, documents with "flow" appear above those with "stack".
When this project was implemented directly in Lucene, I used ConstantScoreQuery, which eliminated the idf and based the score solely on the boost value.
How can this be achieved with Solr and SolrNet, where I'm effectively just passing Solr a query string? If it can't, is there an alternative way I can approach this problem?
Thanks in advance!
Solr 5.1 and later has this built into the query parser syntax via the ^= operator.
So just take your original query:
(tags:stack)^10 OR (tags:over)^5 OR (tags:flow)^2
And replace the ^ with ^= to change from boosted to constant:
(tags:stack)^=10 OR (tags:over)^=5 OR (tags:flow)^=2
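If the query string is generated in code, the rewrite can be done mechanically. The following is a minimal Python sketch that assumes Solr 5.1 or later, as stated above; constant_score_query is a hypothetical helper, and it does not escape special characters in the terms, so it is only suitable for simple tokens like the ones in the question.

```python
def constant_score_query(field, weights):
    """Join field:term clauses with OR, giving each a constant score via ^=.

    weights maps each term to the fixed score it should contribute.
    Terms are inserted as-is (no escaping), which is an assumption that
    holds only for plain alphanumeric terms.
    """
    clauses = [f"({field}:{term})^={weight}" for term, weight in weights.items()]
    return " OR ".join(clauses)

q = constant_score_query("tags", {"stack": 10, "over": 5, "flow": 2})
print(q)  # (tags:stack)^=10 OR (tags:over)^=5 OR (tags:flow)^=2
```

The resulting string can be passed to Solr unchanged as the q parameter, which fits the SolrNet case where the client only forwards a query string.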
I don't think there is any way to directly express a ConstantScoreQuery in Solr, but it seems that range and prefix queries use ConstantScoreQuery under the hood, so you could try faking a range query, e.g. tags:[flow TO flow]
Alternatively, you could implement your own Solr QueryParser.
