How to increase score threshold in solr

I'm running the following select query to find a restaurant in a certain area using Solr:
{
"responseHeader":{
"status":0,
"QTime":0,
"params":{
"q":"name:\"Sushi Hiro\"",
"pt":"51.048688,-114.0778858",
"d":"0.2",
"fl":"*,score",
"fq":"{!geofilt sfield=location}",
"rows":"10000000",
"wt":"json",
"debugQuery":"true"}},
"response":{"numFound":1,"start":0,"maxScore":11.842687,"docs":[
{ .... }
However, Solr returns only the single most similar document and does not show me the rest. I would like to get at least two more documents that are also similar to my query. How can I modify the score threshold to get more results?

There is no such thing as a "score threshold": the documents returned are those that match your query. Documents that were not included do not match the terms you gave in your query, and the query is where you state the requirements for being included in the result.
In your example I guess the issue is that you're asking for documents located within 200m of the position given (d=0.2), and there is only one document within range that can be included.
If you want to sort (or boost) by the distance instead of limiting the results to those that are close by, take a look at spatial search and geodist.
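As a rough sketch of that alternative, the snippet below builds a request that drops the geofilt filter and sorts by distance instead, so documents outside the 0.2 km radius are no longer excluded. It only constructs the query string and does not contact Solr. The base URL and the core name restaurants are assumptions; q, sfield, pt and geodist() come from the question and the answer above.

```python
from urllib.parse import urlencode

# Hypothetical base URL; the core name "restaurants" is an assumption.
SOLR_SELECT = "http://localhost:8983/solr/restaurants/select"

def distance_sorted_params(name, point, field="location", rows=10):
    """Build params that rank by distance instead of filtering by radius."""
    return {
        "q": f'name:"{name}"',
        "sfield": field,                  # spatial field read by geodist()
        "pt": point,                      # "lat,lon" centre point
        "sort": "geodist() asc",          # nearest documents first
        "fl": "*,score,dist:geodist()",   # also return the distance as "dist"
        "rows": rows,
        "wt": "json",
    }

params = distance_sorted_params("Sushi Hiro", "51.048688,-114.0778858")
url = f"{SOLR_SELECT}?{urlencode(params)}"
print(url)
```

Note that the name clause in q still decides which documents match at all; removing the fq only lifts the distance restriction.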

Related

How to query Solr to get the documents if it matches 50% of the query string?

I am using Solr 7.6, and the document structure is as follows:
{
"source_ln":"en",
"source_text":"the sky is blue",
"target_ln":"hi",
"target_text":"आसमान नीला है",
},
{
"source_ln":"en",
"source_text":"the sky is also called the celestial sphere",
"target_ln":"hi",
"target_text":"आकाश को आकाशीय क्षेत्र भी कहा जाता है",
}
All the fields are defined with the StandardTokenizerFactory tokenizer.
When I query "source_text":"the sky", the result set should contain only the first document.
In the second document, the field "source_text":"the sky is also called the celestial sphere" contains 8 terms, while the query "source_text":"the sky" contains only 2 terms. The "at least 50% match" criterion is therefore not fulfilled, so the second document should not be in the result set.
Is there any way to get the documents matching at least 50% of the query field terms/tokens?
Thanks in advance.

You can set your request handler to use the (e)dismax query parser, for example with the defType parameter: ?q=...&defType=dismax.
With a dismax parser you can then use the mm (Minimum Should Match) parameter according to your needs, simply by setting mm=50%.
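A minimal sketch of such a request follows. It only builds the query string. The use of edismax (rather than dismax) and the qf value are assumptions chosen for illustration; defType and mm=50% come from the answer above. Keep in mind that mm is evaluated against the clauses of the query, not against the number of terms stored in the field.

```python
from urllib.parse import urlencode

def min_should_match_params(text, field="source_text", mm="50%"):
    """Build edismax params requiring a share of the query terms to match."""
    return {
        "q": text,
        "defType": "edismax",  # assumption: edismax; dismax also supports mm
        "qf": field,           # field(s) the query terms are searched in
        "mm": mm,              # minimum share of query clauses that must match
        "wt": "json",
    }

params = min_should_match_params("the sky")
query_string = urlencode(params)
print(query_string)
```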

You can achieve this with the following steps.
Create a separate field in your schema named "source_text_fifty" with indexed=true and stored=false. Do not use the StandardTokenizerFactory for it; better, create a separate field type based on solr.KeywordTokenizerFactory.
While indexing each document, calculate the 50% portion of your input and store the calculated value in the "source_text_fifty" field.
Re-index all existing data with the above logic.
Run the query source_text_fifty:"the sky". Now only the document that satisfies the 50% match is returned.
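The answer does not spell out how the 50% value is calculated. One possible reading, and it is only an assumption, is to keep the first half of the whitespace-separated tokens of source_text and index that string as a single keyword. The helper name first_half is hypothetical.

```python
import math

def first_half(text):
    """Return the first half (rounded up) of the whitespace-separated tokens."""
    tokens = text.split()
    keep = math.ceil(len(tokens) / 2)
    return " ".join(tokens[:keep])

docs = [
    "the sky is blue",
    "the sky is also called the celestial sphere",
]
for source_text in docs:
    # The value that would be stored in the hypothetical source_text_fifty field.
    print(repr(first_half(source_text)))
```

Under this reading, an exact keyword match on "the sky" finds only the first document, because the second document's stored value is "the sky is also".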

stats.field and stats.facet doesn't seem to be working right

I'm storing a series of records that contain a section_url field and a count field.
I'm trying to aggregate by section_url and sum the count field, so I'm querying with the following params:
"params":{
"indent":"true",
"stats.field":"count",
"stats":"true",
"q":"section_url:tv AND domain:[* TO *] AND date:\"2014-12-22T00:00:00Z\"",
"stats.facet":"section_url",
"wt":"json"}}
As you can see, I'm running stats on the count field and stats faceting on section_url.
Most of the time this works fine, but for some reason it is buggy for some fields. In one result, for example, every section_url is http://www.cb10.tv/, yet the stats faceting seems to treat section_url as two separate values, www.cb10 and tv, instead of http://www.cb10.tv/
Any idea of what could be the problem?

It seems that section_url is tokenized, since you get several tokens for each entry. Faceting is performed on the indexed tokens, which means that you end up with a count for each token in the indexed content, and not for the content of the field itself.
Add a StrField (or a TextField with a KeywordTokenizer) and use a copyField to populate it (or change the existing field), then reindex your content. Use that field for generating the facet counts instead.
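The effect can be illustrated with a toy model in plain Python. This is not Solr code: the tokenizing analyzer is a hypothetical stand-in whose output is hard-coded to the two terms reported in the question, and facet_counts simply counts one bucket per indexed term.

```python
from collections import Counter

def facet_counts(values, analyze):
    """Count documents per indexed term, one bucket per distinct term."""
    counts = Counter()
    for value in values:
        # Each document contributes once per distinct term it was indexed with.
        counts.update(set(analyze(value)))
    return counts

def keyword_analyzer(value):
    # Mimics a StrField or KeywordTokenizer: the whole value is a single term.
    return [value]

def toy_tokenizing_analyzer(value):
    # Hypothetical stand-in for a tokenized text field. The split is hard-coded
    # to the terms reported in the question; it is not computed by Solr.
    return ["www.cb10", "tv"] if value == "http://www.cb10.tv/" else [value]

urls = ["http://www.cb10.tv/"] * 3
print(facet_counts(urls, toy_tokenizing_analyzer))  # buckets per token
print(facet_counts(urls, keyword_analyzer))         # one bucket per full URL
```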

How do I create a Solr query that returns results even if one field in my query has no matches?

Suppose I want to create a recommendation system to suggest people you should connect with based off of certain attributes that I know about you and attributes I have about other people that are stored in a Solr index. Is it possible to query the index with a list of attributes (along with boosts for each attribute) and have Solr return scored results even if some of my fields return no matches? The way that I understand that Solr works is that if one of your fields doesn't contain a match in any documents found in your index, you get zero results for the entire query (even if other fields in the query matched) - is that right? What I would hope is that I could query the index and get a list of results back in order of a score given based on how many (and which) fields matched to something, even if some fields have no matches, for example:
Say that there are 2 people documents stored in the index as follows (figuratively):
Person 1:
Industry: Manufacturing
City: Oakland
Person 2:
Industry: Manufacturing
City: San Jose
And say that I perform a pseudo-Solr query that basically says "Search for everyone whose industry is equal to manufacturing and whose city is equal to Oakland". What I would like is to receive both results back in the result set, even though one of the "Persons" does not reside in Oakland. I just want that person to come back as a result with a lower score than Person1. Is this possible? What might a solr query look like to handle this? Assume that I have many more than 2 attributes for each person (so saying that I can use "And" and "Or" in my solr query isn't really feasible.. or is it?) Thanks in advance for your helpful input! (PS I'm using Solr 3.6)

You mention using the AND operator, which is likely your problem.
The default behavior of Lucene, and Solr, query syntax is exactly what you are asking for. A query like:
industry:manufacturing city:oakland
will match documents that satisfy either clause, with a scoring preference for those that match both. See the Lucene query syntax documentation.
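As a small sketch, a query with many optional attribute clauses can be assembled by joining field:value pairs with spaces. The helper name is hypothetical, the values are assumed to contain no spaces or special characters, and setting q.op=OR explicitly is an assumption made to guard against a schema that changed the default operator.

```python
from urllib.parse import urlencode

def optional_clause_query(attributes):
    """Join field:value clauses with spaces so that every clause is optional."""
    return " ".join(f"{field}:{value}" for field, value in attributes.items())

attributes = {"industry": "manufacturing", "city": "oakland"}
params = {
    "q": optional_clause_query(attributes),
    "q.op": "OR",       # make the default operator explicit
    "fl": "*,score",    # return the score so the ranking is visible
    "wt": "json",
}
print(urlencode(params))
```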

You can use the bq (boost query) parameter of the (e)dismax query parsers, which does not affect matching and only affects the scores.
http://localhost:8983/solr/persons/select?q=industry:manufacturing&bq=City:Oakland^2
Play with the boost factor at the end to get the right balance between the matching score and the boosting score.
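A sketch of how several boost queries could be attached to one request is shown below. The helper name is hypothetical and defType=edismax is an assumption; the host, core name, q and bq values are taken from the URL above.

```python
from urllib.parse import urlencode

def boosted_params(main_query, boosts):
    """Build edismax params where boosts change scores but not matching."""
    params = [("q", main_query), ("defType", "edismax"), ("fl", "*,score")]
    for clause, weight in boosts:
        # bq may be repeated; each one adds to the score of matching documents.
        params.append(("bq", f"{clause}^{weight}"))
    return params

params = boosted_params("industry:manufacturing", [("City:Oakland", 2)])
print("http://localhost:8983/solr/persons/select?" + urlencode(params))
```

Adding more (clause, weight) pairs to the list is one way to express a separate boost for each attribute.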

Solr - How do I sort by geospatial distance and return the distance?

A bbox search on location alone returns accurate data, but if we add more search parameters, the returned distance score is wrong.
For example:
case 1:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q={!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150}
This returns the correct distance for the store: "score":0.02656421
case 2:
But if I add another condition alongside the bbox, it returns the wrong distance score:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q=({!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150} AND *:*)
This returns "score":0.7258905, which is wrong. It should be the same as in case 1.
case 3:
Just to make sure, I added a condition on the id of the store:
http://devtsg.truckertools.com/solr-4.4.0/collection1/select?wt=json&rows=1&fl=*,score&sort=score asc&q=({!bbox score=distance sfield=geo pt=33.3232,-83.383 d=150} AND id:9220)
This one also returns the wrong distance: "score":9.05333
I don't understand what is going wrong here.
Thanks in advance.

Put each 'AND'ed part of your query into Solr filter queries ('fq' param), and leave 'q' for keyword search relevancy. In your field list ('fl' param) you can put a function query to return the distance: fl=*,dist:geodist(). Other params like 'pt' and 'sfield' are required. To sort, use sort=geodist() asc.
However, you can't use the geodist() function query with a spatial "RPT" field in versions of Solr prior to v4.5. I see you are using 4.4. If you need to sort on an RPT field (only needed if you have multiple locations) in Solr 4.2 through 4.4, then you have to approach this differently, and your attempt is close. I suggest always using the 'q' and 'fq' params as you normally should use them (keyword and filters, respectively). Consider this echoParams output of my query to Solr:
"indent":"true",
"wt":"json",
"sort":"query({!bbox v='' filter=false score=distance}) asc",
"fl":"*,score,dist:query({!bbox v='' filter=false score=distance})",
"sfield":"geo",
"pt":"33.3232,-83.383",
"d":"150",
"q":"*:*",
"fq":"{!bbox}",
"fq":"id:9220"
Yeah, it's ugly. Again, as of Solr 4.5 you no longer have to resort to this.
By the way, the behavior you see is actually not a bug. You need to compose your query differently to get the results you want.
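For the simpler geodist() approach described at the start of this answer, a sketch of the parameter set might look like the following. It assumes Solr 4.5 or later (or a field type that geodist() supports); the helper name is hypothetical, and the field, point, radius and id filter are taken from the question.

```python
from urllib.parse import urlencode

def geo_params(point, radius_km, field="geo", extra_filters=()):
    """Filter with fq, return the distance in fl, and sort by geodist()."""
    params = [
        ("q", "*:*"),                  # keep q for keyword relevancy
        ("sfield", field),
        ("pt", point),
        ("d", str(radius_km)),
        ("fq", "{!bbox}"),             # spatial restriction as a filter
        ("fl", "*,dist:geodist()"),    # distance returned as "dist"
        ("sort", "geodist() asc"),
        ("wt", "json"),
    ]
    # Every additional condition becomes its own filter query.
    params.extend(("fq", f) for f in extra_filters)
    return params

params = geo_params("33.3232,-83.383", 150, extra_filters=["id:9220"])
query_string = urlencode(params)
print(query_string)
```

Because the extra conditions live in fq rather than in q, they do not contribute to the score or the sort order.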

Solr Faceting on Multiple Concatenated Fields

I need a way to get facets on two combined field names. To show you what I mean, take a look at the query as it is now:
{
"responseHeader":{
"status":0,
"QTime":16,
"params":{
"facet":"true",
"indent":"true",
"q":"productId:(1 OR 2 OR 3 OR 4)",
"facet.field":["productMetaType",
"productId"],
"rows":"10"}},
"response":{"numFound":4,"start":0,"docs":[
{
"productId":1,
"productMetaType":"PRIMARY_PHOTO",
"url":"1_PRIM.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_1.JPG"},
{
"productId":1,
"productMetaType":"OTHER_PHOTO",
"url":"1_2.JPG"},
{
"productId":2,
"productMetaType":"OTHER_PHOTO",
"url":"2_1.JPG"}]
},
"facet_counts":{
"facet_queries":{},
"facet_fields":{
"productMetaType":[
"PRIMARY_PHOTO",1,
"OTHER_PHOTO",3],
"productId":[
"1",3,
"2",1]},
"facet_dates":{},
"facet_ranges":{}
}
}
I get two facet fields, productMetaType and productId. What I need to do is somehow combine those fields so I get data back something like this:
1_PRIMARY_PHOTO, 1,
1_OTHER_PHOTO, 2,
2_PRIMARY_PHOTO, 0,
2_OTHER_PHOTO, 1
Does the pivot functionality do this? Unfortunately, we're running Solr 3.1, so pivot isn't available, but if that is the only way to do this, I might have some ammo for upgrading.
The only other thing I could think of was somehow concatenating the field names. I am new to Solr and don't know what is possible. Any advice or assistance is appreciated. Thank you for your time.

Yes, pivot would do the trick, but as you observed, this feature is only available in Solr trunk.
Your idea to combine both fields would work too. Actually, if your fields have a limited number of values, the easiest and most flexible way to do this would be to use facet queries:
productId:1 AND productMetaType:PRIMARY_PHOTO
productId:2 AND productMetaType:OTHER_PHOTO
productId:1 AND productMetaType:OTHER_PHOTO
productId:2 AND productMetaType:PRIMARY_PHOTO
Otherwise, create a new field of type string in your Solr schema.xml and rebuild your index by adding your documents as before, this time including the new field. You can generate its value however you wish; using '_' as a separator between the two field values would work perfectly.
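Both options can be sketched in a few lines. The first helper generates one facet.query per combination of the two fields, and the second builds the value for a combined field at indexing time. The helper names are hypothetical, and the snippet only builds request parameters without contacting Solr.

```python
from itertools import product
from urllib.parse import urlencode

def facet_query_params(main_query, product_ids, meta_types):
    """Add one facet.query per (productId, productMetaType) combination."""
    params = [("q", main_query), ("rows", "0"), ("facet", "true")]
    for pid, meta in product(product_ids, meta_types):
        params.append(
            ("facet.query", f"productId:{pid} AND productMetaType:{meta}")
        )
    return params

def combined_key(product_id, meta_type, sep="_"):
    """Value for a combined string field, e.g. computed before indexing."""
    return f"{product_id}{sep}{meta_type}"

params = facet_query_params(
    "productId:(1 OR 2 OR 3 OR 4)",
    [1, 2],
    ["PRIMARY_PHOTO", "OTHER_PHOTO"],
)
print(urlencode(params))
print(combined_key(1, "PRIMARY_PHOTO"))
```

Facet queries also report combinations with a count of zero, which plain field faceting on a combined field may omit depending on facet.mincount.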
