Boost document in solr where document's field contains some value - solr

I have documents in Solr in the format below:
{ "documents": [ {
"custom_string_New Arrival": "false",
"custom_string_Brand Name": "GB",
"custom_string_Product Name": "GB GB Girls Big Girls 7%2D16 Flutter%2DSleeve Jumpsuit",
"score": 11.223517,
"id": "67012"
},
{
"custom_string_New Arrival": "false",
"custom_string_Brand Name": "Lucy Paris",
"custom_string_Product Name": "Lucy Paris Knit Camille Sleeveless Belted Jumpsuit",
"score": 11.223517,
"id": "50097"
} ] }
I want to boost documents whose custom_string_Product Name contains "Paris Knit".
I am building the Solr query with this boost query parameter:
bq=(custom_string_Product\ Name:(*Paris Knit*))^5000
I expect the document with id=50097 to come out on top, but that is not what I get.
But if I do
bq=(custom_string_Product\ Name:(*Knit*))^5000
then I get the correct response.
The only difference is that the search term in the first query contains a space.

When you use a wildcard query (i.e. a * is present), most analysis is skipped; only the few filters that are multi-term aware still run. Here the query cannot match because no single token contains "Paris Knit": the tokens are probably stored as paris and knit, not as one token.
You can use either a string type field or a field type with a KeywordTokenizer. The KeywordTokenizer also lets you add a LowerCaseFilter, so that your boosts become case-insensitive.
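A rough way to see the difference is to simulate it outside Solr. The sketch below is not Solr code: it uses Python's fnmatch as a stand-in for wildcard matching against indexed terms, and the helper functions tokenized_terms, keyword_terms and wildcard_matches are hypothetical approximations of a tokenized text field and of a KeywordTokenizer combined with a lowercase filter.

```python
from fnmatch import fnmatchcase

def tokenized_terms(value):
    # Rough stand-in for a tokenized text field: lowercase, split on whitespace.
    return value.lower().split()

def keyword_terms(value):
    # Rough stand-in for KeywordTokenizer + lowercase filter: the whole value is one term.
    return [value.lower()]

def wildcard_matches(pattern, terms):
    # A wildcard pattern is compared against each indexed term individually.
    return any(fnmatchcase(term, pattern) for term in terms)

name = "Lucy Paris Knit Camille Sleeveless Belted Jumpsuit"

print(wildcard_matches("*knit*", tokenized_terms(name)))        # True: fits inside one term
print(wildcard_matches("*paris knit*", tokenized_terms(name)))  # False: no term contains a space
print(wildcard_matches("*paris knit*", keyword_terms(name)))    # True: the value is a single term
```

The patterns are written in lowercase because the simulated terms are lowercased at index time; whether Solr lowercases the wildcard term for you depends on the field type's multi-term analysis.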

Related

How can I rank results lower in SOLR if two fields match at the same time?

I have records with "title" and "brand" fields, and I query both fields.
Sometimes a record has the brand in the title, which results in a higher score, but I want such records to score the same as the others.
How can I rank records lower when both fields match?
Your solution is not ideal.
In Solr, there is the Dismax query parser that allows you to search for individual terms across several fields, using some other parameters to influence the final score.
The q parameter defines the main query while the qf parameter can be used to specify a list of fields with which to search.
In addition, the tie parameter lets you control how much the final score of the query will be influenced by the scores of the lower-scoring fields compared to the highest-scoring field.
Let's make a simple example.
With the standard query parser, this is what you get when you search both fields for adidas:
http://localhost:8983/solr/indexName/select?q=title:adidas%20OR%20brand:adidas&fl=id,title,brand,score
"docs": [
{
"id": "2",
"title": "Shoes Adidas",
"brand": "Adidas",
"score": 0.9623127
},
{
"id": "1",
"title": "Shoes",
"brand": "Adidas",
"score": 0.31506687
},
{
"id": "6",
"title": "Shirt",
"brand": "Adidas",
"score": 0.31506687
}
]
The doc with id 2 has a higher score than the others because the score is the sum of two clauses ('adidas' in title + 'adidas' in brand).
If you perform a Dismax query with tie=0 (a pure "disjunction max query"):
http://localhost:8983/solr/indexName/select?defType=dismax&q=adidas&qf=brand%20title&fl=id,title,brand,score&tie=0
You will obtain:
"docs": [
{
"id": "2",
"title": "Shoes Adidas",
"brand": "Adidas",
"score": 0.6472458
},
{
"id": "1",
"title": "Shoes",
"brand": "Adidas",
"score": 0.31506687
},
{
"id": "6",
"title": "Shirt",
"brand": "Adidas",
"score": 0.31506687
}
]
The doc with id 2 has a lower score than before because only the maximum scoring subquery contributes to the final score, i.e. it takes the max of 0.6472458 and 0.31506687 instead of summing them (which gave 0.9623127).
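The effect of tie can be reproduced with a few lines of arithmetic. This is a minimal sketch of how a disjunction-max score is combined, using the two per-field scores quoted above; the function name dismax_score is only illustrative and is not part of Solr.

```python
def dismax_score(field_scores, tie=0.0):
    # Disjunction-max combination: the best field score, plus
    # tie times the sum of the remaining field scores.
    if not field_scores:
        return 0.0
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

# Per-field scores for the doc with id 2, taken from the example above.
scores_doc2 = [0.6472458, 0.31506687]

print(dismax_score(scores_doc2, tie=0.0))  # only the best field counts
print(dismax_score(scores_doc2, tie=1.0))  # equivalent to summing both fields
print(dismax_score(scores_doc2, tie=0.1))  # lower-scoring field contributes 10%
```

With tie=0 the result is the 0.6472458 shown in the Dismax response, and with tie=1 it is the sum of both clauses, as with the standard query parser.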
With the qf parameter, it is also possible to assign a boost factor to increase or decrease the importance of a particular field in the query, for example:
&qf=brand^3 title
It makes matches in brand much more significant than matches in title.
In any case, boosting should be used with caution because it may lead to unexpected results. Every decision with boosting should be supported by an online and offline search relevance evaluation.
Can this help you?
I solved it by removing all occurrences of the brand in the title (and other fields) when writing the index.

Groupby/faceting by multiple fields in azure search

I want to group by (facet on) multiple fields, say the "name" and "type" fields in the search index. Is this possible in Azure Search? If so, how can it be done?
It is not possible to facet by the combined values of multiple fields. You'd have to denormalize the fields yourself when you populate the index, then facet by the denormalized field. For example, if you have 'name' and 'type' fields, you'd have to create a combined 'nametype' field containing the combination of 'name' and 'type'. Then you would refer to the 'nametype' field in the 'facet' parameter of the Search request.
If before you had a document like this:
{ "id": "1", "name": "John", "type": "Customer" }
Now you will have a document like this:
{ "id": "1", "name": "John", "type": "Customer", "nametype": "John; Customer" }
(You can use whatever separator you like between the name part and type part of nametype.)
Now, when you search, include facet=nametype in the request, and you'll get a count of all combinations of 'name' and 'type' that exist in the index.
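The denormalization step could look something like this when preparing documents for upload. It is a minimal sketch: the helper add_nametype and its separator argument are hypothetical, and the upload to the index itself is not shown.

```python
def add_nametype(doc, separator="; "):
    # Return a copy of the document with a combined 'nametype' field
    # built from its 'name' and 'type' values.
    combined = dict(doc)
    combined["nametype"] = f"{doc['name']}{separator}{doc['type']}"
    return combined

doc = {"id": "1", "name": "John", "type": "Customer"}
print(add_nametype(doc))
# {'id': '1', 'name': 'John', 'type': 'Customer', 'nametype': 'John; Customer'}
```

Pick a separator that cannot occur in either field, so the combined value can be split back into its parts when you display the facet.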

Solr proximity search return no result

The query I'm using is q=name:"william test bay"~2.
Schema.xml has the following:
<field name="name" type="text_en" indexed="true" stored="true"/>
These are the two documents I want returned, but in fact no result is returned:
"response": {
"numFound": 2,
"start": 0,
"docs": [
{
"id": "3",
"name": "william bay",
"_version_": 1561163645757423600
},
{
"id": "4",
"name": "william bay photography",
"_version_": 1561163645757423600
}
]
}
I'd like to know why. For example, I want to match "william test bay" to "william bay". I thought the edit distance was 1: just delete the term "test". In fact even name:"william test bay"~1000 doesn't work.
Currently using Solr 4.10.3. This is the one used by Cloudera Search so I couldn't upgrade it.
Proximity Searches
In a proximity search, every keyword of the phrase query has to be present in the document. In your case the keyword test is not present, which is why you are not getting any results.
A proximity search looks for terms that are within a specific distance of one another. The distance here is the number of term movements needed to match the specified phrase; it does not cover deleting a keyword from the phrase query.
[Example]
text : william bay photography
query : william photography
To match the query above, the distance has to be one, because only one word has to be moved for the search string to match the text.
q:"william photography"~1
You could try this query instead: q=name:(william test bay)
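The rule that every phrase term must be present, whatever the slop, can be illustrated with a simplified model of sloppy phrase matching. This is only a sketch under stated assumptions: it lowercases and splits on whitespace instead of running the text_en analysis chain (no stemming, no stop words), it ignores the special handling of repeated phrase terms, and sloppy_phrase_match is a hypothetical helper, not a Solr or Lucene API.

```python
from itertools import product

def sloppy_phrase_match(phrase, text, slop):
    # Simplified model: lowercase and split on whitespace only.
    query_terms = phrase.lower().split()
    doc_terms = text.lower().split()
    if not query_terms:
        return False
    # Positions at which each phrase term occurs in the document.
    positions = [[i for i, t in enumerate(doc_terms) if t == q] for q in query_terms]
    if any(not p for p in positions):
        return False  # a phrase term is missing, so no slop value can help
    # Shift each position by the term's offset in the phrase. An exact phrase
    # gives identical shifted values; the spread is the slop that is needed.
    for combo in product(*positions):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False

print(sloppy_phrase_match("william photography", "william bay photography", 1))  # True
print(sloppy_phrase_match("william photography", "william bay photography", 0))  # False
print(sloppy_phrase_match("william test bay", "william bay", 1000))              # False
```

The last call mirrors the question: "test" never occurs in "william bay", so the phrase cannot match no matter how large the slop is.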

Bug in solr's isWithin operator for geo spatial filtering?

Consider the following document in a Solr index.
{
"id": "580ddd10a5a18387fba12f1b5805374872931533336fa0ce",
"conceptid": "580ddd10a5a18387fba12f1b",
"storeid": "5805374872931533336fa0ce",
"startTime": "2016-10-21T15:36:08Z",
"endTime": "2016-10-24T15:36:08Z",
"update": "2016-10-24T15:36:08Z",
"conceptLocations": [
"POINT (-82.540283 40.15736)", (p1)
"POINT (-54.316406 -16.056371)" (p2)
"POLYGON (.....) " (pl1)
],
"concept_name": [
"event6"
],
"tags": [
"Tag 2g",
"Tag 2a"
],
"_version_": 1549065073902747600,
"score": 1
}
This document has two points stored in the multivalued field conceptLocations, so the points are stored in an array. Thus, if we run a query with a filter that matches either of these points, this document should be returned. Similarly, if we run a query with the filter tags:"Tag 2a", this document should also be returned; i.e. for a multivalued field, a match on any one value should be enough for the document to be selected. This works for all the multivalued fields except conceptLocations when the IsWithin operator is used.
For the conceptLocations field, no documents are returned for the following kind of query.
conceptLocations: "IsWithin( Polygon around point p1 only )" // doesn't return document
Ideally, since one value of conceptLocations matches with filter, the document should be returned. Interestingly, it works fine with Intersects operator. The query,
conceptLocations: "Intersects( Polygon around point p1 only )" // returns document
returns the document as expected.
Also, if we use IsWithin with a polygon that covers both p1 and p2, the document is returned.
conceptLocations: " IsWithin( Polygon covering both p1 and p2 )" // returns document
I am not sure if this is a bug in Solr, but by the standard rules of filtering the document should be returned even if only one value of a multivalued field matches the filter. Any idea why that is not the case here, or am I missing something? The Solr version is 5.0.0.

Solr facet substring search

Imagine I have the following facets:
Speakers: [Mike Thompson, Thomas Wilkinson, Sally Jeffers]
Venues: [Weill Thomas Medical Center, BB&R Medical Associates, KLR Thompson]
Solr seems to allow a &facet.prefix=Thom where I can get the facets that START with "Thom" and that will return "Speaker: Thomas Wilkinson" but no others.
How can I do the equivalent of &facet.substring=Thom which will return Mike Thompson and Weill Thomas....
I tried &facet.query=Thom but that doesn't seem to work at all.
Thanks
It is not possible to be sure, as you did not provide your full query string, but the facet results may be missing Weill Thomas because you are only specifying facet.field=speakers in your query, and Weill Thomas is actually in the venues field. You would need a second facet.field=venues parameter in your search query to retrieve those.
Facet prefix is only used to filter results once the search is already done, so don't use that parameter for searching purposes. Check this question: SOLR facet search by prefix with results highlighting
Edit based on comment:
You don't necessarily need to filter the results returned by faceting after the fact; just make sure that only the facets you want match the original query. Facets that were not part of the search query will have 0 occurrences if you return all facets. You can then set facet.mincount=1 to get only the facets that are found within the search results. Here's an example that I mocked up with test data:
q=*Thom*&rows=0&df=speakers&wt=json&indent=true&facet=true&facet.field=speakers&facet.field=venues&facet.mincount=1&json.nl=map
And the response from Solr:
"responseHeader": {
"status": 0,
"QTime": 3,
"params": {
"q": "*Thom*",
"df": "speakers",
"facet.field": [
"speakers",
"venues"
],
"json.nl": "map",
"indent": "true",
"facet.mincount": "1",
"rows": "0",
"wt": "json",
"facet": "true",
"_": "1431772681445"
}
},
"response": {
"numFound": 2,
"start": 0,
"docs": []
},
"facet_counts": {
"facet_queries": {},
"facet_fields": {
"speakers": {
"Mark Thomas": 1,
"Thomas Moore": 1
},
"venues": {
"Weill Thomas": 1
}
},
"facet_dates": {},
"facet_ranges": {},
"facet_intervals": {},
"facet_heatmaps": {}
}
Just wanted to point out a caveat of the proposed solution (which is basically to run your facet substring query as the main Solr query, so that the facet values are the ones you want). This won't work correctly for multi-valued fields. For example, if a document had the three speaker values "Mark Thomas", "Fred Jones" and "John Doe", then the query q=*Thom* would return "Fred Jones" and "John Doe" as facets in addition to "Mark Thomas", which is not the desired result. So this solution can work for single-valued fields, but for multi-valued fields you would probably have to write an intermediary web service that filters out the non-matches (like "Fred Jones" and "John Doe"). Solr should really add a facet.substring parameter that works like facet.prefix but filters facet values by substring instead of by prefix.
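The filtering step of such an intermediary service could be as small as the sketch below. It assumes the facet_fields part of the response has already been parsed into nested dictionaries (the shape you get with json.nl=map, as in the response above); filter_facets is a hypothetical helper and the sample data is made up to match the multi-valued example.

```python
def filter_facets(facet_fields, substring):
    # Keep only the facet values that contain the substring (case-insensitive).
    needle = substring.lower()
    return {
        field: {value: count for value, count in counts.items() if needle in value.lower()}
        for field, counts in facet_fields.items()
    }

# Facets as they might come back for a document with three speaker values.
facet_fields = {
    "speakers": {"Mark Thomas": 1, "Fred Jones": 1, "John Doe": 1},
    "venues": {"Weill Thomas": 1},
}

print(filter_facets(facet_fields, "Thom"))
# {'speakers': {'Mark Thomas': 1}, 'venues': {'Weill Thomas': 1}}
```

Note that this only trims the values Solr returned; if facet.limit cut off matching values before they reached the service, they cannot be recovered here.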

Resources