Partial term search for a field with "/" special character in Azure Cognitive Search


I'm seeing some issues when searching a field whose value contains the "/" special character in Azure Search.
The queries I have tried are listed below; I'm looking for help figuring out what is going wrong.
There is a field called customProperty that uses the keyword analyzer, so the field doesn't get tokenized on special characters.
The value of customProperty is objectId/70efb434-40c4-4314-a53c-179700480ca8
search=/.*object.*70ef.*/&queryType=full&searchFields=customProperty
query works
search=/.*70efb434-40c4-4314-a53c-179700480ca8.*/&queryType=full&searchFields=customProperty
query works
Once the pattern gets close to the "/", the query stops working, even though the "object.*" query above works:
search=/.*objectI.*70ef.*/&queryType=full&searchFields=customProperty
doesn’t work
search=/.*objectI.*/&queryType=full&searchFields=customProperty
doesn’t work
search=/.*objectId\/70efb4.*/&queryType=full&searchFields=customProperty
doesn't work
https://learn.microsoft.com/en-us/azure/search/search-query-partial-matching#about-partial-term-search says to escape / with \, but that doesn't work either.
A simple (not full Lucene) query doesn't work either:
search=objectId\/70efb4*&searchFields=customProperty

The issue you're running into isn't caused by the / but by a mismatch in casing between your query and the data in your index. Partial term searches, like the regex searches you're doing, are automatically lowercased. However, the keyword analyzer you're using doesn't lowercase the text. That means that in your index the term looks like objectId, but your query is automatically lowercased to objectid. That mismatch is why your query returns no results. (Your first two queries only work because their patterns contain no uppercase letters, so lowercasing doesn't change them.) There are more details here: https://learn.microsoft.com/azure/search/search-query-partial-matching
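To make the casing mismatch concrete, here is a small Python simulation (not Azure Search itself). It assumes two things: that the regex query is lowercased before matching, as the answer states, and that a regex must match an entire indexed term, as in Lucene, which is why the queries wrap the pattern in .*. The keyword and custom_keyword functions are hypothetical stand-ins for the two analyzers.

```python
import re

# The single value stored in customProperty, taken from the question.
VALUE = "objectId/70efb434-40c4-4314-a53c-179700480ca8"


def keyword(text):
    """Stand-in for the keyword analyzer: one token, case preserved."""
    return [text]


def custom_keyword(text):
    """Stand-in for keyword_v2 tokenizer + lowercase filter: one lowercased token."""
    return [text.lower()]


def regex_search(pattern, tokens):
    """Approximate a partial-term regex query: lowercase the pattern, then
    require it to match a whole indexed token (hence the leading/trailing .*)."""
    lowered = pattern.lower()
    return any(re.fullmatch(lowered, token) for token in tokens)


# The regex bodies from the question (without the surrounding slashes).
QUERIES = [
    ".*object.*70ef.*",
    ".*70efb434-40c4-4314-a53c-179700480ca8.*",
    ".*objectI.*70ef.*",
    ".*objectI.*",
    r".*objectId\/70efb4.*",
]

if __name__ == "__main__":
    for q in QUERIES:
        print(q, regex_search(q, keyword(VALUE)), regex_search(q, custom_keyword(VALUE)))
```

With the case-preserving token, the three patterns that contain an uppercase I fail; with the lowercased token, all five match, which is consistent with the answer's test.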
For your use case, I'd recommend the following analyzer:
"analyzers": [
  {
    "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
    "name": "custom-keyword",
    "tokenizer": "keyword_v2",
    "tokenFilters": [
      "lowercase"
    ],
    "charFilters": []
  }
]
I tested your queries against that sample data with this analyzer, and they all worked.
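The answer shows only the analyzer; the field also has to reference it by name. A sketch of a full index definition built as a Python dict and serialized to JSON, assuming a hypothetical index name my-index and key field id (neither appears in the question):

```python
import json

index_definition = {
    "name": "my-index",  # hypothetical index name
    "fields": [
        # Hypothetical key field; every index needs exactly one key.
        {"name": "id", "type": "Edm.String", "key": True, "searchable": False},
        {
            "name": "customProperty",
            "type": "Edm.String",
            "searchable": True,
            # Point the field at the custom analyzer defined below.
            "analyzer": "custom-keyword",
        },
    ],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "custom-keyword",
            "tokenizer": "keyword_v2",  # whole value stays one token
            "tokenFilters": ["lowercase"],  # ...but is lowercased at index time
            "charFilters": [],
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(index_definition, indent=2))
```

Azure Search generally doesn't allow changing the analyzer of an existing field, so in practice this usually means creating a new index (or dropping and recreating it) and reindexing the documents.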

Related

Solr query with white spaces and with query and operation

I am working on a search solution in my project, and everything works except in one scenario. The search query operation is AND (q.op=AND). When I search for something like 40486 52P.57, I get no results (here the query is built in Java), but when I run the same search in the Solr admin panel it returns the correct results. My Java code escapes the search query, so the query passed to Solr is q : 40486\\ 52P.57, whereas in the Solr admin panel it is q : 40486 52P.57.
Note: the two words in the above search query belong to two different fields.
I have also noticed that if the words in the search query belong to the same field, the results come back fine. For example, with 40486 67, where both words belong to the same field, the query from my Java code was q: 40486\\ 67 and in the Solr admin panel it was q : 40486 67, and both cases work.
I can't see any problem here; can someone please help me with this?
Update
I found the root cause: the problem is escaping the space. I am searching the individual fields via qf and want a 100% match (mm=100 in this case). Escaping the space makes the query q : 40486\\ 52P.57, which returns no results. But if I use a combined field containing all the searchable fields (general_search below), it returns results even when the query is q : 40486\\ 52P.57. Is this a limitation of edismax in Solr? Can someone please help me fix this without creating the combined field? My expectation is that it should work even with the space escaped, using the individual fields in the qf parameter.
Example index:
[
  {
    "productNumber": 40486754,
    "productShot": "52P.57 UTM",
    "description": "something",
    "general_search": ["40486754", "52P.57 UTM", "something"]
  },
  {
    "productNumber": 12345,
    "productShot": "52P.57 ABC",
    "description": "xzy",
    "general_search": ["12345", "52P.57 ABC", "xzy"]
  }
]
Example queries:
Query 1:
qt=/select&q.op=AND&defType=edismax&q=40486\+52P.57&qf=productNumber+productShot+description
Query 2:
qt=/select&q.op=AND&defType=edismax&q=40486 52P.57&qf=productNumber+productShot+description
Query 3:
qt=/select&q.op=AND&defType=edismax&q=40486\+52P.57&qf=general_search
Of the queries above, Query 2 and Query 3 work, but Query 1 does not.

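The issue was escaping the whitespace. When the whitespace is escaped, the whole string is treated as a single term, "40486 52P.57", so both words have to match within one field: none of the individual qf fields contains both, but the combined general_search field does. When the space is not escaped, it is treated as two separate terms, 40486 and 52P.57, which can match in different fields, so the query works.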
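If the Java code escapes the whole input string at once (SolrJ's ClientUtils.escapeQueryChars, for example, escapes whitespace as well as syntax characters), the space becomes part of one term. One way around that is to escape each word on its own and join the words with plain spaces. A minimal Python sketch of the idea: the SPECIAL set is an approximation of Lucene's query syntax characters, and parse_terms is a simplified model of how a Lucene-style parser splits on unescaped whitespace, not Solr's actual parser.

```python
# Approximate set of Lucene/Solr query syntax characters that need a backslash.
SPECIAL = set('\\+-!():^[]"{}~*?|&;/')


def escape_term(term):
    """Backslash-escape syntax characters inside one word.
    Whitespace is deliberately not escaped here."""
    return "".join("\\" + c if c in SPECIAL else c for c in term)


def escape_query(user_input):
    """Escape each whitespace-separated word separately, then rejoin with
    plain spaces so the parser still sees separate terms."""
    return " ".join(escape_term(word) for word in user_input.split())


def parse_terms(q):
    """Simplified model of a Lucene-style parser: split on whitespace that is
    not preceded by a backslash, and unescape characters along the way."""
    terms, buf, i = [], [], 0
    while i < len(q):
        c = q[i]
        if c == "\\" and i + 1 < len(q):
            buf.append(q[i + 1])  # escaped char (including a space) stays in the term
            i += 2
            continue
        if c.isspace():
            if buf:
                terms.append("".join(buf))
                buf = []
        else:
            buf.append(c)
        i += 1
    if buf:
        terms.append("".join(buf))
    return terms


if __name__ == "__main__":
    print(parse_terms("40486\\ 52P.57"))  # escaped space: one term
    print(parse_terms(escape_query("40486 52P.57")))  # per-word escaping: two terms
```

Escaping per word keeps protection against stray syntax characters (a colon in user input, say) while leaving the space as a term separator, so q.op=AND can satisfy each word from a different qf field.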

Is df a mandatory param while using Re-rank query parser in solr

I am using Re-rank query parser to re-rank documents from solr.
I am able to get the results of the re-ranked query when the df param is passed in the Lucene query:
http://ip:port/solr/core/select?qt=dismax&q=mobile&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=(red)&fl=display_query&df=query
Whereas if we don't pass the df param, the query doesn't work:
http://ip:port/solr/core/select?qt=dismax&q=mobile&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=(red)&fl=display_query
Error thrown:
"error": {
  "metadata": [
    "error-class",
    "org.apache.solr.common.SolrException",
    "root-error-class",
    "org.apache.solr.search.SyntaxError"
  ],
  "msg": "org.apache.solr.search.SyntaxError: Neither qf nor df are present.",
  "code": 400
}
I'm not able to figure out the relation between rq (the re-rank query) and df, or why df would affect the way re-ranking works.

I found the reason behind this and how it can be solved; I hope it helps someone else looking for something similar.
When using reRankQuery, Lucene's default query parser is used, so it does not know about df (even passing the qf parameter wouldn't help here, since qf is understood only by the dismax query parser). For any query where no parser is specified, Lucene's default query parser is used.
To force the re-rank query to be parsed with dismax, the syntax can be:
http://ip:port/solr/core/select?qt=dismax&q=mobile&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq={!dismax}(red)&fl=display_query&df=query
Reading the relevant Solr source code helps to understand this better.
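Hand-writing %20 and braces in the URL is error-prone. A minimal Python sketch that builds the same request with the standard library's urllib.parse.urlencode, so the local-params syntax in rq and rqq is encoded automatically; ip:port and core are the placeholders from the question, not real values:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder host and core name, copied from the question.
BASE_URL = "http://ip:port/solr/core/select"

params = {
    "qt": "dismax",
    "q": "mobile",
    # Re-rank the top 1000 documents using the query held in the rqq parameter.
    "rq": "{!rerank reRankQuery=$rqq reRankDocs=1000 reRankWeight=3}",
    # The {!dismax} local-params prefix makes the re-rank query use dismax
    # instead of the default query parser.
    "rqq": "{!dismax}(red)",
    "fl": "display_query",
    "df": "query",
}

# urlencode percent-encodes braces, '!', '$', '=' and parentheses, and turns spaces into '+'.
url = BASE_URL + "?" + urlencode(params)

if __name__ == "__main__":
    print(url)
    # Decoding the query string gives back the original parameter values.
    print(parse_qs(urlsplit(url).query)["rqq"])
```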

Azure Search highlighting doesn't work for wildcards with scoring profiles

Azure Search supports highlighting with full text search, which helps clients locate the matched term in a returned document. I have provided a simple index schema below to illustrate the issue.
{
  "name": "simple-index",
  "fields": [
    {
      "name": "key",
      "type": "Edm.String"
    },
    {
      "name": "simplefield",
      "type": "Edm.String"
    }
  ],
  "scoringProfiles": [
    {
      "name": "boostedprofile",
      "functionAggregation": null,
      "text": {
        "weights": {
          "simplefield": 5
        }
      },
      "functions": []
    }
  ],
  "corsOptions": null,
  "suggesters": [],
  "analyzers": [],
  "tokenizers": [],
  "tokenFilters": [],
  "charFilters": []
}
For a normal search query like below, it works as expected and gives back the expected result.
search=foobar&highlight=simplefield
On extending the above query to use a wildcard query, things are again as expected with the response containing highlights on the terms matching the prefix. So far so good.
search=foo*&highlight=simplefield&querytype=full
After this when I apply a scoring profile on top of the previous query, the results are unexpected and no highlights are returned.
search=foo*&highlight=simplefield&querytype=full&scoringprofile=boostedprofile
How do I make highlights work for wildcard queries when using a scoring profile?

At the time of answering, this is a known limitation in Azure Search: highlighting doesn't work for wildcard queries when they are used with scoring profiles. Internally, Azure Search uses a highlighter, which is responsible for the highlighting flow and runs as a separate process after search.
For a wildcard query, the highlighter looks up all terms in the index that match the provided prefix and then uses them to compose the highlighted text. Scoring profiles affect the way terms are looked up in the index for highlighting, and as a result the response doesn't include any highlights.
As this limitation is specific to wildcard queries, one workaround is to pre-process the index so that you don't need to issue wildcard/prefix queries. Please take a look at custom analysis (https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search). You can, for example, use an edgeNGram token filter to store prefixes of words in the index and then issue a regular term query with the prefix (without the '*' operator).
I hope this is useful. Please vote on the feedback item to help us prioritize our development efforts on other highlighting modes that would support this use case: https://feedback.azure.com/forums/263029-azure-search/suggestions/32661961-implement-other-highlighters
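The edge n-gram workaround can be illustrated with a toy Python model of index-time and query-time analysis. This is a simulation, not Azure Search: the whitespace tokenization and the min_gram=2 / max_gram=20 bounds are assumptions chosen for illustration. Prefixes are expanded only at index time, so a plain term query for the prefix finds the full word without any '*'.

```python
def edge_ngrams(token, min_gram=2, max_gram=20):
    """Front edge n-grams of one token, like an edge n-gram token filter.
    Tokens shorter than min_gram produce nothing."""
    upper = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, upper + 1)]


def index_tokens(text, min_gram=2, max_gram=20):
    """Index-time analysis: lowercase, split on whitespace, expand every word into its prefixes."""
    tokens = set()
    for word in text.lower().split():
        tokens.update(edge_ngrams(word, min_gram, max_gram))
    return tokens


def term_query(term, indexed):
    """Query-time analysis: a plain lowercased term lookup, with no n-gramming and no '*'."""
    return term.lower() in indexed


if __name__ == "__main__":
    indexed = index_tokens("foobar baz")
    print(sorted(indexed))
    print(term_query("foo", indexed))  # prefix query without a wildcard
```

The trade-off is a larger index, because every word is stored once per prefix length.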

Azure Search Highlight Partial Match

I have turned on hit highlighting, and it works well for whole-word matches. However, we append a wildcard character to the end of each word the user specifies, and highlighting does not work on the partial matches. We get the results back, but the .Highlights object is null, so no highlighting is available for partial matches.
Here is how we configure the SearchParameters:
var parameters = new SearchParameters
{
    Filter = newFilter,
    QueryType = QueryType.Full,
    Top = recordsPerPage,
    Skip = skip,
    SearchMode = SearchMode.Any,
    IncludeTotalResultCount = true,
    HighlightFields = new List<string> { "RESULT" },
    HighlightPreTag = "<font style=\"color:blue; background-color:yellow;\">",
    HighlightPostTag = "</font>"
};
return parameters;

// Later, the returned parameters are passed to the search call:
response = indexClient.Documents.Search<SearchResultReturn>(query, parameters);
Here is an example of our query string: ("the") the*^99.95
The idea is we search for the exact string the user specified (multiple words) and then we do a wild-card search for each individual word specified.
So for the above example, we get all the results that contain "the" and "the*", but only the occurrences of "the" are highlighted. "They", "There", etc. have no highlighting, even when "They" is the only matching entry in the result ("the" was not in the result).
Again the query is bringing back the correct results, it's just the highlighting is not working for partial matches.
Is there some other setting I need to be able to highlight partial matches?

Thanks for reporting the issue.
Unfortunately, it is a known limitation in Azure Search that matches are sometimes not highlighted for broad wildcard searches. Highlighting is an independent process that runs after search. Once matching documents are retrieved, the highlighter looks up the search index for all terms that match the wildcard criteria and uses those terms to highlight the retrieved documents. For broad wildcard queries, like a* (or the*), the highlighter uses only the top N most significant terms, based on their frequencies in the corpus, for performance reasons. In your example, 'they' and 'there' are probably not included in the highlights because they appear in most documents.
As this is a limitation of wildcard queries, one workaround is to preprocess the index so that you don't need to issue wildcard/prefix queries. Please take a look at custom analysis (https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search). You can, for example, use an edgeNGram token filter to store prefixes of words in the index and then issue a regular term query with the prefix (without the '*' operator).
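A sketch of what that index-side change could look like, written as a Python dict serialized to JSON. The filter and analyzer names (prefix_edge_ngram, prefix_index_analyzer) and the minGram/maxGram values are hypothetical; the field name RESULT is taken from the question. Edge n-grams are applied only through indexAnalyzer, while searchAnalyzer leaves the query term whole, so a query such as the (no '*') is looked up as-is against the stored prefixes. This is a fragment: a real index also needs its key field and other fields.

```python
import json

index_fragment = {
    "fields": [
        {
            "name": "RESULT",  # field name from the question
            "type": "Edm.String",
            "searchable": True,
            # Prefixes are produced at index time only...
            "indexAnalyzer": "prefix_index_analyzer",
            # ...while queries are analyzed normally, without n-grams.
            "searchAnalyzer": "standard.lucene",
        }
    ],
    "analyzers": [
        {
            "@odata.type": "#Microsoft.Azure.Search.CustomAnalyzer",
            "name": "prefix_index_analyzer",  # hypothetical name
            "tokenizer": "standard_v2",
            "tokenFilters": ["lowercase", "prefix_edge_ngram"],
        }
    ],
    "tokenFilters": [
        {
            "@odata.type": "#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
            "name": "prefix_edge_ngram",  # hypothetical name
            "minGram": 2,  # assumed lower bound
            "maxGram": 20,  # assumed upper bound
            "side": "front",
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(index_fragment, indent=2))
```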
Hope this helps. Please let me know if you have any further questions.
Nate

Thanks for the reply, but that doesn't seem to be the issue; it seems to be an issue with the boosting function I have on the search.
When I removed the boosting function, partial highlighting worked as expected. When I added the boosting function back, partial highlighting stopped working. Can you verify that this is a bug?
Here is my boosting function:
"scoringProfiles": [
  {
    "name": "PreRiskBoost",
    "text": null,
    "functions": [
      {
        "fieldName": "PreRiskCount",
        "freshness": null,
        "interpolation": "linear",
        "magnitude": {
          "boostingRangeStart": 1,
          "boostingRangeEnd": 99,
          "constantBoostBeyondRange": true
        },
        "distance": null,
        "tag": null,
        "type": "magnitude",
        "boost": 10
      }
    ],
    "functionAggregation": "sum"
  }
],
"defaultScoringProfile": "PreRiskBoost"
Do you know why having the Boosting function prevents partial highlighting from working?

Queries with stopwords and searchMode=all return no results

If I have a document containing the words "dolor de cabeza" and use the Spanish analyzer, searching for "dolor de cabeza" (with quotes) returns the document, but searching for dolor de cabeza (without quotes) returns nothing.
In fact, any stop word in the search query makes it return no documents when using queryType=Full and searchMode=All.
The problem with the quoted approach is that it only matches the exact phrase.
Is there any workaround? I think this is a bug.

Short version:
This happens when you issue a search query with searchMode=All against fields that use analyzers that process stopwords differently. Make sure you scope your query only to fields analyzed with the same analyzer, using the searchFields search request parameter. Alternatively, you can set the same searchAnalyzer on all your searchable fields so that stopwords are removed from your query in the same way. To learn more about custom analyzers and how to set indexAnalyzer and searchAnalyzer independently, see https://learn.microsoft.com/en-us/rest/api/searchservice/custom-analyzers-in-azure-search.
Long version:
Let’s take an index with two fields where one is analyzed with English Lucene analyzer, and the other with standard (default) analyzer.
{
  "fields": [
    {
      "name": "docId",
      "type": "Edm.String",
      "key": true,
      "searchable": false
    },
    {
      "name": "field1",
      "type": "Edm.String",
      "analyzer": "en.lucene"
    },
    {
      "name": "field2",
      "type": "Edm.String"
    }
  ]
}
Let’s add these two documents:
{
  "value": [
    {
      "docId": "1",
      "field1": "Waiting for a bus",
      "field2": "Exploring cosmos"
    },
    {
      "docId": "2",
      "field1": "Run to the hills",
      "field2": "run for your life"
    }
  ]
}
The following query doesn't return any results: search=wait+for&searchMode=all
That's because the terms in this query are processed independently for each field in the index, by the analyzer defined for that field.
For field1 the query becomes search=wait (‘for’ was removed as it is a stop word)
For field2 it stays search=wait+for (the standard analyzer doesn’t remove stop words).
Only the first document matches 'wait' (in the first field); however, the second field of the first document doesn't match 'for', so there are no results. When you set searchMode=all, you tell the search engine that every query term must be matched at least once.
For comparison, another query with a stopword, search=running+for&searchMode=all, returns the second document. The term 'running' matches in field1 (where it is stemmed) and 'for' matches in field2.
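Both queries can be reproduced with a toy Python model of per-field analysis. It is a simulation under stated assumptions, not Azure Search code: english_like is a hypothetical stand-in for en.lucene, using a small subset of English stopwords and a crude suffix-stripping stemmer, and standard_like stands in for the standard analyzer (lowercase, no stopword removal). Each query term becomes one clause per field that still produces a token for it, and searchMode=all requires every term's clause to match in at least one field.

```python
# Small subset of English stopwords (the real en.lucene list is longer).
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "the", "to"}


def toy_stem(word):
    """Crude stemmer, only good enough for this example: running -> run, hills -> hill."""
    if word.endswith("ing") and len(word) > 5:
        word = word[:-3]
        if len(word) >= 2 and word[-1] == word[-2]:
            word = word[:-1]  # runn -> run
    elif word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        word = word[:-1]
    return word


def english_like(text):
    """Stand-in for en.lucene: lowercase, drop stopwords, stem."""
    return [toy_stem(w) for w in text.lower().split() if w not in STOPWORDS]


def standard_like(text):
    """Stand-in for the standard analyzer: lowercase only, stopwords kept."""
    return text.lower().split()


ANALYZERS = {"field1": english_like, "field2": standard_like}

DOCS = [
    {"docId": "1", "field1": "Waiting for a bus", "field2": "Exploring cosmos"},
    {"docId": "2", "field1": "Run to the hills", "field2": "run for your life"},
]


def search_all(query, docs=DOCS, search_fields=None):
    """searchMode=all: every query term must match in at least one searched field."""
    fields = search_fields or list(ANALYZERS)
    hits = []
    for doc in docs:
        indexed = {f: set(ANALYZERS[f](doc[f])) for f in fields}
        matched_every_term = True
        for raw in query.split():
            # One clause per field whose analyzer still emits a token for this term.
            clauses = [(f, tok) for f in fields for tok in ANALYZERS[f](raw)]
            if not clauses:
                continue  # removed by every field's analyzer: no constraint left
            if not any(tok in indexed[f] for f, tok in clauses):
                matched_every_term = False
                break
        if matched_every_term:
            hits.append(doc["docId"])
    return hits


if __name__ == "__main__":
    print(search_all("wait for"))
    print(search_all("running for"))
    print(search_all("wait for", search_fields=["field1"]))
```

Passing search_fields=["field1"] mimics the searchFields fix from the short version: no searched field keeps the stopword, so it imposes no constraint, and 'wait for' returns the first document.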
To learn more about query processing in Azure Search, read "How full text search works in Azure Search".