How can I rank exact matches higher in azure search - azure-cognitive-search

I have an index in Azure Search that consists of person data like first name and last name.
When I search for a 3-letter last name with a query like
/indexes/customers-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName
the name Rau is found, but it appears quite far down in the results.
{
"@odata.context": "myurl/indexes('customers-index')/$metadata#docs(ID,FirstName,LastName)",
"value": [
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Liebetrau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Damerau"
},
{
"@search.score": 8.729204,
"ID": "someid",
"FirstName": "xxx",
"LastName": "Rau"
Names like "Liebetrau" and "Damerau" appear closer to the top.
Is there a way to rank exact matches at the top?
EDIT
Querying the index definition using the REST API
GET https://myproduct.search.windows.net/indexes('customers-index')?api-version=2015-02-28-Preview
returned the following for LastName:
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null,
"synonymMaps": []
Edit 1
The analyzer definition
"scoringProfiles": [],
"defaultScoringProfile": null,
"corsOptions": null,
"suggesters": [],
"analyzers": [
{
"name": "prefix",
"tokenizer": "standard",
"tokenFilters": [
"lowercase",
"my_edgeNGram"
],
"charFilters": []
}
],
"tokenizers": [],
"tokenFilters": [
{
"name": "my_edgeNGram",
"minGram": 2,
"maxGram": 20,
"side": "back"
}
],
"charFilters": []
Edit 2
In the end, splitting the name into an exact-match field (LastName) and a prefix field (PartialLastName), and specifying a scoring profile when querying, did the trick:
{
"name": "person-index",
"fields": [
{
"name": "ID",
"type": "Edm.String",
"searchable": false,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": true,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null
}
,
{
"name": "LastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"analyzer": "my_standard"
},
{
"name": "PartialLastName",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": true,
"facetable": true,
"key": false,
"indexAnalyzer": "prefix",
"searchAnalyzer": "standard",
"analyzer": null
}
],
"analyzers":[
{
"name":"my_standard",
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding" ]
},
{
"name":"prefix",
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "my_edgeNGram" ]
}
],
"tokenFilters":[
{
"name":"my_edgeNGram",
"#odata.type":"#Microsoft.Azure.Search.EdgeNGramTokenFilterV2",
"minGram":2,
"maxGram":20,
"side": "back"
}
],
"scoringProfiles":[
{
"name":"exactFirst",
"text":{
"weights":{ "LastName":2, "PartialLastName":1 }
}
}
]
}
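For completeness, a query that applies this profile might look like the following sketch (index name from the definition above; host and api-version follow the earlier examples, so adjust them to your environment):
GET https://myproduct.search.windows.net/indexes/person-index/docs?api-version=2016-09-01&search=rau&searchFields=LastName,PartialLastName&scoringProfile=exactFirst
Because the exactFirst profile weights LastName twice as high as PartialLastName, a document whose LastName is exactly "Rau" scores above documents that only match through the n-grams in PartialLastName.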

The analyzer "prefix" set on the LastName field produces the following terms for the name Liebetrau : au, rau, trau, etrau, betrau, ebetrau, iebetrau, libetrau. These are edge ngrams of length ranging from 2 to 20 starting from the back of the word, as defined in the my_edgeNGram token filter in your index definition. The analyzer will process other names in the same way.
When you search for the name rau, it matches all names as they all end with those characters. That's why all documents in your result set have the same relevance score.
You can test your analyzer configurations using the Analyze API.
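For example, a request like this sketch (index name from the question; use an api-version that supports the Analyze Text API) returns the tokens the prefix analyzer emits for a given input:
POST https://myproduct.search.windows.net/indexes/customers-index/analyze?api-version={{API-VERSION}}
{
"text": "Liebetrau",
"analyzer": "prefix"
}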
To learn more about custom analyzers, see the Azure Cognitive Search documentation on custom analyzers.
Hope that helps

Related

Scoring profile with weights and a function

I'm using Azure Search with a scoring profile. I need text fields along with quantity sold to be part of the scoring profile. I can configure the following profile, but quantity sold doesn't seem to be factored into the search score when I query the index. I'm thinking it's because quantity sold isn't a string, it's an int, and therefore I can't make the field searchable? When I use the new featuresMode parameter in the query, the quantity sold field doesn't even appear in the scoring breakdown.
"scoringProfiles": [
{
"name": "Product Name",
"functions": [
{
"fieldName": "QuantitySold",
"freshness": null,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 0,
"boostingRangeEnd": 100000,
"constantBoostBeyondRange": true
},
"distance": null,
"tag": null,
"type": "magnitude",
"boost": 6
}
],
"functionAggregation": "sum",
"text": {
"weights": {
"ProductName": 4,
"ProductSet": 3,
"ProductDesc": 2
}
}
}
],
What type of boost you should use depends on the data type. An int like QuantitySold should use type magnitude for boosting; a date would use type freshness, and so on.
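For example, a freshness function for a hypothetical LastRestocked date field (the field name and boosting duration here are invented purely for illustration) would be shaped roughly like this:
{
"type": "freshness",
"fieldName": "LastRestocked",
"boost": 5,
"interpolation": "linear",
"freshness": {
"boostingDuration": "P30D"
}
}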
I recreated a minimal example with the simplest possible index: just Id and Title, plus the QuantitySold field used for boosting.
CREATE INDEX
{
"#odata.context": "https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/$metadata#indexes/$entity",
"#odata.etag": "\"0x8D8761DCBBCCD00\"",
"name": "{{INDEX}}",
"defaultScoringProfile": null,
"fields": [
{
"name": "Id",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": true,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "Title",
"type": "Edm.String",
"facetable": false,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": true,
"analyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "QuantitySold",
"type": "Edm.Int32",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": false,
"sortable": true,
"analyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
], "scoringProfiles": [
{
"name": "relevance",
"text": {
"weights": {
"Title": 1.5
}
}
},
{
"name": "sales",
"functions": [
{
"type": "magnitude",
"fieldName": "QuantitySold",
"boost": 100,
"interpolation": "linear",
"magnitude": {
"boostingRangeStart": 1,
"boostingRangeEnd": 100000,
"constantBoostBeyondRange": false
}
}
]
} ],"corsOptions": null, "suggesters": [], "analyzers": [], "tokenizers": ], "tokenFilters": [], "charFilters": [], "encryptionKey": null}
UPLOAD MINIMAL
I then submit two products: one called Apple iPhone with a low sales quantity, and one called Apple Juice with a high sales quantity.
{
"value": [
{
"#search.action": "mergeOrUpload",
"Id": "2",
"Title": "Apple Juice",
"QuantitySold": 10000
},
{
"#search.action": "mergeOrUpload",
"Id": "1",
"Title": "Apple iPhone",
"QuantitySold": 35
}
]
}
QUERY
Without using any scoring profile, I query for apple. As expected, the two items are equally relevant responses to my query: each matches one of the two tokens, and both get a score of 0.25811607.
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX}}/docs?search=apple&$count=true&searchMode=all&queryType=full&api-version={{API-VERSION}}&featuresMode=enabled
{
"#odata.count": 2,
"value": [
{
"#search.score": 0.25811607,
"#search.features": {
"Title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.25811607,
"termFrequency": 1.0
}
},
"Id": "2",
"Title": "Apple Juice",
"QuantitySold": 10000
},
{
"#search.score": 0.25811607,
"#search.features": {
"Title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.25811607,
"termFrequency": 1.0
}
},
"Id": "1",
"Title": "Apple iPhone",
"QuantitySold": 35
}
]
}
QUERY WITH BOOST ON QUANTITY SOLD
I then repeat the query for apple, but this time I boost items with a high QuantitySold by selecting my scoring profile called sales. This boosts the Apple Juice item to the top with a score of 2.813235. The Apple iPhone item has also received a boost, but a much smaller one, ending up with a score of only 0.26680434.
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX}}/docs?search=apple&$count=true&searchMode=all&queryType=full&api-version={{API-VERSION}}&featuresMode=enabled&scoringProfile=sales
{
"#odata.count": 2,
"value": [
{
"#search.score": 2.813235,
"#search.features": {
"Title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.25811607,
"termFrequency": 1.0
}
},
"Id": "2",
"Title": "Apple Juice",
"QuantitySold": 10000
},
{
"#search.score": 0.26680434,
"#search.features": {
"Title": {
"uniqueTokenMatches": 1.0,
"similarityScore": 0.25811607,
"termFrequency": 1.0
}
},
"Id": "1",
"Title": "Apple iPhone",
"QuantitySold": 35
}
]
}

Filter inside nested ComplexTypes

I'm trying to filter based upon the value of tagdata/tags/tag. Any ideas for me? Basically I just want to select documents where the text of the tag matches a specific string. Thanks!
The filter:
search=*&$filter=tagdata/tags/any(tag: tagdata/tags/tag eq 'text1')
The error:
Invalid expression: The parent value for a property access of a property 'tag' is not a single value. Property access can only be applied to a single value.
I've got a ComplexType definition that looks like this:
{
"name": "tagdata",
"type": "Edm.ComplexType",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "tags",
"type": "Collection(Edm.ComplexType)",
"analyzer": null,
"synonymMaps": [],
"fields": [
{
"name": "tagid",
"type": "Edm.Int64",
"facetable": false,
"filterable": true,
"retrievable": true,
"sortable": false,
"analyzer": null,
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
},
{
"name": "tag",
"type": "Edm.String",
"facetable": true,
"filterable": true,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "en.lucene",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [],
"fields": []
}
]
}
]
},
The data looks like this:
{
"tags": [
{
"tagid": 83,
"tag": "text1"
},
{
"tagid": 29,
"tag": "text2"
},
{
"tagid": 69,
"tag": "text3"
},
{
"tagid": 115,
"tag": "text4"
}
]
}
This should work:
search=*&$filter=tagdata/tags/any(t: t/tag eq 'text1')
Think of the any lambda expression as a loop over the tags collection, where the identifier before the colon is the loop variable. That variable is of a complex type, so you can access its properties using a slash.
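As a further sketch against the same sample data, the lambda body can also combine conditions on the loop variable, for example matching both the tag text and its id:
search=*&$filter=tagdata/tags/any(t: t/tag eq 'text1' and t/tagid eq 83)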

Azure Search Normalized Lowercase Field

I am unable to add a normalized copy of the "Title" field to our search index. Ultimately, I'm trying to use this field for a case-insensitive order by. Currently, titles are returned in the following order (with $orderby=TitleCaseInsensitive):
Abc
Bbc
abc
And instead I want: Abc->abc->Bbc. I have forked the "Title" field into two fields via a field mapping and am applying a custom analyzer with the "lowercase" token filter to the normalized field. Can someone explain why I am not getting the desired results? Here is the relevant portion of the index definition:
"index":{
"name": "current-local-inventory",
"fields": [
{"name": "TitleCaseInsensitive","indexAnalyzer":"caseInsensitiveAnalyzer","searchAnalyzer":"keyword", "type": "Edm.String","filterable": false, "sortable": true, "facetable": false, "searchable": true},
{"name": "Title", "type": "Edm.String","filterable": true, "sortable": true, "facetable": false, "searchable": true},
],
"analyzers": [
{
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"name":"caseInsensitiveAnalyzer",
"charFilters":[],
"tokenizer":"keyword_v2",
"tokenFilters":["lowercase"]
}
]
},
"indexers":[{
"fieldMappings" : [
{"sourceFieldName" : "Title", "targetFieldName" : "Title" },
{"sourceFieldName" : "Title", "targetFieldName" : "TitleCaseInsensitive" }
]
}]
See my answer in the related post Azure Search - Accent insensitive analyzer not working when sorting. When you include the lowercase token filter, it only affects search and not sorting. See also the Azure Search UserVoice entry Case-insensitive sorting for string fields.
My suggested workaround, as I explain in the related post, is to create a forked/shadow property. However, using an analyzer with a lowercase token filter won't help. The only way I could get your example working was to include a copy of your Title property that was already lowercased. Notice that I don't use a field mapping, and I don't use different analyzers for indexing and search like you have in your example.
CREATE INDEX
Create the index. Replace the placeholder variables (wrapped in double curly braces) as suitable for your environment.
{
"#odata.context": "https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/$metadata#indexes/$entity",
"#odata.etag": "\"0x8D8761DCBBCCD00\"",
"name": "{{INDEX_NAME}}",
"defaultScoringProfile": null,
"fields": [
{"name": "Id", "type": "Edm.String", "searchable": false, "filterable": true, "retrievable": true, "sortable": true, "facetable": false, "key": true, "indexAnalyzer": null, "searchAnalyzer": null, "analyzer": null, "synonymMaps": [] },
{"name": "TitleCaseInsensitive","indexAnalyzer": null, "searchAnalyzer": null, "analyzer": "caseInsensitiveAnalyzer", "type": "Edm.String","filterable": false, "sortable": true, "facetable": false, "searchable": true},
{"name": "Title", "type": "Edm.String","filterable": true, "sortable": true, "facetable": false, "searchable": true}
],
"scoringProfiles": [],
"corsOptions": null,
"suggesters": [],
"analyzers": [ {
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"name":"caseInsensitiveAnalyzer",
"charFilters":[],
"tokenizer":"keyword_v2",
"tokenFilters":["lowercase"]
}],
"tokenizers": [],
"tokenFilters": [],
"charFilters": [],
"encryptionKey": null
}
UPLOAD
Upload three sample documents.
{
"value": [
{
"#search.action": "mergeOrUpload",
"Id": "1",
"Title": "Abc",
"TitleCaseInsensitive": "abc"
},
{
"#search.action": "mergeOrUpload",
"Id": "2",
"Title": "abc",
"TitleCaseInsensitive": "abc"
},
{
"#search.action": "mergeOrUpload",
"Id": "3",
"Title": "Bbc",
"TitleCaseInsensitive": "bbc"
}
]
}
QUERY
Then, query with $orderby on your lowercased (normalized) property.
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/{{INDEX_NAME}}/docs?search=*&$count=true&$select=Id,Title,TitleCaseInsensitive&searchMode=all&queryType=full&api-version={{API-VERSION}}&$orderby=TitleCaseInsensitive asc
And you'll get the expected results where Title is sorted in a case-insensitive way.
{
"#odata.context": "https://<your-search-service>.search.windows.net/indexes('dg-test-65526118')/$metadata#docs(*)",
"#odata.count": 3,
"value": [
{
"#search.score": 1.0,
"Id": "2",
"TitleCaseInsensitive": "abc",
"Title": "abc"
},
{
"#search.score": 1.0,
"Id": "1",
"TitleCaseInsensitive": "abc",
"Title": "Abc"
},
{
"#search.score": 1.0,
"Id": "3",
"TitleCaseInsensitive": "bbc",
"Title": "Bbc"
}
]
}
I would love to be corrected with a simple way to accomplish this.
Please check out the Text normalization for case-insensitive filtering, faceting and sorting feature that's in Preview.
You can update your index to use this "normalizer" feature for the fields in which you'd like case-insensitive order-by operations.
You don't need a separate TitleCaseInsensitive field anymore. You can add "normalizer": "lowercase" to the Title field, and $orderby=Title will sort in the order you'd like, ignoring casing.
The "lowercase" normalizer is pre-defined. If you'd like other filters to be applied, look at the documentation on predefined and custom normalizers.
"index": {
"name": "current-local-inventory",
"fields": [
{"name": "Title", "type": "Edm.String", "filterable": true, "sortable": true, "facetable": false, "searchable": true, "normalizer":"lowercase"}
]
},
"indexers":[{
"fieldMappings" : [
{"sourceFieldName" : "Title", "targetFieldName" : "Title" }
]
}]
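With the normalizer in place, ordering by Title directly should return the case-insensitive ordering. A sketch using the same placeholder style as the earlier examples (normalizers require a preview api-version while the feature is in preview):
https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/current-local-inventory/docs?search=*&$select=Id,Title&$orderby=Title asc&api-version={{API-VERSION}}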

Azure Search on text files

Trying to setup Azure Search on blob container that has text files.
I have created a storage account with Azure Data Lake Storage Gen2, and the file is stored in a blob container. I have a single file, since I'm just testing Azure Search. I have created the index and the data source, and when I try to create the indexer I get
{"error":{"code":"","message":"Error with data source: The remote server returned an error: (400) Bad Request. Please adjust your data source definition in order to proceed."}}
My file has no headers, contains 16 columns, and is pipe-delimited.
Here is what I have tried for the index:
{
"name" : "test-index",
"fields": [
{ "name": "id", "type": "Edm.String", "key": true, "searchable": false },
{"name":"TransactionId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"TransactionEventId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"EventTypeId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"EventSourceId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"SourceUserId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"SourceRecordId", "type": "Edm.String", "key": false, "searchable": true },
{"name":"SourceDetails", "type": "Edm.String", "key": false, "searchable": true },
{"name":"UserGlobalId", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"CallDistributorKey", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"CreatedDateTime", "type": "Edm.DateTimeOffset", "key": false, "searchable": false },
{"name":"AccountId", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"LobId", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"StartEvent", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"EndEvent", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"OnCall", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"PresenceEventId", "type": "Edm.Int32", "key": false, "searchable": false },
{"name":"EventProcessedUtcTime", "type": "Edm.DateTimeOffset", "key": false, "searchable": false }
]
}
---also tried something much simpler
"fields": [
{ "name": "id", "type": "Edm.String", "key": true, "searchable": false },
{ "name": "content", "type": "Edm.String", "key":false, "retrievable": false , "filterable": false, "sortable": false, "facetable":false, "searchable": true}
]
--datasource
{
"name" : "test-ds",
"type" : "azureblob",
"credentials" : { "connectionString" :"DefaultEndpointsProtocol=https;AccountName=......;AccountKey=..." },
"container" : { "name" : "test" }
}
--indexer
{
"name" : "test-indexer",
"dataSourceName" : "test-ds",
"targetIndexName" : "test-index"
}
---get error
{"error":{"code":"","message":"Error with data source: The remote server returned an error: (400) Bad Request. Please adjust your data source definition in order to proceed."}}
--tried this indexer create as well
{
"name" : "test-indexer",
"dataSourceName" : "test-ds",
"targetIndexName" : "test-index",
"parameters" : { "configuration" : { "parsingMode" : "delimitedText", "delimitedTextDelimiter" : "|" , "delimitedTextHeaders" : "TransactionId
TransactionEventId,EventTypeId,EventSourceId,SourceUserIdSourceRecordId,
SourceDetails,UserGlobalId,CallDistributorKey,CreatedDateTime,AccountId,
LobId,StartEvent,EndEvent,OnCall,PresenceEventId,EventProcessedUtcTime" } }
}
---get error
{"error":{"code":"","message":"Error with data source: The remote server returned an error: (400) Bad Request. Please adjust your data source definition in order to proceed."}}
Any pointers would be great....
This is an existing API interoperability issue between Azure Data Lake Storage Gen2 and Blob Storage.
Azure Search uses the Blob Storage APIs, and these APIs are currently not compatible with hierarchical namespaces. You can disable the hierarchical namespace feature to enable Azure Search indexing, but you'll lose some Azure Data Lake Storage Gen2-specific features.

Azure Search - unaccent

Trying to figure out how to get the ignore-accents ability in Azure Search. Texts in my application are in the Polish language. For searchable fields I tried to use the pl.microsoft and pl.lucene analyzers. Both of them are able to match singular and plural forms. What I'm not able to achieve is ignoring accents. The only way that I found (How to ignore accents in Azure Search?) is to use the standardasciifolding.lucene analyzer. It ignores accents but, on the other hand, doesn't handle singular and plural forms. Is there any way to combine the two analyzers?
Thanks
I think I found a solution, but I'm not sure if it's the simplest approach or maybe I complicated it too much:
{
"name": "test",
"fields": [
{
"name": "id",
"type": "Edm.String",
"searchable": false,
"filterable": false,
"retrievable": true,
"sortable": false,
"facetable": false,
"key": true,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": null
},
{
"name": "name",
"type": "Edm.String",
"searchable": true,
"filterable": true,
"retrievable": true,
"sortable": false,
"facetable": false,
"key": false,
"indexAnalyzer": null,
"searchAnalyzer": null,
"analyzer": "pl_analyzer"
}
],
"scoringProfiles": [],
"defaultScoringProfile": "",
"corsOptions": null,
"suggesters": [],
"analyzers":[
{
"name":"pl_analyzer",
"#odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"polish_tokenizer",
"tokenFilters":[ "lowercase", "asciifolding" ]
}],
"tokenizers": [
{
"#odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
"name": "polish_tokenizer",
"isSearchTokenizer": true,
"language": "polish"
}
],
"tokenFilters": [],
"charFilters": []
}
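To verify that the combined analyzer both stems and folds Polish diacritics, you could run a sample word through the Analyze API against this index (index name from the definition above; the word and the host placeholders are just for illustration):
POST https://{{SEARCH_SVC}}.{{DNS_SUFFIX}}/indexes/test/analyze?api-version={{API-VERSION}}
{
"text": "żółty",
"analyzer": "pl_analyzer"
}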
