Prioritize search results based on a certain parameter in Azure Search - azure-cognitive-search

I have an index on AzureSearch similar to this one:
"fields": [
{
"name": "key",
"type": "Edm.String",
"filterable": true,
},
{
"name": "title",
"type": "Edm.String",
"searchable": true
},
{
"name": "followers",
"type": "Collection(Edm.String)",
"filterable": true,
}
]
Here, title is the title of a Post and is text-searchable. followers contains the user ids of the users who are following that particular Post.
I get the current logged-in userId from the session. Now, when a user does a text search, I want to show the Posts that the current user is following at the top.
Is this achievable in Azure Search using scoring profiles or anything else?

Tag boosting in scoring profiles does exactly that. All you need to do is add a scoring profile as below:
{
  "scoringProfiles": [
    {
      "name": "personalized",
      "functions": [
        {
          "type": "tag",
          "boost": 2,
          "fieldName": "followers",
          "tag": { "tagsParameter": "follower" }
        }
      ]
    }
  ]
}
Then, at query time, issue a search query that references the scoring profile and passes the parameter used to customize the ranking:
docs?search=some%20post&scoringProfile=personalized&scoringParameter=follower:user_abc
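For example, here is a minimal sketch of issuing that query against the REST API with Python's requests library; the service URL, index name, key, and api-version are placeholders, and note that recent API versions expect the scoring parameter as name-value (dash-separated), while older examples like the one above use a colon:

import requests

SERVICE = "https://<your-service>.search.windows.net"  # placeholder service URL
INDEX = "posts"                                         # placeholder index name
API_KEY = "<query-key>"                                 # placeholder query key

params = {
    "api-version": "2023-11-01",        # assumed recent stable API version
    "search": "some post",
    "scoringProfile": "personalized",
    # Boost documents whose 'followers' collection contains the current user id.
    # Recent API versions use "name-value"; older examples use "name:value".
    "scoringParameter": "follower-user_abc",
}

resp = requests.get(
    f"{SERVICE}/indexes/{INDEX}/docs",
    params=params,
    headers={"api-key": API_KEY},
)
resp.raise_for_status()
for doc in resp.json()["value"]:
    print(doc["key"], doc["@search.score"])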
Hope this helps. You can read more about it here: https://azure.microsoft.com/en-us/blog/personalizing-search-results-announcing-tag-boosting-in-azure-search/
Nate

Related

Azure Cognitive Search Complex object filtering

I have an index in Azure Cognitive Search but can't seem to find the right syntax to query it for what I need.
I have documents that look like the ones below, and I want to be able to pass in a search for "black denim shirt" and have it matched against each item object in the document rather than the whole document.
I need the match to be confined to the individual objects, as I don't want the "black" and "denim" from the "black denim shirt" query to be matched against "black denim jeans". Therefore, the matching/higher-ranked result should be Document 2.
Document 1:
{
  "id": "Style1",
  "itemKeyWords": [
    {
      "productKeyWords": "shirt,oversized shirt,denim",
      "attributeKeyWords": "blue"
    },
    {
      "productKeyWords": "Skinny, denim, jeans",
      "attributeKeyWords": "black"
    }
  ]
}
Document 2:
{
  "id": "Style2",
  "itemKeyWords": [
    {
      "productKeyWords": "shirt,oversized shirt,denim",
      "attributeKeyWords": "black"
    },
    {
      "productKeyWords": "Skinny, denim, jeans",
      "attributeKeyWords": "blue"
    }
  ]
}
I have itemKeyWords set up in the index as a complex collection:
{
  "name": "itemKeyWords",
  "type": "Collection(Edm.ComplexType)",
  "fields": [
    {
      "name": "productKeyWords",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "en.lucene",
      "normalizer": null,
      "synonymMaps": []
    },
    {
      "name": "attributeKeyWords",
      "type": "Edm.String",
      "searchable": true,
      "filterable": true,
      "retrievable": true,
      "sortable": false,
      "facetable": true,
      "key": false,
      "indexAnalyzer": null,
      "searchAnalyzer": null,
      "analyzer": "en.lucene",
      "normalizer": null,
      "synonymMaps": []
    }
  ]
}
I have tried various attempts using this as a guide but can't seem to get the syntax right:
https://learn.microsoft.com/en-gb/azure/search/search-howto-complex-data-types?tabs=portal
Unfortunately, as of today, it is not possible to make "search" requests (queries that rely on the tokenized content) that enforce the requirement to have all matches within a specific entry of a complex object collection. This is only supported for filters right now (as long as the filter does not rely on the search.in function).
I can think of two (less than ideal) workarounds:
Index each entry of the collection as separate documents
Flatten the sub-fields into a single field:
AggregateField: "Skinny, denim, jeans. black"
Then emit a query that uses proximity search (to make sure all terms are within a certain distance):
queryType=full&search="black denim jeans"~5
If it's important for you to still keep the structured version of the content in the document (attributes and keywords separately), you can index them alongside the aggregated field for retrieval purposes (you can target different fields for matching versus the ones you actually return in the response by using select and searchFields):
queryType=full&search="black denim jeans"~3&searchFields=aggregatedFields&select=productKeyWords, attributeKeyWords
or
queryType=full&search=aggregatedFields:"black denim jeans"~3&select=productKeyWords,attributeKeyWords
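As a rough sketch in Python of the flattening step for the second workaround (the aggregatedFields name, the separator, and the joining scheme are illustrative assumptions, not part of the index above):

def flatten_item_keywords(doc: dict) -> dict:
    # Concatenate each entry's sub-fields so all of an entry's terms sit close
    # together in one searchable string, which the proximity query above targets.
    parts = [
        f'{item["productKeyWords"]}. {item["attributeKeyWords"]}'
        for item in doc.get("itemKeyWords", [])
    ]
    # Note: with a large enough proximity slop, terms can still match across
    # entry boundaries; choose the slop with that in mind.
    doc["aggregatedFields"] = " | ".join(parts)
    return doc

doc2 = {
    "id": "Style2",
    "itemKeyWords": [
        {"productKeyWords": "shirt,oversized shirt,denim", "attributeKeyWords": "black"},
        {"productKeyWords": "Skinny, denim, jeans", "attributeKeyWords": "blue"},
    ],
}
print(flatten_item_keywords(doc2)["aggregatedFields"])
# shirt,oversized shirt,denim. black | Skinny, denim, jeans. blue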

Querying editable user calendars returns non-editable calendars

When querying for editable user calendars, i.e. using the canEdit+eq+true OData filter clause, I receive the non-editable calendars instead.
Here is the REST query endpoint ({userId} replaced with any existing user GUID):
https://graph.microsoft.com/v1.0/users/{userId}/calendars?$filter=canEdit+eq+true
and here is the response result:
{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users('{userId}')/calendars",
  "value": [
    {
      "id": "some-id",
      "name": "Jours fériés - France",
      "canEdit": false
    },
    {
      "id": "some-other-id",
      "name": "Anniversaires",
      "canEdit": false
    }
  ]
}
When querying with the reverse filter, i.e. for the non-editable calendars, I receive the editable calendars in the response payload.
Here is the REST query:
https://graph.microsoft.com/v1.0/users/{userId}/calendars?$filter=canEdit+eq+false
And below is the response result:
{
  "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users('{userId}')/calendars",
  "value": [
    {
      "id": "some-id",
      "name": "Calendar",
      "canEdit": true
    }
  ]
}
Note that I omitted irrelevant fields from both response results.
Is there a known issue, or am I misunderstanding the canEdit property?
Well, it's a known issue according to this.
A workaround can be to retrieve the editable calendars by using canEdit eq false.
But I'm a little bit afraid of it.
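A minimal sketch of that workaround with Python's requests library, assuming you already have an access token (the client-side re-check is an extra precaution, not something the Graph API requires):

import requests

GRAPH = "https://graph.microsoft.com/v1.0"
user_id = "<user-guid>"    # placeholder
token = "<access-token>"   # placeholder, e.g. acquired via MSAL

# Because of the inverted behavior described above, asking for
# canEdit eq false currently returns the *editable* calendars.
resp = requests.get(
    f"{GRAPH}/users/{user_id}/calendars",
    params={"$filter": "canEdit eq false"},
    headers={"Authorization": f"Bearer {token}"},
)
resp.raise_for_status()

# Re-check the flag client-side to guard against the behavior being fixed later.
editable = [c for c in resp.json()["value"] if c["canEdit"]]
print([c["name"] for c in editable])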

Manipulate field value of copy-field in Apache Solr

I have a simple string value, PART_NUMBER, as a field in Solr. I would like to add an additional field which wraps that value in a URL. To do this, I created a new field type, a field, and a copy field:
"add-field-type": {
"name": "endpoint_url",
"class": "solr.TextField",
"positionIncrementGap": "100",
"analyzer": {
"tokenizer": {
"class": "solr.KeywordTokenizerFactory"
},
"filters": [
{
"class": "solr.PatternReplaceFilterFactory",
"pattern": "([\\s\\S]*)",
"replacement": "http://myurl/$1.jpg"
}
]
}
},
"add-field": {
"name": "URL",
"type": "endpoint_url",
"stored": true,
"indexed": true
},
"add-copy-field":{ "source":"PART_NUMBER", "dest":"URL" }
As some of you probably guessed, my query output looks like this:
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "ABCD1234",
  "_version_": 1645658574812086272
}
That's because the endpoint_url field type only modifies the indexed terms. Indeed, when running the value through analysis, I get
http://myurl/ABCD1234.jpg
My question: is there any way to apply a tokenizer or filter and feed the result back into the stored field value? I would prefer this output when returning the result:
{
  "id": "1",
  "PART_NUMBER": "ABCD1234",
  "URL": "http://myurl/ABCD1234.jpg",
  "_version_": 1645658574812086272
}
Is this possible to do in Solr?
The solution was posted here:
Custom Solr analyzers not being used during indexing
I need to use an update request processor in order to change the field value before analysis. The process is described here:
https://lucene.apache.org/solr/guide/8_1/update-request-processors.html
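As a client-side alternative sketch (not the update-request-processor route described above), the URL can also be computed in the indexing client before the document is posted to Solr; the host, core name, and use of the /update endpoint are assumptions, and the add-copy-field rule should be dropped if you go this way:

import requests

SOLR = "http://localhost:8983/solr/mycore"  # assumed host and core name

def with_url(doc: dict) -> dict:
    # Same transformation the PatternReplaceFilterFactory performs at index time,
    # but applied to the stored value before the document reaches Solr.
    doc["URL"] = f"http://myurl/{doc['PART_NUMBER']}.jpg"
    return doc

docs = [with_url({"id": "1", "PART_NUMBER": "ABCD1234"})]
resp = requests.post(f"{SOLR}/update?commit=true", json=docs)
resp.raise_for_status()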

Error code: InvalidIntentSamplePhraseSlot

I got the error code InvalidIntentSamplePhraseSlot when I built the model using the new skills console.
The full error message is
Sample utterance "AddBookmarkIntent i am at {pageno} of {mybook}" in intent "AddBookmarkIntent" cannot include both a phrase slot and another intent slot. Error code: InvalidIntentSamplePhraseSlot -
where {pageno} is AMAZON.NUMBER and {mybook} is AMAZON.SearchQuery
What is the error about and how can I solve it?
Edit: added the JSON for the intent:
{
  "name": "AddBookmarkIntent",
  "slots": [
    {
      "name": "mybook",
      "type": "AMAZON.SearchQuery"
    },
    {
      "name": "pageno",
      "type": "AMAZON.NUMBER"
    }
  ],
  "samples": [
    "i am at {pageno} of the book {mybook}",
    "save page {pageno} to the book {mybook}",
    "save page {pageno} to {mybook}",
    "i am at {pageno} of {mybook}"
  ]
}
It's not allowed to have a slot of type AMAZON.SearchQuery in the same utterance as another slot, in your case AMAZON.NUMBER.
Mark one of the slots as required and ask for them separately.
A little example:
Create the intent and put in the utterances and slots:
"intents": [
{
"name": "AddBookmarkIntent",
"samples": [
"I am at {pageno}"
],
"slots": [
{
"name": "mybook",
"type": "AMAZON.SearchQuery",
"samples": [
"For {mybook}"
]
},
{
"name": "pageno",
"type": "AMAZON.NUMBER"
}
]
}
Mark the specific slot as required so Alexa will automatically ask for it:
"dialog": {
"intents": [
{
"name": "AddBookmarkIntent",
"confirmationRequired": false,
"prompts": {},
"slots": [
{
"name": "mybook",
"type": "AMAZON.SearchQuery",
"elicitationRequired": true,
"confirmationRequired": false,
"prompts": {
"elicitation": "Elicit.Intent-AddBookmarkIntent.IntentSlot-mybook"
}
}
]
}
]
}
and create the prompts to ask for the slot:
"prompts": [
{
"id": "Elicit.Intent-AddBookmarkIntent.IntentSlot-mybook",
"variations": [
{
"type": "PlainText",
"value": "For which book you like to save the page?"
}
]
}
]
This is probably much easier with the Skill Builder BETA rather than editing the JSON directly, because the builder will automatically create the JSON in the background.
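On the code side, here is a minimal handler sketch assuming the ASK SDK for Python: it delegates the dialog back to Alexa until the required mybook slot has been elicited, then reads both slot values.

from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model import DialogState
from ask_sdk_model.dialog import DelegateDirective

class AddBookmarkIntentHandler(AbstractRequestHandler):
    def can_handle(self, handler_input):
        return is_intent_name("AddBookmarkIntent")(handler_input)

    def handle(self, handler_input):
        request = handler_input.request_envelope.request
        if request.dialog_state != DialogState.COMPLETED:
            # Let Alexa keep eliciting required slots (e.g. mybook) using the
            # prompts defined in the dialog model above.
            return handler_input.response_builder.add_directive(
                DelegateDirective()
            ).response
        slots = request.intent.slots
        speech = f"Saved page {slots['pageno'].value} for {slots['mybook'].value}."
        return handler_input.response_builder.speak(speech).response

sb = SkillBuilder()
sb.add_request_handler(AddBookmarkIntentHandler())
lambda_handler = sb.lambda_handler()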
The error is telling you that you have an intent name in your sample utterance where it should only have slots, and it looks like you do:
"AddBookmarkIntent i am at {pageno} of {mybook}"
"AddBookmarkIntent" shouldn't actually be inside of the utterance. So turn your utterance into:
"i am at {pageno} of {mybook}"
I know that some of the documents show examples of sample utterances with the intent name first, such as here. But that page has a big warning near the top.
So you have to be careful about which documents you read and follow, based on which way you are building your Alexa skill.
Follow this if you are using the Skill Builder.
It unfortunately seems like an utterance can only reference one "phrase" slot type.
For your specific case, it does look like there is now a non-phrase slot type AMAZON.Book in public beta; if you use that instead of AMAZON.SearchQuery it might work?
Src: https://developer.amazon.com/en-US/docs/alexa/custom-skills/slot-type-reference.html

Elasticsearch not returning hits for multi-valued field

I am using Elasticsearch with no modifications whatsoever. This means the mappings, norms, and analyzed/not_analyzed settings are all default config. I have a very small data set of two items for experimentation purposes. The items have several fields, but I query on only one, which is a multi-valued/array-of-strings field. The doc looks like this:
{
  "_index": "index_profile",
  "_type": "items",
  "_id": "ega",
  "_version": 1,
  "found": true,
  "_source": {
    "clicked": [
      "ega"
    ],
    "profile_topics": [
      "Twitter",
      "Entertainment",
      "ESPN",
      "Comedy",
      "University of Rhode Island",
      "Humor",
      "Basketball",
      "Sports",
      "Movies",
      "SnapChat",
      "Celebrities",
      "Rite Aid",
      "Education",
      "Television",
      "Country Music",
      "Seattle",
      "Beer",
      "Hip Hop",
      "Actors",
      "David Cameron",
      ... // other topics
    ],
    "id": "ega"
  }
}
A sample query is:
GET /index_profile/items/_search
{
  "size": 10,
  "query": {
    "bool": {
      "should": [
        {
          "terms": {
            "profile_topics": [
              "Basketball"
            ]
          }
        }
      ]
    }
  }
}
Again, there are only two items, and the one listed should match the query because the profile_topics field contains the "Basketball" term. The other item does not match. I only get a result if I ask for clicked = ega in the should clause.
With Solr I would probably specify that the fields are multi-valued string arrays with no norms and no analyzer, so profile_topics values are not stemmed or tokenized, since each value should be treated as a single token (even with the spaces). I'm not sure this would solve the problem, but it is how I treat similar data in Solr.
I assume I have run afoul of some norm/analyzer/TF-IDF issue; if so, how do I solve this so that even with two items the query will return ega? If possible I'd like to solve this index- or type-wide rather than per field.
Basketball (with a capital B) in a terms query will not be analyzed. This means that this exact term is what will be looked up in the Elasticsearch index.
You say you have the defaults. If so, indexing Basketball under the profile_topics field means that the actual term in the index will be basketball (with a lowercase b), which is the result of the standard analyzer. So, either you set profile_topics as not_analyzed, or you search for basketball and not Basketball.
Read this about terms.
Regarding setting all the fields to not_analyzed, you could do that with a dynamic template. With a template you can also do what Logstash does: define a .raw subfield for each string field, where only this subfield is not_analyzed. The original/parent field still holds the analyzed version of the same text, in case you want to use the analyzed field in the future.
Take a look at this dynamic template. It's the one Logstash is using.
More specifically:
{
  "template": "your_indices_name-*",
  "mappings": {
    "_default_": {
      "_all": {
        "enabled": true,
        "omit_norms": true
      },
      "dynamic_templates": [
        {
          "string_fields": {
            "match": "*",
            "match_mapping_type": "string",
            "mapping": {
              "type": "string",
              "index": "analyzed",
              "omit_norms": true,
              "fields": {
                "raw": {
                  "type": "string",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}
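To illustrate both options, here is a minimal sketch using Python's requests against a local node (the URL, index name, and the .raw subfield produced by the template above are assumptions about your setup):

import requests

ES = "http://localhost:9200"  # assumed local node

# Option 1: with the default standard analyzer, "Basketball" was indexed as
# "basketball", so a (non-analyzed) terms query must use the lowercased token.
query_analyzed = {
    "query": {"bool": {"should": [{"terms": {"profile_topics": ["basketball"]}}]}}
}

# Option 2: with the dynamic template in place, query the not_analyzed .raw
# subfield using the original casing.
query_raw = {
    "query": {"bool": {"should": [{"terms": {"profile_topics.raw": ["Basketball"]}}]}}
}

for body in (query_analyzed, query_raw):
    resp = requests.post(f"{ES}/index_profile/_search", json=body)
    resp.raise_for_status()
    print([hit["_id"] for hit in resp.json()["hits"]["hits"]])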
