How to handle multi-word/phrase synonyms in Azure Search - azure-cognitive-search

According to the article https://azure.microsoft.com/pl-pl/blog/azure-search-synonyms-public-preview/ I should be able to use multi-word/phrase synonyms in synonymMaps:
Multi-word synonyms
In many full text search engines, support for synonyms is limited to single words. Our team has engineered a solution that allows Azure Search to support multi-word synonyms. This allows for phrase queries (“”) to function properly while using synonyms. If someone has mapped ‘hot tub’ to ‘whirlpool bath’ and they then search for “large hot tub,” Azure Search will return matches which contain both “large hot tub” and “large whirlpool bath.”
However, in my case I get matches on the individual words.
My synonymMap looks like:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss\n"}
And I have documents in search index which contains medicine disciplines like Gastroenterology (acute and chronic).
What I receive for ?search="vomiting" is:
{
"@search.score": 1.0405536,
"@search.highlights": {
"disciplines/name": [
"<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
"<em>Acute</em> <em>and</em> <em>chronic</em> skin disease",
"<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Haematology (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Respiratory medicine (<em>acute</em> <em>and</em> <em>chronic</em>)"
],
And I am expecting:
{
"@search.score": 1.0405536,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Gastroenterology (acute and chronic)</em>"
],
Am I doing something wrong?
I tried shortening the main term to a single word such as Gastroenterology, but some of the terms simply cannot be shortened.
Wrapping the phrase in quotes, as in synonyms => "Gastroenterology (acute and chronic)", does not work either.
UPDATE
I was wondering why I thought there was a problem.
Well, in the question I provided:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss\n"}
But what I am actually using is:
{"name":"map",
"format":"solr",
"synonyms":"Gastroenterology (acute and chronic),vomiting, diarrhoea, weight loss
=> Gastroenterology (acute and chronic)\n"}
In that case I have 4 results:
"@odata.count": 4,
"value": [
{
"@search.score": 1.0137179,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
"<em>Acute</em> <em>and</em> <em>chronic</em> skin disease",
"<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Haematology (<em>acute</em> <em>and</em> <em>chronic</em>)",
"Respiratory medicine (<em>acute</em> <em>and</em> <em>chronic</em>)"
],
"equipment/translatedName": [
"Emergency <em>and</em> crictial care",
"In house skin <em>and</em> ear cyology"
],
"disciplines/translatedName": [
"Anaesthesia <em>and</em> analgesia",
"Emergency <em>and</em> critical care"
]
},
...
{
"@search.score": 0.33542877,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Chronic</em> pain management"
],
"disciplines/translatedName": [
"Anaesthesia <em>and</em> analgesia"
]
},
...
{
"@search.score": 0.13757591,
"@search.highlights": {
"equipment/translatedName": [
"Emergency <em>and</em> crictial care"
],
"disciplines/translatedName": [
"Emergency <em>and</em> critical care"
]
},
...
{
"@search.score": 0.07112321,
"@search.highlights": {
"disciplines/services/translatedName": [
"<em>Chronic</em> pain management"
]
},
Could you explain to me how it works in that case?

Azure Search does support multi-word synonyms, and the result in your case is as expected. There are a couple of things to call out here.
First, ?search="vomiting" returns documents that match 'vomiting' or any of its specified synonyms anywhere within the document. The multi-word synonym Gastroenterology (acute and chronic) in the collection disciplines/name matches your query, so the document is returned.
The second thing, which is probably the source of the confusion, is the highlighting. Azure Search doesn't currently support phrase highlighting. When used with a phrase query, it highlights the individual terms in the phrase. Since the matching document also contains those individual terms elsewhere, all of them were highlighted. See "Azure search highlights for phrases with double quotes" for more details.
So, the multi-word synonym expansion and search is functioning as expected. You can test this by indexing a test document that contains only Gastroenterology (acute and chronic) and another that contains only acute and chronic. The query should return only the first document.
If you have a strict requirement on highlighting phrases, you'll have to do some client-side processing after retrieving the search results.
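The client-side processing mentioned above could look roughly like this. It is only a minimal sketch in Python: the helper names strip_highlight_tags and highlight_phrase are made up for illustration, and it assumes the default <em> tags and that you already know which phrase you want highlighted as a whole.

```python
import re


def strip_highlight_tags(text: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Remove the per-term highlight tags returned by the service."""
    return text.replace(pre, "").replace(post, "")


def highlight_phrase(text: str, phrase: str, pre: str = "<em>", post: str = "</em>") -> str:
    """Wrap every case-insensitive occurrence of the whole phrase in one tag pair."""
    if not phrase:
        return text
    pattern = re.compile(re.escape(phrase), re.IGNORECASE)
    return pattern.sub(lambda m: f"{pre}{m.group(0)}{post}", text)


# Highlight values copied from the response in the question.
values = [
    "<em>Acute</em> <em>and</em> <em>chronic</em> ear disease",
    "<em>Gastroenterology</em> (<em>acute</em> <em>and</em> <em>chronic</em>)",
]
phrase = "Gastroenterology (acute and chronic)"

# Drop the term-level tags, then re-highlight only full phrase matches.
rehighlighted = [highlight_phrase(strip_highlight_tags(v), phrase) for v in values]
for value in rehighlighted:
    print(value)
```

Values that contain only some of the words come back without any tags, so the client can also use this to hide highlights that are not full phrase matches.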

Related

PATCH request with operator "Remove" not getting sent when removing a member from a group

I am looking into Azure AD SCIM Provisioning and I have a question I am hoping I could get some help on. My use case is as follows
1) I created a Group in Azure AD and added "John Smith" and "Jane Smith" as members to it.
2) I went over to my Non-Gallery application, added the Group created above to my application and triggered an On-Demand provisioning.
3) Both "John Smith" and "Jane Smith" were successfully created in my local database.
4) I removed "John Smith" from my group and triggered an On-Demand provisioning again.
My expectation was that the following PATCH request would be sent by Azure AD:
"Operations": [
{
"op": "Remove",
"path": "members",
"value": "john-smith-id"
}
]
but instead Azure AD sends a PATCH request to /Users with the following body
"schemas": [
"urn:ietf:params:scim:api:messages:2.0:PatchOp"
],
"Operations": [
{
"op": "Add",
"path": "displayName",
"value": "John Smith"
}
]
and another PATCH request to /Groups with the following body
"schemas": [
"urn:ietf:params:scim:api:messages:2.0:PatchOp"
],
"Operations": [
{
"op": "Add",
"path": "externalId",
"value": "some-guid"
}
]
Is this correct? I feel like I am messing something up when removing the member from the Group, which is why the desired PATCH request isn't triggered.
After step #4, I would recommend checking whether the user has actually been removed from the group.
Also, make sure that you're using the right rule ID in the on-demand provisioning request. One easy way to check is to trigger it through the UI and look at the network traffic in the browser developer tools (Ctrl+Shift+I).
The rule ID can be found in the schema.
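For the receiving side, here is a minimal sketch of how a SCIM endpoint might apply a members PATCH like the one expected in the question. The names apply_group_patch and _member_ids are hypothetical, the group is modelled as a plain set of ids, and accepting the value as a bare string, a single object or a list of objects is an assumption made for illustration, not a statement of what Azure AD actually sends.

```python
from typing import Any, Dict, List, Set


def _member_ids(value: Any) -> List[str]:
    """Accept a bare id string, one {"value": id} object, or a list of either."""
    if value is None:
        return []
    items = value if isinstance(value, list) else [value]
    return [item["value"] if isinstance(item, dict) else str(item) for item in items]


def apply_group_patch(members: Set[str], patch_body: Dict[str, Any]) -> Set[str]:
    """Return the member set after applying the add/remove operations on 'members'."""
    updated = set(members)
    for operation in patch_body.get("Operations", []):
        # Compare case-insensitively: the question shows "Remove" and "Add" capitalised.
        op = str(operation.get("op", "")).lower()
        path = str(operation.get("path", "")).lower()
        if path != "members":
            continue  # other attributes (displayName, externalId) are ignored here
        if op == "add":
            updated.update(_member_ids(operation.get("value")))
        elif op == "remove":
            if "value" in operation:
                updated.difference_update(_member_ids(operation["value"]))
            else:
                # RFC 7644: removing a multi-valued attribute without a filter removes all values.
                updated.clear()
    return updated


patch = {
    "schemas": ["urn:ietf:params:scim:api:messages:2.0:PatchOp"],
    "Operations": [{"op": "Remove", "path": "members", "value": "john-smith-id"}],
}
remaining = apply_group_patch({"john-smith-id", "jane-smith-id"}, patch)
print(sorted(remaining))
```

Logging every incoming PATCH body before a handler like this runs makes it easy to confirm whether a members operation arrives at all.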

Can we use Phonetic Analyzer and Synonym maps together in the index of Azure Cognitive Search?

I am trying to enable both a phonetic analyzer and synonym maps in my search index. But when I use both, the synonym mapping does not work.
If I remove the phonetic analyzer as part of the index creation, then the synonyms are working fine.
Also, the synonyms work fine with the inbuilt analyzers like en.microsoft.
My Index field:
{
"name": "content",
"type": "Edm.String",
"facetable": false,
"filterable": false,
"key": false,
"retrievable": true,
"searchable": true,
"sortable": false,
"analyzer": "my_standard",
"indexAnalyzer": null,
"searchAnalyzer": null,
"synonymMaps": [
"euc-synonymmap"
]
}
Analyzer definition:
"analyzers":[
{
"name":"my_standard",
"@odata.type":"#Microsoft.Azure.Search.CustomAnalyzer",
"tokenizer":"standard_v2",
"tokenFilters":[ "lowercase", "asciifolding", "phonetic" ]
}
]
Synonym map:
{
"format":"solr",
"synonyms" : "features,characteristic,property,detail,facet,factor\n
configure,setup,install,launch,arange\n
issue,problem,controversy,affair\n
troubleshoot,fix,correct,fine-tune,over haul\n
extension,postponement,postpone,delay,addition,add-on\n
computer,desktop,system,laptop,mainframe,machine,PC,Workstation\n
temp,temporary,momentary,short lived"
}
Search query:
{
"search": "Lane acount postponement",
"top":"5",
"highlight":"content"
}
Suppose I have a document whose content contains the terms 'LAN', 'account' and 'Extension', indexed in the Azure index (with the phonetic analyzer and synonyms). When I pass the search query "Lane acount postponement", the phonetic analyzer matches the term 'Lane' to 'LAN' and the term 'acount' to 'account'. It also highlights those terms (LAN, account) in the document's content, since I am using hit highlighting in the search query.
As you can see in the synonym map definition, I added extension as a synonym for postponement. But the search neither matches nor highlights the term 'extension'.
I just need to know whether we can use both the phonetic analyzer and synonym maps together in a search index.
Please clarify. Thank you in advance.
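A side note on the synonym map definition above: as pasted, the rules are separated by \n and also by raw line breaks inside the JSON string, and a raw line break inside a string is not valid JSON. A minimal sketch of building the request body in Python follows. The helper name build_synonym_map is hypothetical and only two of the rules are shown. Whether this has anything to do with the phonetic analyzer behaviour is not established here.

```python
import json
from typing import Iterable


def build_synonym_map(name: str, rules: Iterable[Iterable[str]]) -> str:
    """Serialize equivalence rules into a synonym map body in the 'solr' format."""
    # One rule per line, terms within a rule separated by commas.
    lines = [",".join(term.strip() for term in rule) for rule in rules]
    body = {"name": name, "format": "solr", "synonyms": "\n".join(lines)}
    # json.dumps escapes the newlines between rules as \n inside the string.
    return json.dumps(body, indent=2)


payload = build_synonym_map(
    "euc-synonymmap",
    [
        ["extension", "postponement", "postpone", "delay", "addition", "add-on"],
        ["temp", "temporary", "momentary", "short lived"],
    ],
)
print(payload)
```

Letting the JSON library do the escaping avoids hand-writing \n sequences and guarantees the body parses.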

Solr returns different result for each letter change

I am searching for products that have "camel" in their display names. All the indexing has been done. The problem is:
When I search for "camel" I get 1 product:
"name": "CHANEL HYDRA BEAUTY CAMELLIA WATER CREAM ILLUMINATING HYDRATING FLUID 30ML"
But when I search for "CAMELL" I get 3 products from Solr:
{
"name": "CLE DE PEAU Lipstick #5 Camellia"
},
{
"name": "CHANEL HYDRA BEAUTY CAMELLIA WATER CREAM ILLUMINATING HYDRATING FLUID 30ML"
},
{
"name": "HERA Rouge Holic Shine No.315 Camellia Orange"
}
When I search for CAMEL, I should get these 3 as well. Why isn't it working?
The issue was fixed after setting the wildcard flag to true on the indexed properties in Hybris. Thanks to everyone for your help and ideas.

Solr Faceting - simple example complaining about asterisk

I'm doing the most basic of solr queries with faceting.
q=*:*&facet=true&facet.field=year
And I'm getting an error as follows:
{
"responseHeader": {
"status": 400,
"QTime": 1,
"params": {
"indent": "true",
"q": "*:*&facet=true&facet.field=year",
"_": "1443134591151",
"wt": "json"
}
},
"error": {
"msg": "undefined field *",
"code": 400
}
}
This query is straight out of the online tutorials. Why is solr complaining?
It appears that what you have done is gone to the Solr Admin panel and in the query section you have put
*:*&facet=true&facet.field=year
after the q. What you need to do is put *:* after the q, and facet=true&facet.field=year under Raw Query Parameters.
The error says that you have an "undefined field". Is the "year" field defined in your schema? Also, can you give details about how you are querying the data, such as which client? I assume that q=*:* works and the issue is only with faceting.
You've put it into the wrong line in the Solr admin.
Just take the same line, and paste it into the Raw query line instead of the query line.
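To see why the admin panel complained, it helps to compare how the two inputs end up as request parameters. This is a minimal sketch in Python using only the standard library. The base URL and core name (mycore) are placeholders, and build_facet_query is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode


def build_facet_query(base_url: str, facet_field: str) -> str:
    """Build a select URL where q and the facet options are separate parameters."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        ("facet.field", facet_field),
        ("wt", "json"),
    ]
    return f"{base_url}?{urlencode(params)}"


# Placeholder host and core name; adjust to your installation.
url = build_facet_query("http://localhost:8983/solr/mycore/select", "year")
good_params = parse_qs(url.split("?", 1)[1])

# What happens when the whole line is typed into the q box:
# everything becomes the value of q, so no facet parameter is sent.
bad_query_string = urlencode({"q": "*:*&facet=true&facet.field=year"})
bad_params = parse_qs(bad_query_string)

print(good_params)
print(bad_params)
```

In the second case the only parameter is q, with the facet options embedded in its value, which matches the "params" section of the error response in the question.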

Lily with Morphline and HBase

I'm trying to follow a tutorial from Cloudera (http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/search_hbase_batch_indexer.html).
I have code that inserts objects in Avro format into HBase, and I want to index them into Solr, but nothing gets indexed.
I have been looking at the logs:
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={0Name178721/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
15/06/12 00:45:00 DEBUG indexer.Indexer$RowBasedIndexer: Indexer _default_ will send to Solr 0 adds and 0 deletes
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeNotify: {lifecycle=[START_SESSION]}
15/06/12 00:45:00 TRACE morphline.ExtractHBaseCellsBuilder$ExtractHBaseCells: beforeProcess: {_attachment_body=[keyvalues={1Name134339/data:avroUser/1434094131495/Put/vlen=237/seqid=0}], _attachment_mimetype=[application/java-hbase-result]}
So I'm reading them, but I don't know why nothing is indexed in Solr.
I guess that my morphline.conf is wrong.
morphlines : [
  {
    id : morphline1
    importCommands : ["org.kitesdk.**", "org.apache.solr.**", "com.ngdata.**"]
    commands : [
      {
        extractHBaseCells {
          mappings : [
            {
              inputColumn : "data:avroUser"
              outputField : "_attachment_body"
              type : "byte[]"
              source : value
            }
          ]
        }
      }
      #for avro use with type : "byte[]" in extractHBaseCells mapping above
      { readAvroContainer {} }
      {
        extractAvroPaths {
          flatten : true
          paths : {
            name : /name
          }
        }
      }
      { logTrace { format : "output record: {}", args : ["@{}"] } }
    ]
  }
]
I wasn't sure whether I needed an "_attachment_body" field in Solr, but it seems it isn't necessary, so I guess that readAvroContainer or extractAvroPaths is wrong.
I have a "name" field in Solr and my avroUser has a "name" field as well.
{"namespace": "example.avro",
"type": "record",
"name": "User",
"fields": [
{"name": "name", "type": "string"},
{"name": "favorite_number", "type": ["int", "null"]},
{"name": "favorite_color", "type": ["string", "null"]}
]
}
I have all of this working well here.
I followed these steps:
1) Install hbase-solr-indexer as a service:
First of all you have to install hbase-solr-indexer.
installing hbase-solr-indexing as a service
Add the Cloudera repos to your yum repos for this.
After that, run:
sudo yum install hbase-solr-indexer
2) Create the morphline files:
OK, you have already done that.
3) Set the replication scope for every column family and register an hbase-indexer configuration
Using the Lily HBase NRT Indexer Service
$ hbase shell
hbase shell> disable 'record'
hbase shell> alter 'record', {NAME => 'data', REPLICATION_SCOPE => 1}
hbase shell> enable 'record'
Try to follow the other tutorials above. ;)
I had problems with an NRT solution, but when I followed the whole tutorial step by step it worked.
I hope this helps someone.

Resources