How to use Split Skill in azure cognitive search? - azure-cognitive-search

I am new to Azure Cognitive Search. I have a .docx file stored in Azure Blob Storage. I am using #Microsoft.Skills.Text.SplitSkill to split the document into multiple pages (chunks), but when I index the output of this skill, I get the entire .docx file content. How do I return the "pages" from the SplitSkill so that the user sees the portion of the original document that matched their search instead of the entire document?
Thank you in advance.

The split skill allows you to split text into smaller chunks/pages that can then be processed by additional cognitive skills.
Here is what a minimalistic skillset that does splitting and translation may look like:
"skillset": [
  {
    "@odata.type": "#Microsoft.Skills.Text.SplitSkill",
    "textSplitMode": "pages",
    "maximumPageLength": 1000,
    "defaultLanguageCode": "en",
    "inputs": [
      { "name": "text", "source": "/document/content" },
      { "name": "languageCode", "source": "/document/language" }
    ],
    "outputs": [
      { "name": "textItems", "targetName": "mypages" }
    ]
  },
  {
    "@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
    "name": "#2",
    "description": null,
    "context": "/document/mypages/*",
    "defaultFromLanguageCode": null,
    "defaultToLanguageCode": "es",
    "suggestedFrom": "en",
    "inputs": [
      { "name": "text", "source": "/document/mypages/*" }
    ],
    "outputs": [
      { "name": "translatedText", "targetName": "translated_text" }
    ]
  }
]
Note that the split skill generates a collection of text elements under the "/document/mypages" node in the enriched tree. Also note that by providing the context "/document/mypages/*" to the translation skill, we are telling the translation skill to perform translation on each page.
I should point out that documents will still be indexed at the document level though. Skillsets are not really built to "change the cardinality of the index". That said, a workaround for that may be to project each of the pages as separate elements into a knowledge store, and then create a separate index that is actually focused on indexing each page.
Learn more about the knowledge store projections here:
https://learn.microsoft.com/en-us/azure/search/knowledge-store-concept-intro
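As a rough sketch of that workaround (the connection string and container name below are placeholders, and projecting "/document/mypages/*" directly is a simplification; in practice a ShaperSkill may be needed to shape each page into an object first), the skillset's knowledgeStore section could project each page as its own object:

```json
"knowledgeStore": {
  "storageConnectionString": "<storage-connection-string>",
  "projections": [
    {
      "tables": [],
      "objects": [
        {
          "storageContainer": "pages",
          "source": "/document/mypages/*"
        }
      ],
      "files": []
    }
  ]
}
```

A second indexer over the pages container can then populate a separate, page-level index.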

Related

Azure Cognitive Search query specific data source

I have an Azure Cognitive Search set up with two DataSources, two Indexers indexing those DataSources, and one Index.
I'd like to be able to query/filter by DataSource. Is that possible?
This is certainly possible. You'd just need to find a way to get a field into the index that contains the name of the data source. The best way to do this depends on the data source you're using. For example, with a SQL data source you might simply edit the query to return the value, whereas that wouldn't work for blob storage.
Another option that would work for all data sources would be to include a skillset with a conditional skill which you can use to set a default value for a field.
Building off of @Derek Legenzoff's answer, this is how I did it:
Add all your data sources
Create a skillset for each data source with the following skill
{
  "@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
  "name": "#1",
  "description": "",
  "context": "/document",
  "inputs": [
    { "name": "condition", "source": "= true" },
    { "name": "whenTrue", "source": "= 'Production'" },
    { "name": "whenFalse", "source": "= ''" }
  ],
  "outputs": [
    { "name": "output", "targetName": "Env" }
  ]
}
(Replace 'Production' with the name of your environment.)
Create a single index for your data model and add an Env field; make it filterable and queryable
Create an indexer for each of your data sources that points to the index from step 3 and the data source from step 1
IMPORTANT: make sure your indexers have the following code in their definition.
"outputFieldMappings": [
  {
    "sourceFieldName": "/document/Env",
    "targetFieldName": "Env"
  }
],
This connects the output of the skill to the index.
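Once the Env field is populated and filterable, queries can restrict results to one data source with an OData $filter expression. A minimal sketch (the helper name and the SDK usage shown in comments are illustrative, not from the original answer):

```python
def build_env_filter(env: str) -> str:
    """Build an OData $filter expression matching the Env field.

    Single quotes inside OData string literals are escaped by doubling them.
    """
    escaped = env.replace("'", "''")
    return f"Env eq '{escaped}'"

# Hypothetical usage with the azure-search-documents SDK (service name,
# index name, and key are placeholders):
#
#   from azure.core.credentials import AzureKeyCredential
#   from azure.search.documents import SearchClient
#
#   client = SearchClient("https://<service>.search.windows.net",
#                         "my-index", AzureKeyCredential("<api-key>"))
#   results = client.search(search_text="*",
#                           filter=build_env_filter("Production"))

print(build_env_filter("Production"))  # Env eq 'Production'
```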

Nested loops or Cartesian product of arrays in Azure ARM

I'm building an ARM (Azure Resource Manager) template to create multiple resources of the same type, say, metric alerts for SQL servers. I have:
3 severity levels: [1, 2, 3]
20 servers with names [sqlserver_1, sqlserver_2, ...]
3 metrics to monitor [memory, cpu load, number of connections]
Essentially, I need a total of 180 resources. Is there any way I can build the template with all possible combinations of these variables? I.e., for each of the servers, I need to monitor 3 metrics, where each could trigger 3 possible alert levels depending on the metric values.
Naturally, I thought about a Cartesian product of these arrays and then a copy loop over it to fill the template attributes. However, it doesn't look like ARM supports this.
Is this the point where, instead of using ARM, I should think about writing a code generator to create the template rather than trying to bend ARM JSON?
Regarding the issue, you can add the copy element to the resources section of your template. After doing that, you can dynamically set the number of resources to deploy. For more details, please refer to here and here.
For example
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "org": {
      "type": "array",
      "defaultValue": [
        "contoso",
        "fabrikam",
        "coho"
      ]
    }
  },
  "resources": [
    {
      "apiVersion": "2017-06-01",
      "type": "Microsoft.Storage/storageAccounts",
      "name": "[concat(parameters('org')[copyIndex()], uniqueString(resourceGroup().id))]",
      "location": "[resourceGroup().location]",
      "sku": {
        "name": "Standard_LRS"
      },
      "kind": "Storage",
      "properties": {},
      "copy": {
        "name": "storagecopy",
        "count": "[length(parameters('org'))]"
      }
    }
  ],
  "outputs": {}
}
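ARM has no nested copy loops over multiple arrays at the same resource level, but a Cartesian product can be emulated with a single copy loop over all 180 combinations, using div() and mod() on copyIndex() to pick an element from each array. A hedged sketch (the parameter names servers, metrics, and severities are made up for illustration):

```json
{
  "variables": {
    "metricCount": "[length(parameters('metrics'))]",
    "severityCount": "[length(parameters('severities'))]",
    "perServer": "[mul(variables('metricCount'), variables('severityCount'))]",
    "total": "[mul(length(parameters('servers')), variables('perServer'))]"
  },
  "resources": [
    {
      "name": "[concat(parameters('servers')[div(copyIndex(), variables('perServer'))], '-', parameters('metrics')[div(mod(copyIndex(), variables('perServer')), variables('severityCount'))], '-', string(parameters('severities')[mod(copyIndex(), variables('severityCount'))]))]",
      "copy": {
        "name": "alertcopy",
        "count": "[variables('total')]"
      }
    }
  ]
}
```

Each copy iteration maps its flat index back to a (server, metric, severity) triple: integer division by perServer selects the server, and the remainder is split again by severityCount to select the metric and severity.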

How do I restrict medication annotations to a specific document section via IBM Watson Annotator for Clinical Data (ACD)

I’m using the IBM Watson Annotator for Clinical Data (ACD) API hosted in IBM Cloud to detect medication mentions within discharge summary clinic notes. I’m using the out-of-the-box medication annotator provided with ACD.
I’m able to detect and extract medication mentions, but I ONLY want medications mentioned within “DISCHARGE MEDICATIONS” or “DISCHARGE INSTRUCTIONS” sections.
Is there a way I can restrict ACD to only return medication mentions that appear within those two sections? I’m only interested in discharge medications.
For example, given the following contrived (non-PHI) text:
“Patient was previously prescribed cisplatin. DISCHARGE MEDICATIONS: 1. Aspirin 81 mg orally once daily.”
I get two medication mentions: one over “cisplatin” and another over “aspirin” - I only want the latter, since it appears within the “DISCHARGE MEDICATIONS” section.
The ACD medication annotator captures the section heading as part of each mention annotation that appears within a section. You can therefore define an inclusive filter that checks for (1) the desired normalized section heading, and (2) the existence of the section heading field in general, so that a mention appearing outside of any section (and thus lacking section header fields) is also excluded. This filters out of the ACD response any medication mentions that don't appear within a "DISCHARGE MEDICATIONS" section. I added a couple of other related normalized section headings so you can see how that's done. Feel free to modify the sample below to meet your needs.
Here's a sample flow you can persist via POST /flows and then reference on the analyze call as POST /analyze/{flow_id} - e.g. POST /analyze/discharge_med_flow
{
  "id": "discharge_med_flow",
  "name": "Discharge Medications Flow",
  "description": "Detect medication mentions within DISCHARGE MEDICATIONS sections",
  "annotatorFlows": [
    {
      "flow": {
        "elements": [
          {
            "annotator": {
              "name": "medication",
              "configurations": [
                {
                  "filter": {
                    "target": "unstructured.data.MedicationInd",
                    "condition": {
                      "type": "all",
                      "conditions": [
                        {
                          "type": "all",
                          "conditions": [
                            {
                              "type": "match",
                              "field": "sectionNormalizedName",
                              "values": [
                                "Discharge medication",
                                "Discharge instructions",
                                "Medications on discharge"
                              ],
                              "not": false,
                              "caseInsensitive": true,
                              "operator": "equals"
                            },
                            {
                              "type": "match",
                              "field": "sectionNormalizedName",
                              "operator": "fieldExists"
                            }
                          ]
                        }
                      ]
                    }
                  }
                }
              ]
            }
          }
        ],
        "async": false
      }
    }
  ]
}
See the IBM Watson Annotator for Clinical Data filtering docs for additional details.
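The flow payload above can also be built and registered programmatically. A sketch that constructs the same structure in Python (the helper name is made up, and the request calls shown in comments assume placeholder URL, API key, and version values):

```python
import json

def build_discharge_med_flow(section_headings):
    """Build the ACD flow payload shown above, filtering medication
    mentions to the given normalized section headings."""
    match_heading = {
        "type": "match",
        "field": "sectionNormalizedName",
        "values": list(section_headings),
        "not": False,
        "caseInsensitive": True,
        "operator": "equals",
    }
    heading_exists = {
        "type": "match",
        "field": "sectionNormalizedName",
        "operator": "fieldExists",
    }
    return {
        "id": "discharge_med_flow",
        "name": "Discharge Medications Flow",
        "description": "Detect medication mentions within DISCHARGE MEDICATIONS sections",
        "annotatorFlows": [{
            "flow": {
                "elements": [{
                    "annotator": {
                        "name": "medication",
                        "configurations": [{
                            "filter": {
                                "target": "unstructured.data.MedicationInd",
                                "condition": {
                                    "type": "all",
                                    "conditions": [{
                                        "type": "all",
                                        "conditions": [match_heading, heading_exists],
                                    }],
                                },
                            }
                        }],
                    }
                }],
                "async": False,
            }
        }],
    }

flow = build_discharge_med_flow(
    ["Discharge medication", "Discharge instructions", "Medications on discharge"]
)
print(json.dumps(flow)[:40])

# Hypothetical registration and use with requests (acd_url and api_key
# are placeholders):
#   import requests
#   requests.post(f"{acd_url}/flows", json=flow, auth=("apikey", api_key))
#   requests.post(f"{acd_url}/analyze/discharge_med_flow", ...)
```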
Thanks

Only getting single word parameters from Alexa Skills Kit

I'm writing an Alexa Skill, and I can only get single word parameters into my code.
Here is the intent schema:
{
  "intents": [
    {
      "intent": "HeroQuizIntent",
      "slots": [
        { "name": "SearchTerm", "type": "SEARCH_TERMS" }
      ]
    },
    {
      "intent": "HeroAnswerIntent",
      "slots": [
        { "name": "SearchTerm", "type": "SEARCH_TERMS" }
      ]
    },
    { "intent": "AMAZON.HelpIntent" }
  ]
}
and my sample utterances are:
HeroQuizIntent quiz me
HeroAnswerIntent is it {SearchTerm}
For the HeroAnswerIntent, I'm checking the SearchTerm slot, and I'm only getting single words in there.
So, "Peter Parker" gives "Parker", "Steve Rogers" gives "Rogers", and "Tony Stark" gives "Stark".
How do I accept multiple words into a slot?
I've had the same problem with my skill, and this is the only solution that worked for me to capture several words. You need to check that these slots are not empty and concatenate them.
Intent schema:
{
  "intent": "HeroAnswerIntent",
  "slots": [
    { "name": "SearchTermFirst", "type": "SEARCH_TERMS" },
    { "name": "SearchTermSecond", "type": "SEARCH_TERMS" },
    { "name": "SearchTermThird", "type": "SEARCH_TERMS" }
  ]
},
Sample utterance
HeroAnswerIntent is it {SearchTermFirst}
HeroAnswerIntent is it {SearchTermFirst} {SearchTermSecond}
HeroAnswerIntent is it {SearchTermFirst} {SearchTermSecond} {SearchTermThird}
Finally, you need to put each of your words on a separate line in the SEARCH_TERMS slot definition.
Note that AMAZON.LITERAL sometimes doesn't pass the variable into the skill at all, even if it works when you test it in the service simulator (skill console, Test tab).
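The "check and concatenate" step above might look like this in a Python request handler. This is a sketch, assuming the slots dictionary follows the standard Alexa request layout of request.intent.slots (the helper name is made up):

```python
def join_search_terms(slots):
    """Concatenate the values of the three SearchTerm* slots from the
    schema above, skipping any that are empty or missing."""
    names = ["SearchTermFirst", "SearchTermSecond", "SearchTermThird"]
    parts = []
    for name in names:
        value = (slots.get(name) or {}).get("value")
        if value:
            parts.append(value)
    return " ".join(parts)

# Example with two filled slots and one unfilled slot:
slots = {
    "SearchTermFirst": {"name": "SearchTermFirst", "value": "Peter"},
    "SearchTermSecond": {"name": "SearchTermSecond", "value": "Parker"},
    "SearchTermThird": {"name": "SearchTermThird"},  # unfilled slot
}
print(join_search_terms(slots))  # Peter Parker
```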
The solution @Xanxir mentioned works equivalently with the newer custom slot format. In this case, you'd just put multi-word examples in your custom list of values for your slot type.
I had to change the Slot type to AMAZON.LITERAL.
The trick was that in the sample utterances, I also had to provide multiple utterances to demonstrate the minimum and maximum sizes of literals that Alexa should interpret. It's wonky, but works.
Here's the reference for it: https://developer.amazon.com/public/solutions/alexa/alexa-skills-kit/docs/alexa-skills-kit-interaction-model-reference
AMAZON.SearchQuery
You can use this slot type in your utterances, and it will capture all the words the user speaks in between. It's rather accurate and will solve your problem.
Ref link: Alexa SearchQuery

searching an array deep inside a mongo document

In my mongo collection called pixels, I have documents like the sample below.
I'm looking for a way to search in the actions.tags part of the documents.
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
don't work. I'm also looking for the PHP equivalent.
I'm also asking myself whether I should make an "actions" collection instead of putting everything inside one document.
I'm new to mongo, so any good tutorial on structuring the db would be great.
Thanks for the insight.
{
  "_id": { "$oid": "51b98009e4b075a9690bbc71" },
  "name": "open Atlas",
  "manager": "Tib Kat",
  "type": "Association",
  "logo": "",
  "description": "OPEN ATLAS",
  "actions": [
    {
      "name": "Pixel Humain",
      "tags": [ "Toutes thémathiques" ],
      "description": "le PH agit localement",
      "images": [],
      "origine": "oui",
      "website": "www.echolocal.org"
    }
  ],
  "email": "my#gmail.com",
  "adress": "102 rue",
  "cp": "97421",
  "city": "Saint louis",
  "country": "Réunion",
  "phone": "06932"
}
You can try it like this in PHP (note that $in takes an array of values):
$collection->find(array("actions.tags" => array('$in' => array("Environnement"))));
I do not think you need to maintain the actions in a separate collection. NoSQL gives you more flexibility to embed the document, and it even allows subdocuments to be indexed. The true power of NoSQL comes from merging documents into each other for faster retrieval. The one shortcoming I can see here: find returns the complete parent document, so if you want to show a single entry of the subdocument array, you have to filter it on the client side (or use a projection operator such as $elemMatch). So if you are planning to show actions individually to the end user, it is better to keep them in a separate collection.
Read here : http://docs.mongodb.org/manual/use-cases/
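To make the dotted-path semantics concrete, here is a small Python sketch that re-implements, client-side, what the server-side query {"actions.tags": {"$in": ["Environnement"]}} matches against a document like the sample above (the second action and the helper name are made up for illustration):

```python
SAMPLE = {
    "name": "open Atlas",
    "actions": [
        {"name": "Pixel Humain", "tags": ["Toutes thémathiques"]},
        {"name": "Autre action", "tags": ["Environnement"]},  # illustrative extra action
    ],
}

def matches_tag(doc, tag):
    """Mimic {"actions.tags": {"$in": [tag]}}: the dotted path walks into
    each element of the actions array and checks its tags array."""
    for action in doc.get("actions", []):
        if tag in action.get("tags", []):
            return True
    return False

print(matches_tag(SAMPLE, "Environnement"))  # True

# The equivalent pymongo call (collection name is a placeholder):
#   db.pixels.find({"actions.tags": {"$in": ["Environnement"]}})
```

Note that even when the query matches on a nested tag, the server still returns the whole parent document, which is the behavior described above.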
