Azure Cognitive Search query specific data source - azure-cognitive-search

I have an Azure Cognitive Search service set up with two DataSources, two Indexers indexing those DataSources, and one Index.
I'd like to be able to query/filter by DataSource. Is that possible?

This is certainly possible. You just need a field in the index that contains the name of the data source. The best way to populate it depends on the data source you're using. For example, with a SQL data source you might be able to edit the query to return the value, whereas that wouldn't work for blob storage.
Another option that would work for all data sources would be to include a skillset with a conditional skill which you can use to set a default value for a field.

Building on @Derek Legenzoff's answer, this is how I did it:
1. Add all your data sources.
2. Create a skillset for each data source with the following skill:
{
"@odata.type": "#Microsoft.Skills.Util.ConditionalSkill",
"name": "#1",
"description": "",
"context": "/document",
"inputs": [
{
"name": "condition",
"source": "= true"
},
{
"name": "whenTrue",
"source": "= 'Production'" //name of your environment
},
{
"name": "whenFalse",
"source": "= ''"
}
],
"outputs": [
{
"name": "output",
"targetName": "Env"
}
]
}
3. Create a single index for your data model and add an Env field that is filterable and queryable.
4. Create an indexer for each of your data sources, pointing to the index from step 3 and the data source from step 1.
IMPORTANT: make sure your indexers have the following in their definition:
"outputFieldMappings": [
{
"sourceFieldName": "/document/Env",
"targetFieldName": "Env"
}
],
This maps the output of the skill to the Env field in the index.
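Once the Env field is populated, filtering by data source is an ordinary OData filter on that field. Below is a minimal Python sketch of the request body one might send to the index's search endpoint. The helper name build_env_query and the sample value are made up for illustration, and nothing is actually sent to a service here.

```python
import json


def build_env_query(env: str, search_text: str = "*") -> dict:
    """Build a search request body that restricts results to one Env value."""
    # OData string literals escape a single quote by doubling it.
    escaped = env.replace("'", "''")
    return {
        "search": search_text,
        "filter": f"Env eq '{escaped}'",
    }


body = build_env_query("Production")
print(json.dumps(body, indent=2))
```

The same filter string can be used from any client that exposes a filter parameter, as long as Env is marked filterable in the index.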

Nested loops or Cartesian product of arrays in Azure ARM

I'm building an ARM (Azure Resource Manager) template to create multiple resources of the same type, for example metric alerts for SQL servers. I have:
3 severity levels: [1, 2, 3]
20 servers with names [sqlserver_1, sqlserver_2, ...]
3 metrics to monitor [memory, cpu load, number of connections]
Essentially, I need a total of 180 resources. Is there any way to build them from all possible combinations of these variables? That is, for each server I need to monitor 3 metrics, each of which can trigger 3 possible alert levels depending on the metric value.
Naturally, I thought about a Cartesian product of these arrays and then a copy loop over it to fill the template attributes. However, it doesn't look like ARM supports this.
Is this the point where, instead of bending ARM JSON, I should write a code generator that creates the template?
You can add the copy element to the resources section of your template, which lets you set the number of resources to deploy dynamically. For more details, see the ARM template documentation on resource iteration.
For example:
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"org": {
"type": "array",
"defaultValue": [
"contoso",
"fabrikam",
"coho"
]
}
},
"resources": [
{
"apiVersion": "2017-06-01",
"type": "Microsoft.Storage/storageAccounts",
"name": "[concat(parameters('org')[copyIndex()], uniqueString(resourceGroup().id))]",
"location": "[resourceGroup().location]",
"sku": {
"name": "Standard_LRS"
},
"kind": "Storage",
"properties": {},
"copy": {
"name": "storagecopy",
"count": "[length(parameters('org'))]"
}
}
],
"outputs": {}
}
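The example above iterates over a single array. For the Cartesian product in the question, one possible approach (not part of the answer above) is to run a single copy loop whose count is the product of the array lengths, and to derive each array's index from copyIndex() with integer division and modulo, which ARM exposes as the div() and mod() template functions. The Python sketch below only checks that index arithmetic; the server and metric names are placeholders.

```python
from itertools import product

severities = [1, 2, 3]
servers = [f"sqlserver_{i}" for i in range(1, 21)]
metrics = ["memory", "cpu", "connections"]


def combo(i: int) -> tuple:
    """Map a flat loop index to one (server, metric, severity) combination."""
    sev = severities[i % len(severities)]
    met = metrics[(i // len(severities)) % len(metrics)]
    srv = servers[i // (len(severities) * len(metrics))]
    return (srv, met, sev)


total = len(servers) * len(metrics) * len(severities)
combos = [combo(i) for i in range(total)]
print(total, combos[0], combos[-1])
```

In a template, i // 9 would correspond to div(copyIndex(), 9) and i % 3 to mod(copyIndex(), 3), with the copy count set to the product of the three array lengths.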

Best Firestore data model for file tree data

I'm currently building an app that needs to store data with a similar structure as a file tree. It looks something like this:
{
"type": "folder",
"name": "folder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "folder",
"name": "subfolder A",
"private": false,
"updatedAt": 1231243,
"items": [
{
"type": "file",
"name": "file1"
}
]
},
{
"type": "file",
"name": "file2"
}
]
}
As far as I'm aware, there are three ways of implementing this:
Just dump all the data in 1 doc
Create a subcollection for each items array
Create a flat folder structure and save the parent folder id for lookup
I'm looking for a way to get all of this data with as few queries as possible. The ideal use case would be having only the root folder id and getting all the subfolders and items, but I'm not sure if that's possible.
Also, I'm planning to subscribe to the data in the future, so the file tree will be updated in real time.
Please suggest what the data model should look like.
Update 1
To give more clarification about the query that I will need:
I will only need to get the topmost folder (folder A) in this example and the items under it
I won't need to get the nested items directly
example: getting Subfolder A / File 2 without accessing Folder A
If all you need is to get the entire structure by some unique ID, just put it all in one document, and get() it using the known ID. You have a 1MB limit per document.
It can get more complicated if you need to access items within lists (which is not really possible to do without reading the entire document), so this would actually be a bad idea for that case.
You would also want to reconsider this if you intend to update the elements of this document frequently, as there is a limit of 1 write per second (sustained) per document. In that case you would want to split it up.
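If the tree is split up as in option 3 from the question (one flat record per item with a parent id), the nested structure can be rebuilt on the client after fetching the items. Here is a rough Python sketch of that reassembly, using plain dicts in place of Firestore documents. The field names id and parentId are assumptions, not something the question defines.

```python
def build_tree(records: list, root_id: str) -> dict:
    """Rebuild a nested tree from flat records that carry a parentId."""
    # Copy each record so the input is not mutated; folders get an items list.
    nodes = {}
    for rec in records:
        node = {k: v for k, v in rec.items() if k not in ("id", "parentId")}
        if node["type"] == "folder":
            node["items"] = []
        nodes[rec["id"]] = node
    # Attach every non-root node to its parent folder.
    for rec in records:
        parent = rec.get("parentId")
        if parent is not None and rec["id"] != root_id:
            nodes[parent]["items"].append(nodes[rec["id"]])
    return nodes[root_id]


flat = [
    {"id": "a", "parentId": None, "type": "folder", "name": "folder A"},
    {"id": "b", "parentId": "a", "type": "folder", "name": "subfolder A"},
    {"id": "c", "parentId": "b", "type": "file", "name": "file1"},
    {"id": "d", "parentId": "a", "type": "file", "name": "file2"},
]
tree = build_tree(flat, "a")
print(tree["name"], [item["name"] for item in tree["items"]])
```

The trade-off is more documents to read in exchange for small, independently writable items.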

How to use Split Skill in azure cognitive search?

I am new to Azure Cognitive Search. I have a docx file stored in Azure Blob Storage. I am using #Microsoft.Skills.Text.SplitSkill to split the document into multiple pages (chunks), but when I index the output of this skill, I get the entire docx file content. How do I return the "pages" from the SplitSkill so that the user sees the portion of the original document that matched their search instead of the entire document?
Please assist me. Thank you in advance.
The split skill allows you to split text into smaller chunks/pages that can be then processed by additional cognitive skills.
Here is what a minimalistic skillset that does splitting and translation may look like:
"skills": [
{
"@odata.type": "#Microsoft.Skills.Text.SplitSkill",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"defaultLanguageCode": "en",
"inputs": [
{
"name": "text",
"source": "/document/content"
},
{
"name": "languageCode",
"source": "/document/language"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "mypages"
}
]
},
{
"@odata.type": "#Microsoft.Skills.Text.TranslationSkill",
"name": "#2",
"description": null,
"context": "/document/mypages/*",
"defaultFromLanguageCode": null,
"defaultToLanguageCode": "es",
"suggestedFrom": "en",
"inputs": [
{
"name": "text",
"source": "/document/mypages/*"
}
],
"outputs": [
{
"name": "translatedText",
"targetName": "translated_text"
}
]
}
]
Note that the split skill generates a collection of text elements under the "/document/mypages" node in the enriched tree. Also note that by providing the context "/document/mypages/*" to the translation skill, we are telling it to perform translation on each page.
I should point out that documents will still be indexed at the document level though. Skillsets are not really built to "change the cardinality of the index". That said, a workaround for that may be to project each of the pages as separate elements into a knowledge store, and then create a separate index that is actually focused on indexing each page.
Learn more about the knowledge store projections here:
https://learn.microsoft.com/en-us/azure/search/knowledge-store-concept-intro
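To make the per-page index idea concrete, here is a rough Python sketch that turns one document into one record per page, each carrying a reference to its parent document. It uses a naive fixed-length split purely for illustration and is not the algorithm the split skill uses. The field names (id, parent_id, page_number, content) are hypothetical.

```python
def to_page_records(doc_id: str, content: str, max_len: int = 1000) -> list:
    """Split content into fixed-size chunks and emit one record per chunk."""
    pages = [content[i:i + max_len] for i in range(0, len(content), max_len)]
    return [
        {
            "id": f"{doc_id}_{n}",
            "parent_id": doc_id,
            "page_number": n,
            "content": page,
        }
        for n, page in enumerate(pages)
    ]


records = to_page_records("doc1", "x" * 2500)
print([(r["id"], len(r["content"])) for r in records])
```

An index built from records shaped like these returns the matching page rather than the whole document, and parent_id lets the client link back to the original file.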

How to get Resource Group name from within Logic App

In an Azure Logic App, how can I get the name of the Resource Group containing the current logic app?
I want to include some tracking details in the JSON output that I am sending to another system.
I can get the run identifier (using @{workflow()['run']['name']})
and the current logic app name (using @{workflow()['name']}).
However, I can't work out how to get the name of the resource group to which the logic app is deployed.
As a last resort, I will use the resource group name used by the deployment template, but that will be wrong if the logic app is moved later.
I could also use tags, but again that could get out of step if the logic app is moved.
Thanks
A simple formula may be:
split(workflow().id, '/')[4]
If you're deploying the Logic Apps using ARM templates (e.g. edit in Visual Studio, check into Azure DevOps git repo and deploy using release pipeline), you can create an ARM parameter:
"resGroup_ARM": {
"type": "string",
"defaultValue": "[resourceGroup().name]",
"metadata": {
"description": "Resource group name"
}
}
Then, you can create a workflow parameter:
"resGroup_LA": {
"type": "string",
"defaultValue": "ResGroup LA default"
}
... and give it a value in the parameters initialisation section:
"resGroup_LA": {
"value": "[parameters('resGroup_ARM')]"
}
You can get all the other properties of resourceGroup() in a similar manner, see: https://learn.microsoft.com/en-us/azure/azure-resource-manager/templates/template-functions-resource?tabs=json#resourcegroup
First, create an "Initialize variable" action that captures the full output of workflow().
The data returned by workflow() looks like this:
{
"id": "/subscriptions/*****/resourceGroups/huryTest/providers/Microsoft.Logic/workflows/hurylogicblob",
"name": "hurylogicblob",
"type": "Microsoft.Logic/workflows",
"location": "eastus",
"tags": {},
"run": {
"id": "/subscriptions/*****/resourceGroups/huryTest/providers/Microsoft.Logic/workflows/hurylogicblob/runs/*****",
"name": "*****",
"type": "Microsoft.Logic/workflows/runs"
}
}
It contains the resource group name, so we just need to get the property "id" and substring it to get resource group name. The length of "resourceGroups/" is 15, so in the expression below I use add(,15) and sub(,15).
You can use the expression as below:
substring(workflow()['id'],add(indexOf(workflow()['id'],'resourceGroups/'),15),sub(sub(indexOf(workflow()['id'],'/providers'),indexOf(workflow()['id'],'resourceGroups/')),15))
This returns the resource group name of the logic app (huryTest in the example above).
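Both the split expression and the substring expression can be sanity-checked outside Logic Apps. The Python sketch below mirrors each of them on a made-up workflow id (the subscription value is a placeholder) and shows that they agree.

```python
def rg_by_split(workflow_id: str) -> str:
    """Mirror of split(workflow().id, '/')[4]."""
    return workflow_id.split("/")[4]


def rg_by_substring(workflow_id: str) -> str:
    """Mirror of the substring/indexOf expression."""
    marker = "resourceGroups/"  # 15 characters long
    start = workflow_id.index(marker) + 15
    length = workflow_id.index("/providers") - workflow_id.index(marker) - 15
    return workflow_id[start:start + length]


sample_id = (
    "/subscriptions/0000/resourceGroups/huryTest"
    "/providers/Microsoft.Logic/workflows/hurylogicblob"
)
print(rg_by_split(sample_id), rg_by_substring(sample_id))
```

The split form relies on the resource group always being the fifth segment of the id, while the substring form relies on the literal markers, so either works as long as the id keeps this shape.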

searching an array deep inside a mongo document

In my Mongo collection called pixels, I have documents like the sample below.
I'm looking for a way to search in the actions.tags part of the documents. I tried:
db.pixelsactifs.actions.find({tags:{$in : ["Environnement"]}})
db.pixelsactifs.find({actions.tags:{$in : {Environnement}})
Neither works. I'm also looking for the PHP equivalent.
I'm also wondering whether I should create a separate "actions" collection instead of putting everything inside one document.
I'm new to Mongo, so any good tutorial on structuring the database would be great.
Thanks for the insight.
{
"_id": { "$oid": "51b98009e4b075a9690bbc71" },
"name": "open Atlas",
"manager": "Tib Kat",
"type": "Association",
"logo": "",
"description": "OPEN ATLAS",
"actions": [
{
"name": "Pixel Humain",
"tags": [ "Toutes thémathiques" ],
"description": "le PH agit localement",
"images": [],
"origine": "oui",
"website": "www.echolocal.org"
}
],
"email": "my@gmail.com",
"adress": "102 rue",
"cp": "97421",
"city": "Saint louis",
"country": "Réunion",
"phone": "06932"
}
You can try it like this ($in expects an array of values):
$collectionName->find(array("actions.tags" => array('$in' => array("Environnement"))));
I do not think you need to keep the actions in a separate collection. NoSQL gives you the flexibility to embed documents, and fields inside sub-documents can also be indexed. Much of the benefit comes from merging related documents together for faster retrieval. The only shortcoming I can see here is that a plain find returns the complete parent document, not just the matching part of the sub-document array. If you want to show a single entry of the array, you get the whole array back and have to filter it on the client side. So if you plan to show each action individually to the end user, it is better to keep actions in a separate collection.
Read here : http://docs.mongodb.org/manual/use-cases/
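The client-side filtering mentioned above could look like the following Python sketch, which takes a document already returned by the query and keeps only the actions whose tags match. Plain dicts stand in for the driver's result; the helper name and the second sample action are invented for illustration.

```python
def matching_actions(doc: dict, wanted_tags: list) -> list:
    """Return only the actions that carry at least one of the wanted tags."""
    wanted = set(wanted_tags)
    return [
        action
        for action in doc.get("actions", [])
        if wanted & set(action.get("tags", []))
    ]


doc = {
    "name": "open Atlas",
    "actions": [
        {"name": "Pixel Humain", "tags": ["Toutes thémathiques"]},
        {"name": "Jardin partagé", "tags": ["Environnement"]},  # invented entry
    ],
}
print([a["name"] for a in matching_actions(doc, ["Environnement"])])
```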
