Azure Search - Cannot merge (with skill) data obtained from the KeyPhraseExtractionSkill - azure-cognitive-search

I am creating an indexer that takes a document, runs the KeyPhraseExtractionSkill on it and writes the output back to the index.
For many documents, this works out of the box. But for records over 50,000 characters, it does not work. OK, no problem; this is clearly stated in the docs.
What the docs suggest is to use the Text Split skill. So I used the Text Split skill to split the original document into pages and passed all pages to the KeyPhraseExtractionSkill. The results then need to be merged back, because we end up with an array of arrays of strings. Unfortunately, it seems that the Merge skill does not accept an array of arrays, just an array. <- Link to the skillset hierarchy.
This is the error reported by Azure:
Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:
What I want to achieve at the end of the day is to run the KeyPhraseExtractionSkill on text longer than 50,000 characters and add the results back to the index.
JSON for the skillset:
"#odata.context": "$metadata#skillsets/$entity",
"#odata.etag": "\"0x8D957466A2C1E47\"",
"name": "devalbertcollectionfilesskillset2",
"description": null,
"skills": [
"#odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "SplitSkill",
"description": null,
"context": "/document/content",
"defaultLanguageCode": "en",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"inputs": [
"name": "text",
"source": "/document/content"
"outputs": [
"name": "textItems",
"targetName": "pages"
"#odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
"name": "EntityRecognitionSkill",
"description": null,
"context": "/document/content/pages/*",
"categories": [
"defaultLanguageCode": "en",
"minimumPrecision": null,
"includeTypelessEntities": null,
"inputs": [
"name": "text",
"source": "/document/content/pages/*"
"outputs": [
"name": "persons",
"targetName": "people"
"name": "organizations",
"targetName": "organizations"
"name": "entities",
"targetName": "entities"
"name": "locations",
"targetName": "locations"
"#odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
"name": "KeyPhraseExtractionSkill",
"description": null,
"context": "/document/content/pages/*",
"defaultLanguageCode": "en",
"maxKeyPhraseCount": null,
"modelVersion": null,
"inputs": [
"name": "text",
"source": "/document/content/pages/*"
"outputs": [
"name": "keyPhrases",
"targetName": "keyPhrases"
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "Merge Skill - keyPhrases",
"description": null,
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
"name": "itemsToInsert",
"source": "/document/content/pages/*/keyPhrases"
"outputs": [
"name": "mergedText",
"targetName": "keyPhrases"
"cognitiveServices": {
"#odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"key": "------",
"description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
"knowledgeStore": null,
"encryptionKey": null
Please let me know if there is anything else that I can add to improve the question. Thanks!

You don't have to merge the key phrase outputs to insert them into the index.
Assuming your index already has a field called mykeyphrases of type Collection(Edm.String), to populate it with the key phrase outputs, add this indexer output field mapping:
"outputFieldMappings": [
"sourceFieldName": "/document/content/pages/*/keyPhrases/*",
"targetFieldName": "mykeyphrases"
The /* at the end of sourceFieldName is important for flattening the array of arrays of strings. The same path also works as a skill input if you want to pass an array of strings to another skill for further enrichment.
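To see what the trailing /* does, here is a minimal Python sketch. It is not Azure code and the sample phrases are made up; it only mimics the shape of the enrichment tree, where each page carries its own keyPhrases list and selecting every element of every list yields one flat collection.

```python
# Hypothetical per-page key phrase results, shaped like
# /document/content/pages/*/keyPhrases (an array of arrays of strings).
pages = [
    {"keyPhrases": ["azure search", "indexer"]},
    {"keyPhrases": ["skillset", "key phrases"]},
]

# Counterpart of /document/content/pages/*/keyPhrases: one list per page.
nested = [page["keyPhrases"] for page in pages]

# Counterpart of /document/content/pages/*/keyPhrases/*: every string of
# every page, flattened into a single collection of strings.
flat = [phrase for page in pages for phrase in page["keyPhrases"]]

print(nested)
print(flat)
```

The nested form is what the Merge skill rejected in the question; the flat form is what a Collection(Edm.String) field expects.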


Azure Data Factory - How to transform object with dynamic keys to array in a data flow?

After spending many hours reading the documentation, following tutorials and going through trial and error, I just can't figure it out: how can I transform the following complex object with dynamic keys into an array using a data flow in Azure Data Factory?
"headers": {
"Content-Length": 1234
"body": {
"00b50a39-8591-3db3-88f7-635e2ec5c65a": {
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 1",
"date": "2023-02-09"
"0c206312-2348-391b-99f0-261323a94d95": {
"id": "0c206312-2348-391b-99f0-261323a94d95",
"name": "Example 2",
"date": "2023-02-09"
"0c82d1e4-a897-32f2-88db-6830a21b0a43": {
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 3",
"date": "2023-02-09"
Expected output
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 1",
"date": "2023-02-09"
"id": "0c206312-2348-391b-99f0-261323a94d95",
"name": "Example 2",
"date": "2023-02-09"
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 3",
"date": "2023-02-09"
AFAIK, because your JSON keys are dynamic, getting the desired result with a data flow might not be possible.
In this case, you can try the approach below as a workaround. It works only if all of your keys have the same length.
This is my Pipeline:
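First, I used a Lookup activity to read the JSON file, converted the lookup output to a string and stored it in a variable (res_str) using the expression below.
@substring(string(activity('Lookup1').output.value[0].body),2,sub(length(string(activity('Lookup1').output.value[0].body)),4))
Then I split that string variable on '},"' and stored the result in an array variable (split_arr) using the expression below.
@split(variables('res_str'),'},"')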
It will give the array like below.
Pass that array to a ForEach and, inside the ForEach, use an Append variable activity to store the keys in an array (keys_array) with the expression below.
@take(item(), 36)
Now that the list of keys is in an array, add another ForEach activity after the first one to build the desired array of objects. Use an Append variable activity inside it (appending to json_arr) with the expression below.
@activity('Lookup1').output.value[0].body[item()]
Result array after ForEach will be:
If you want to store the above JSON in a file, you need to use OPENJSON from SQL. This is because the copy activity's additional column only supports the string type, not an array type.
Use a SQL dataset as the copy activity source and give the SQL script below as the query.
DECLARE @json NVARCHAR(MAX)
SET @json = N'@{variables('json_arr')}'
SELECT * FROM
OPENJSON ( @json )
WITH (
    id varchar(200) '$.id',
    name varchar(32) '$.name',
    date varchar(32) '$.date'
)
In Sink, give a JSON dataset and select Array of Objects as File pattern.
Execute the pipeline and you will get the above array inside a file.
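The string handling in the steps above can be hard to follow without the screenshots, so here is a rough Python sketch that replays the same steps on the sample input from the question. It assumes that string() produces compact JSON without spaces between tokens; the Python variable names simply mirror the pipeline variables.

```python
import json

# The "body" object from the question, keyed by dynamic GUIDs.
body = {
    "00b50a39-8591-3db3-88f7-635e2ec5c65a": {
        "id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
        "name": "Example 1", "date": "2023-02-09"},
    "0c206312-2348-391b-99f0-261323a94d95": {
        "id": "0c206312-2348-391b-99f0-261323a94d95",
        "name": "Example 2", "date": "2023-02-09"},
    "0c82d1e4-a897-32f2-88db-6830a21b0a43": {
        "id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
        "name": "Example 3", "date": "2023-02-09"},
}

# string(body): compact JSON text (assumption: no spaces between tokens).
body_str = json.dumps(body, separators=(",", ":"))

# substring(str, 2, length - 4): drop the leading '{"' and the trailing '}}'.
res_str = body_str[2:2 + len(body_str) - 4]

# split(res_str, '},"'): every piece now starts with one of the keys.
split_arr = res_str.split('},"')

# take(item(), 36): the first 36 characters of each piece are the GUID key.
keys_array = [piece[:36] for piece in split_arr]

# body[item()]: look up each object by its key to build the final array.
json_arr = [body[key] for key in keys_array]

print(json.dumps(json_arr, indent=2))
```

The sketch also shows why the workaround depends on every key having the same length (36 characters here) and on the objects containing no nested objects, since a nested closing brace followed by a comma would split in the wrong place.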
This is my Pipeline JSON:
"name": "pipeline1",
"properties": {
"activities": [
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
"userProperties": [],
"typeProperties": {
"source": {
"type": "JsonSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
"formatSettings": {
"type": "JsonReadSettings"
"dataset": {
"referenceName": "Json1",
"type": "DatasetReference"
"firstRowOnly": false
"name": "Lookup output to Str",
"description": "",
"type": "SetVariable",
"dependsOn": [
"activity": "Lookup1",
"dependencyConditions": [
"userProperties": [],
"typeProperties": {
"variableName": "res_str",
"value": {
"value": "#substring(string(activity('Lookup1').output.value[0].body),2,sub(length(string(activity('Lookup1').output.value[0].body)),4))",
"type": "Expression"
"name": "Split Str to array",
"type": "SetVariable",
"dependsOn": [
"activity": "Lookup output to Str",
"dependencyConditions": [
"userProperties": [],
"typeProperties": {
"variableName": "split_arr",
"value": {
"value": "#split(variables('res_str'),'},\"')",
"type": "Expression"
"name": "build keys array using split array",
"type": "ForEach",
"dependsOn": [
"activity": "Split Str to array",
"dependencyConditions": [
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('split_arr')",
"type": "Expression"
"isSequential": true,
"activities": [
"name": "take first 36 chars of every item",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "keys_array",
"value": {
"value": "#take(item(), 36)",
"type": "Expression"
"name": "build final array using keys array",
"type": "ForEach",
"dependsOn": [
"activity": "build keys array using split array",
"dependencyConditions": [
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('keys_array')",
"type": "Expression"
"isSequential": true,
"activities": [
"name": "Append variable1",
"description": "append every object to array",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "json_arr",
"value": {
"value": "#activity('Lookup1').output.value[0].body[item()]",
"type": "Expression"
"name": "Just for Res show",
"type": "SetVariable",
"dependsOn": [
"activity": "build final array using keys array",
"dependencyConditions": [
"userProperties": [],
"typeProperties": {
"variableName": "final_res_show",
"value": {
"value": "#variables('json_arr')",
"type": "Expression"
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
"activity": "Just for Res show",
"dependencyConditions": [
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "DECLARE #json NVARCHAR(MAX)\nSET #json = \n N'#{variables('json_arr')}' \n \nSELECT * FROM \n OPENJSON ( #json ) \nWITH ( \n id varchar(200) '$.id' , \n name varchar(32) '$.name', \n date varchar(32) '$.date'\n )",
"queryTimeout": "02:00:00",
"partitionOption": "None"
"sink": {
"type": "JsonSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
"formatSettings": {
"type": "JsonWriteSettings",
"filePattern": "arrayOfObjects"
"enableStaging": false
"inputs": [
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
"outputs": [
"referenceName": "Target_JSON",
"type": "DatasetReference"
"variables": {
"res_str": {
"type": "String"
"split_arr": {
"type": "Array"
"keys_array": {
"type": "Array"
"final_res_show": {
"type": "Array"
"json_arr": {
"type": "Array"
"annotations": []
Result file:

JSON Schema: Check the array to validate if a certain block of JSON objects is contained in it

I have a JSON array of arbitrary length. Each item in the array is a nested block of JSON objects; they all have the same properties but different values.
I need a JSON schema that checks whether the last block in the array has the values defined in the schema.
How should the schema be defined so that it only considers the last block in the array and ignores all the blocks before it?
My current solution successfully validates the JSON if there is only one block in the array. As soon as there are more blocks, it fails because the others are not valid against my schema, which is of course the expected behaviour.
In my example, the JSON array contains two nested blocks of JSON objects. These differ for the following items:
event.action = "[load|button]"
event.label = "[journey:device-only|submit,journey:device-only]"
type = "[page|track]"
An example of my data:
"page": {
"path": "order/checkout/summary",
"language": "en"
"cart": {
"ordercase": "neworder",
"product_list": [
"name": "Apple iPhone 14 Plus",
"quantity": 1,
"price": 1000
"event": {
"action": "load",
"label": "journey:device-only"
"type": "page"
"page": {
"path": "order/checkout/summary",
"language": "en"
"cart": {
"ordercase": "neworder",
"product_list": [
"name": "Apple iPhone 14 Plus",
"quantity": 1,
"price": 1000
"event": {
"action": "button",
"label": "submit,journey:device-only",
"type": "track"
And this is the schema I use, which works fine for the second block if that block is the only one in the array:
"type": "array",
"$schema": "",
"items": {
"type": "object",
"required": ["event", "page", "type"],
"properties": {
"page": {
"type": "object",
"properties": {
"path": {
"const": "order/checkout/summary"
"language": {
"enum": ["de", "fr", "it", "en"]
"required": ["path", "language"]
"event": {
"type": "object",
"additionalProperties": false,
"properties": {
"action": {
"const": "button"
"label": {
"type": "string",
"pattern": "^[-_:, a-z0-9]*$",
"allOf": [
"type": "string",
"pattern": "^\\S*(?:(submit,|,submit))\\S*$"
"type": "string",
"pattern": "^\\S*(journey:(?:(device-only|device-plus)))\\S*$"
"required": ["action", "label"]
"type": {
"enum": ["track", "string"]

Arm Templates Copy Function - Skip Properties in loop if not present - Network Security Group - sourceAddressPrefix/sourceAddressPrefixes

I am trying to implement a copy function in an ARM template used to deploy a network security group.
I have previously deployed templates using this format, but because Microsoft uses two distinct property names depending on whether the value is a single item or a list, I am unable to use the copy function.
I have looked into using if statements to ignore null parameters in a loop, but I have not been able to make this work.
So my question is: how can I go through a loop and ignore a specific property if it is not present in the current iteration?
The two properties in question are
sourceAddressPrefix and sourceAddressPrefixes.
This causes problems in the 2nd iteration, where I get this error message:
language expression property 'sourceAddressPrefixes' doesn't exist
(If I switch the order in the parameter file, i.e. sourceAddressPrefixes comes first, the error message points to 'sourceAddressPrefix' instead.)
Parameter file:
As you can see, there are two security rules, one using sourceAddressPrefix and the other sourceAddressPrefixes.
"$schema": "",
"contentVersion": "",
"parameters": {
"location": {
"value": "westeurope"
"value": [
"name": "AllowSyncWithAzureAD",
"protocol": "Tcp",
"sourcePortRange": "*",
"destinationPortRange": "443",
"sourceAddressPrefix": "*",
"destinationAddressPrefix": "*",
"access": "Allow",
"priority": 101,
"direction": "Inbound"
"name": "AllowPSRemotingSliceP",
"protocol": "Tcp",
"sourcePortRange": "*",
"destinationPortRange": "5986",
"sourceAddressPrefixes": "[variables('PSRemotingSlicePIPAddresses')]",
"destinationAddressPrefix": "*",
"access": "Allow",
"priority": 301,
"direction": "Inbound"
In the template file I have added both properties with if statements, but clearly I have not written them correctly. The intended outcome is that a property is ignored in any loop iteration where it does not exist.
"$schema": "",
"contentVersion": "",
"parameters": {
"location": {
"type": "string",
"defaultValue": "[resourceGroup().location]",
"metadata": {
"description": "Location for all resources."
"SecurityRule": {
"type": "array"
"variables": {
"domainServicesNSGName": "AGR01MP-NSGAADDS01",
"PSRemotingSlicePIPAddresses": [
"RDPIPAddresses": [
"PSRemotingSliceTIPAddresses": [
"resources": [
"apiVersion": "2018-10-01",
"type": "Microsoft.Network/networkSecurityGroups",
"name": "[variables('domainServicesNSGName')]",
"location": "[parameters('location')]",
"properties": {
"copy": [
"count": "[length(parameters('securityRule'))]",
"mode": "serial",
"input": {
"name": "[concat(parameters('securityRule')[copyIndex('securityRules')].name)]",
"properties": {
"protocol": "[concat(parameters('securityRule')[copyIndex('securityRules')].protocol)]",
"sourcePortRange": "[concat(parameters('securityRule')[copyIndex('securityRules')].sourcePortRange)]",
"destinationPortRange": "[concat(parameters('securityRule')[copyIndex('securityRules')].destinationPortRange)]",
"sourceAddressPrefixes": "[if(equals(parameters('securityRule')[copyIndex('securityRules')].sourceAddressPrefixes,''), json('null'), parameters('securityRule')[copyIndex('securityRules')].sourceAddressPrefixes)]",
"sourceAddressPrefix": "[if(equals(parameters('securityRule')[copyIndex('securityRules')].sourceAddressPrefix,''), json('null'), parameters('securityRule')[copyIndex('securityRules')].sourceAddressPrefix)]",
"destinationAddressPrefix": "[concat(parameters('securityRule')[copyIndex('securityRules')].destinationAddressPrefix)]",
"access": "[concat(parameters('securityRule')[copyIndex('securityRules')].access)]",
"priority": "[concat(parameters('securityRule')[copyIndex('securityRules')].priority)]",
"direction": "[concat(parameters('securityRule')[copyIndex('securityRules')].direction)]"
"outputs": {}
I found a solution:
"sourceAddressPrefix": "[if(equals(parameters('SecurityRule')[copyIndex('securityRules')].name, 'SyncWithAzureAD'), parameters('SecurityRule')[copyIndex('securityRules')].sourceAddressPrefix, json('null'))]" ,
"sourceAddressPrefixes": "[if(contains(parameters('SecurityRule')[copyIndex('securityRules')].name, 'Allow'), parameters('SecurityRule')[copyIndex('securityRules')].sourceAddressPrefixes, json('null'))]" ,
The above code lets me deploy the template while ignoring a null value in the array. However, I had to rename AllowSyncWithAzureAD to SyncWithAzureAD so that it would not be picked up by the second line.
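The solution above decides by rule name, which is why one rule had to be renamed. The underlying intent is "use the property if the rule has it, otherwise null". The Python sketch below only models that logic, keyed on the presence of the property rather than on the name. It is not ARM syntax, the helper name is made up, and the IP addresses are placeholders. In a template, the same idea would presumably be expressed by testing for the key with the `contains` function inside `if`, though that variant is untested here.

```python
# Two rules shaped like the parameter file: one uses the singular property,
# the other the plural one (the IP addresses are placeholders).
security_rules = [
    {"name": "AllowSyncWithAzureAD", "sourceAddressPrefix": "*"},
    {
        "name": "AllowPSRemotingSliceP",
        "sourceAddressPrefixes": ["203.0.113.10", "203.0.113.11"],
    },
]


def build_rule_properties(rule):
    """Return both source-address properties for one rule.

    A property the rule does not define comes back as None, which plays
    the role of json('null') in the template.
    """
    return {
        "sourceAddressPrefix": rule.get("sourceAddressPrefix"),
        "sourceAddressPrefixes": rule.get("sourceAddressPrefixes"),
    }


for rule in security_rules:
    print(rule["name"], build_rule_properties(rule))
```

Because the decision depends only on which key is present, rule names can stay as they are.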

Specify order of tables to copy using the Copy Data tool in Azure Data Factory to respect foreign keys

The Copy Data tool lets me select the tables I want to copy from the source to the destination, but the tables are copied in alphabetical order. Since I have foreign keys defined, this cannot work. I would like to manually change the order.
Unfortunately, as far as I know, the ADF copy activity's SQL DB connector can only transfer tables in the default order. It can't inspect your constraints and execute in the optimal order.
So I'm afraid you have to work out how your constraints are set up and determine the right order manually. Once you have the sorted list of table names, create one copy activity per table, in that order.
You don't need to create each of these by hand, though. Every element can be created with the ADF SDK or a PowerShell script. All you need to do is loop over the list and pass each entry into a snippet of code or script; only the table name needs to change per activity.
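Working out the order by hand gets tedious with many tables. As a rough sketch, assuming you can list which tables each table references, a topological sort produces a valid load order. The foreign-key map below is purely hypothetical; in practice it would come from your database's metadata.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical foreign keys: each table maps to the tables it references,
# which therefore have to be loaded before it.
foreign_keys = {
    "Invoice": set(),
    "InvoiceDistribution": {"Invoice"},
    "InvoiceEvent": {"Invoice"},
    "InvoiceTransaction": {"Invoice", "InvoiceDistribution"},
}

# static_order() yields every table after all the tables it depends on.
copy_order = list(TopologicalSorter(foreign_keys).static_order())

print(copy_order)
```

The resulting list can then serve as the order in which the copy activities are created. If the foreign keys form a cycle, TopologicalSorter raises a CycleError, which is a hint that those tables cannot be loaded in any simple order.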
Here is a simple pipeline that copies data from one database to another - it has a ForEach with 2 activities inside it. It copies each table and then runs a stored procedure after each table. The column names in the tables are the same. It has a variable called tableMapping that is a json array that defines the 'from' and 'to' table names as they can be different. The ForEach has this setting "batchCount": 1, so that it runs one at a time. In my testing just now it processes the tables in the sequence in the variable tableMapping. If you don't specify the batchCount = 1 then it will run them in parallel.
"name": "TowWorks to Azure SQL DB",
"properties": {
"activities": [
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('tableMapping')",
"type": "Expression"
"batchCount": 1,
"activities": [
"name": "Copy from BI to Azure",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
"userProperties": [],
"typeProperties": {
"source": {
"type": "SqlServerSource",
"queryTimeout": "02:00:00"
"sink": {
"type": "AzureSqlSink",
"preCopyScript": {
"value": "#{concat('truncate table raw.', item().to)}",
"type": "Expression"
"disableMetricsCollection": false
"enableStaging": false
"inputs": [
"referenceName": "BI_TW_Raw_NoTable",
"type": "DatasetReference",
"parameters": {
"tableName": {
"value": "#item().from",
"type": "Expression"
"outputs": [
"referenceName": "TW_Azure_DB_noTable",
"type": "DatasetReference",
"parameters": {
"schema": "raw",
"table": {
"value": "#item().to",
"type": "Expression"
"name": "Stored Procedure1",
"type": "SqlServerStoredProcedure",
"dependsOn": [
"activity": "Copy from BI to Azure",
"dependencyConditions": [
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
"userProperties": [],
"typeProperties": {
"storedProcedureName": {
"value": "#concat('raw.Update', item().to)",
"type": "Expression"
"linkedServiceName": {
"referenceName": "TowWorksAzureSqlDB",
"type": "LinkedServiceReference"
"variables": {
"tableMapping": {
"type": "Array",
"defaultValue": [
"from": "WORK_TASKS",
"to": "WorkTask"
"from": "Invoice",
"to": "Invoice"
"from": "ASSET",
"to": "Asset"
"from": "InvoiceDistribution",
"to": "InvoiceDistribution"
"from": "InvoiceEvent",
"to": "InvoiceEvent"
"from": "InvoiceTransaction",
"to": "InvoiceTransaction"
"to": "LogisticOrder"
"to": "LogisticOrderLineItem"
"from": "WORK_ORDERS",
"to": "WorkOrder"
"to": "WorkOrderStatus"
"from": {
"type": "String"
"to": {
"type": "String"
"folder": {
"name": "TowWorks"
"annotations": []

How to loop through an array in a logic app?

I have managed to get all my userdata in an array (see here) but now I cannot loop through the data. After building the array I have converted it to JSON, but I can no longer address the fields as defined in my JSON schema.
The only thing I can address in my loop (I use the JSON body as input for the For Each loop) is the body itself, not the individual fields like username, mail address etc.
Should I change something in my JSON schema to overcome this or is something else wrong?
Edit: Please find my JSON schema below:
"$schema": "",
"items": [
"properties": {
"##odata.type": {
"type": "string"
"createdDateTime": {
"type": "string"
"employeeId": {
"type": "string"
"givenName": {
"type": "string"
"id": {
"type": "string"
"mail": {
"type": "string"
"onPremisesSamAccountName": {
"type": "string"
"surname": {
"type": "string"
"userPrincipalName": {
"type": "string"
"required": [
"type": "object"
"type": "array"
Please see the image for how the JSON looks:
As I understand it, you just want to loop over your array to get each item's name, mail and some other fields. As you mentioned in your question, you can use the JSON body as input for the For Each loop. That is fine; there is no need to do anything more. Please refer to the screenshots below:
Initialize a variable with your JSON data.
Then parse it with the "Parse JSON" action.
Now set the body as input for the For each loop, and inside it use a variable whose value is set to "mail" from "Parse JSON".
After running the logic app, we can see that the mail field is looped over as well. You can use "mail", "name" and other fields easily in your "For each".
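Without the screenshots, the flow can be pictured with this small Python analogy. It is not Logic Apps code; the payload only imitates the shape of the sample data used further down in this answer, and the field values (including the mail addresses) are made up.

```python
import json

# Made-up payload with a "body" array of user objects.
payload = """
{
  "body": [
    {"id": "1", "givenName": "Ann", "mail": "ann@example.com"},
    {"id": "2", "givenName": "Bob", "mail": "bob@example.com"}
  ]
}
"""

# "Parse JSON": turn the text into structured data.
parsed = json.loads(payload)

# "For each" over the body array, reading the mail field of every item.
mails = []
for user in parsed["body"]:
    mails.append(user["mail"])

print(mails)
```

The individual fields only become addressable inside the loop because the parsed structure matches the data; a schema that does not describe the actual payload leaves only the body as a whole to work with.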
I checked your JSON schema, but it doesn't seem to match the JSON data in your screenshot. May I know how you generated your schema? On my side I generate it by clicking the "Use sample payload to generate schema" button, which creates the schema automatically.
I took a JSON sample with the same structure as yours and generated its schema; please refer to the JSON data and schema below:
json data:
"body": [
"#odata.type": "test",
"id": "123456",
"givenName": "test",
"username": "test",
"userPrincipalName": "test",
"mail": "",
"onPremisesSamAccountName": "test",
"employeeId": "test",
"createdDateTime": "testdate"
"#odata.type": "test",
"id": "123456",
"givenName": "test",
"username": "test",
"userPrincipalName": "test",
"mail": "",
"onPremisesSamAccountName": "test",
"employeeId": "test",
"createdDateTime": "testdate"
"type": "object",
"properties": {
"body": {
"type": "array",
"items": {
"type": "object",
"properties": {
"##odata.type": {
"type": "string"
"id": {
"type": "string"
"givenName": {
"type": "string"
"username": {
"type": "string"
"userPrincipalName": {
"type": "string"
"mail": {
"type": "string"
"onPremisesSamAccountName": {
"type": "string"
"employeeId": {
"type": "string"
"createdDateTime": {
"type": "string"
"required": [
