Convert JSON array into CSV using Apache NiFi - arrays

I'm looking to convert JSON with an array to CSV format. The number of elements inside the array is dynamic for each row. I tried using this flow (the flow file XML is attached to the post).
GetFile --> ConvertRecord --> UpdateAttribute --> PutFile
Are there any other alternatives?
JSON format:
{ "LogData": {
"Location": "APAC",
"product": "w1" }, "Outcome": [
{
"limit": "0",
"pri": "3",
"result": "pass"
},
{
"limit": "1",
"pri": "2",
"result": "pass"
},
{
"limit": "5",
"priority": "1",
"result": "fail"
} ], "attr": {
"vers": "1",
"datetime": "2018-01-10 00:36:00" }}
Expected output in CSV:
location, product, limit, pri, result, vers, datetime
APAC w1 0 3 pass 1 2018-01-10 00:36:00
APAC w1 1 2 pass 1 2018-01-10 00:36:00
APAC w1 5 1 fail 1 2018-01-10 00:36:00
Output from the attached flow:
LogData,Outcome,attr
"MapRecord[{product=w1, Location=APAC}]","[MapRecord[{limit=0, result=pass, pri=3}], MapRecord[{limit=1, result=pass, pri=2}], MapRecord[{limit=5, result=fail}]]","MapRecord[{datetime=2018-01-10 00:36:00, vers=1}]"
ConvertRecord -- I am using JsonTreeReader and CSVRecordSetWriter configurations as below:
JsonTreeReader controller service config:
CSVRecordSetWriter controller service config:
AvroSchemaRegistry controller service config:
Avro schema:
{ "name": "myschema", "type": "record", "namespace": "myschema", "fields": [{"name": "LogData","type": { "name": "LogData", "type": "record", "fields": [{ "name": "Location", "type": "string"},{ "name": "product", "type": "string"} ]}},{"name": "Outcome","type": { "type": "array", "items": {"name": "Outcome_record","type": "record","fields": [ {"name": "limit","type": "string" }, {"name": "pri","type": ["string","null"] }, {"name": "result","type": "string" }] }}},{"name": "attr","type": { "name": "attr", "type": "record", "fields": [{ "name": "vers", "type": "string"},{ "name": "datetime", "type": "string"} ]}} ]}

Try this spec in JoltTransformJSON before ConvertRecord:
[
{
"operation": "shift",
"spec": {
"Outcome": {
"*": {
"@(3,LogData.Location)": "[#2].location",
"@(3,LogData.product)": "[#2].product",
"@(3,attr.vers)": "[#2].vers",
"@(3,attr.datetime)": "[#2].datetime",
"*": "[#2].&"
}
}
}
}
]

It seems you need to perform a JoltTransformJSON before converting to CSV; otherwise it is not going to work.
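For reference, with the sample input above that shift spec should flatten the flow file into an array of flat records, roughly like this (a sketch; note the third Outcome entry uses the key priority rather than pri, so it comes through under that name unless mapped explicitly):
[ {
"location": "APAC",
"product": "w1",
"limit": "0",
"pri": "3",
"result": "pass",
"vers": "1",
"datetime": "2018-01-10 00:36:00"
}, {
"location": "APAC",
"product": "w1",
"limit": "1",
"pri": "2",
"result": "pass",
"vers": "1",
"datetime": "2018-01-10 00:36:00"
}, {
"location": "APAC",
"product": "w1",
"limit": "5",
"priority": "1",
"result": "fail",
"vers": "1",
"datetime": "2018-01-10 00:36:00"
} ]
From there, ConvertRecord with a flat schema (location, product, limit, pri, result, vers, datetime, all strings) should produce the expected CSV rows.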

Related

Azure Data Factory - How to transform object with dynamic keys to array in a data flow?

After spending many hours of reading the documentation, following some tutorials, and trial & error, I just can't figure it out: how can I transform the following complex object with keyed objects to an array using a data flow in Azure Data Factory?
Input
{
"headers": {
"Content-Length": 1234
},
"body": {
"00b50a39-8591-3db3-88f7-635e2ec5c65a": {
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 1",
"date": "2023-02-09"
},
"0c206312-2348-391b-99f0-261323a94d95": {
"id": "0c206312-2348-391b-99f0-261323a94d95",
"name": "Example 2",
"date": "2023-02-09"
},
"0c82d1e4-a897-32f2-88db-6830a21b0a43": {
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 3",
"date": "2023-02-09"
}
}
}
Expected output
[
{
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 1",
"date": "2023-02-09"
},
{
"id": "0c206312-2348-391b-99f0-261323a94d95",
"name": "Example 2",
"date": "2023-02-09"
},
{
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 3",
"date": "2023-02-09"
}
]
AFAIK, your JSON keys are dynamic, so getting the desired result using a data flow might not be possible.
In this case, you can try the below approach as a workaround. It will work only if all of your keys have the same length (here, 36-character GUIDs).
This is my Pipeline:
First, I used a Lookup activity to get the JSON file, converted the lookup output to a string, and stored it in a variable using the below expression:
@substring(string(activity('Lookup1').output.value[0].body),2,sub(length(string(activity('Lookup1').output.value[0].body)),4))
Then I used split on that string variable with '},"' and stored the result in an array variable using the below expression.
@split(variables('res_str'),'},"')
It will give an array like the one below.
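With the sample body above, each element of that array is a string beginning with the 36-character key, roughly like this (a sketch; one element per line, quote escaping omitted for readability):
00b50a39-8591-3db3-88f7-635e2ec5c65a":{"id":"00b50a39-8591-3db3-88f7-635e2ec5c65a","name":"Example 1","date":"2023-02-09"
0c206312-2348-391b-99f0-261323a94d95":{"id":"0c206312-2348-391b-99f0-261323a94d95","name":"Example 2","date":"2023-02-09"
0c82d1e4-a897-32f2-88db-6830a21b0a43":{"id":"00b50a39-8591-3db3-88f7-635e2ec5c65a","name":"Example 3","date":"2023-02-09"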
Give that array to a ForEach, and inside the ForEach use an Append variable activity to store the keys in an array with the below expression.
@take(item(), 36)
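For the sample data, the resulting keys array would be (a sketch):
[
"00b50a39-8591-3db3-88f7-635e2ec5c65a",
"0c206312-2348-391b-99f0-261323a94d95",
"0c82d1e4-a897-32f2-88db-6830a21b0a43"
]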
Now I have the list of keys in an array. After the above ForEach, use another ForEach activity to get the desired array of objects. Use an Append variable activity inside the ForEach and give it the below expression.
@activity('Lookup1').output.value[0].body[item()]
Result array after ForEach will be:
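For the sample input, json_arr ends up matching the expected output, i.e.:
[
{
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 1",
"date": "2023-02-09"
},
{
"id": "0c206312-2348-391b-99f0-261323a94d95",
"name": "Example 2",
"date": "2023-02-09"
},
{
"id": "00b50a39-8591-3db3-88f7-635e2ec5c65a",
"name": "Example 3",
"date": "2023-02-09"
}
]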
If you want to store the above JSON in a file, you need to use OPENJSON from SQL. This is because the copy activity's additional column only supports the string type, not an array type.
Use a SQL dataset as the copy activity source and give the below SQL script in the query.
DECLARE @json NVARCHAR(MAX)
SET @json =
N'@{variables('json_arr')}'
SELECT * FROM
OPENJSON ( @json )
WITH (
id varchar(200) '$.id' ,
name varchar(32) '$.name',
date varchar(32) '$.date'
)
In Sink, give a JSON dataset and select Array of Objects as File pattern.
Execute the pipeline and you will get the above array inside a file.
This is my Pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "JsonSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "JsonReadSettings"
}
},
"dataset": {
"referenceName": "Json1",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Lookup output to Str",
"description": "",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "res_str",
"value": {
"value": "#substring(string(activity('Lookup1').output.value[0].body),2,sub(length(string(activity('Lookup1').output.value[0].body)),4))",
"type": "Expression"
}
}
},
{
"name": "Split Str to array",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Lookup output to Str",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "split_arr",
"value": {
"value": "#split(variables('res_str'),'},\"')",
"type": "Expression"
}
}
},
{
"name": "build keys array using split array",
"type": "ForEach",
"dependsOn": [
{
"activity": "Split Str to array",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('split_arr')",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "take first 36 chars of every item",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "keys_array",
"value": {
"value": "#take(item(), 36)",
"type": "Expression"
}
}
}
]
}
},
{
"name": "build final array using keys array",
"type": "ForEach",
"dependsOn": [
{
"activity": "build keys array using split array",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('keys_array')",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append variable1",
"description": "append every object to array",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "json_arr",
"value": {
"value": "#activity('Lookup1').output.value[0].body[item()]",
"type": "Expression"
}
}
}
]
}
},
{
"name": "Just for Res show",
"type": "SetVariable",
"dependsOn": [
{
"activity": "build final array using keys array",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "final_res_show",
"value": {
"value": "#variables('json_arr')",
"type": "Expression"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Just for Res show",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "DECLARE #json NVARCHAR(MAX)\nSET #json = \n N'#{variables('json_arr')}' \n \nSELECT * FROM \n OPENJSON ( #json ) \nWITH ( \n id varchar(200) '$.id' , \n name varchar(32) '$.name', \n date varchar(32) '$.date'\n )",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"sink": {
"type": "JsonSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "JsonWriteSettings",
"filePattern": "arrayOfObjects"
}
},
"enableStaging": false
},
"inputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "Target_JSON",
"type": "DatasetReference"
}
]
}
],
"variables": {
"res_str": {
"type": "String"
},
"split_arr": {
"type": "Array"
},
"keys_array": {
"type": "Array"
},
"final_res_show": {
"type": "Array"
},
"json_arr": {
"type": "Array"
}
},
"annotations": []
}
}
Result file:

Fetch value from SharePoint JSON output

I am getting the below JSON object from a SharePoint list. How can I get the value for Company in the Data Operations (Select) action in Logic Apps? I tried item()['Company']?['Value'] and it is not working. Any suggestions?
"body": [
{
"Company": {
"#odata.type": "#Microsoft.Azure.Connectors.SharePoint.SPListExpandedReference",
"Id": 0,
"Value": "Test1"
},
"Date From": "2022-03-30",
"Date To": "2022-03-31",
"Title": "Title 1"
},
{
"Company": {
"#odata.type": "#Microsoft.Azure.Connectors.SharePoint.SPListExpandedReference",
"Id": 2,
"Value": "Line2"
},
"Date From": "2022-03-21",
"Date To": "2022-03-29",
"Title": "Title 2"
}
]
I am fetching the SharePoint list and then using Data Operations (Select) to get the JSON as output.
I need the JSON in the below format so that I can pass it to a stored procedure and insert it into the Azure SQL DB. I have another 12 items in the list.
[
{
"Company": "Test1",
"Date From": "2022-03-30",
"Date To": "2022-03-31",
"Title": "Title 1"
},
{
"Company": "Line2",
"Date From": "2022-03-21",
"Date To": "2022-03-29",
"Title": "Title 2"
}
]
Rather than Select, you can set a variable. We're all different, but that makes far more sense to me.
Your expression is much the same; I used ...
item()['Company']['Value']
Just make sure you initialise the variable outside and prior to the For each ...
This is the result for the first item in the array ...
To compile a full JSON object and add it to an array, again, simply use a variable and specify the values as needed.
Firstly, initialize your array outside of the For each ...
... and then in the For each, add an object to the array variable on each loop (make sure you include the quotes around the expression where required), as shown below ...
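The object appended on each loop looks like this (taken from the full definition further down):
{
"Company": "@{item()['Company']['Value']}",
"Date From": "@{item()['Date From']}",
"Date To": "@{item()['Date To']}",
"Title": "@{item()['Title']}"
}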
You just have to compile the JSON. The end result will look like this ...
This is the JSON in full ...
[
{
"Company": "Line2",
"Date From": "2022-03-21",
"Date To": "2022-03-29",
"Title": "Title 2"
},
{
"Company": "Test1",
"Date From": "2022-03-30",
"Date To": "2022-03-31",
"Title": "Title 1"
}
]
Also, you'll notice my list has come out in a different order. That's because the For each runs in parallel; if you need to avoid that, change the settings so it runs in a single thread ...
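In the definition below, that single-threaded behaviour is the runtimeConfiguration block on the For each:
"runtimeConfiguration": {
"concurrency": {
"repetitions": 1
}
}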
This is the JSON definition of the Logic App; you can load it into your tenant and test with it ...
{
"definition": {
"$schema": "https://schema.management.azure.com/providers/Microsoft.Logic/schemas/2016-06-01/workflowdefinition.json#",
"actions": {
"For_each": {
"actions": {
"Append_to_SQL_Array": {
"inputs": {
"name": "SQL Array",
"value": {
"Company": "#{item()['Company']['Value']}",
"Date From": "#{item()['Date From']}",
"Date To": "#{item()['Date To']}",
"Title": "#{item()['Title']}"
}
},
"runAfter": {},
"type": "AppendToArrayVariable"
}
},
"foreach": "#variables('SharePoint JSON')",
"runAfter": {
"Initialize_SQL_Array": [
"Succeeded"
]
},
"runtimeConfiguration": {
"concurrency": {
"repetitions": 1
}
},
"type": "Foreach"
},
"Initialize_SQL_Array": {
"inputs": {
"variables": [
{
"name": "SQL Array",
"type": "array"
}
]
},
"runAfter": {
"Initialize_SharePoint_JSON": [
"Succeeded"
]
},
"type": "InitializeVariable"
},
"Initialize_SharePoint_JSON": {
"inputs": {
"variables": [
{
"name": "SharePoint JSON",
"type": "array",
"value": [
{
"Company": {
"Id": 0,
"Value": "Test1",
"odata.type": "#Microsoft.Azure.Connectors.SharePoint.SPListExpandedReference"
},
"Date From": "2022-03-30",
"Date To": "2022-03-31",
"Title": "Title 1"
},
{
"Company": {
"Id": 2,
"Value": "Line2",
"odata.type": "#Microsoft.Azure.Connectors.SharePoint.SPListExpandedReference"
},
"Date From": "2022-03-21",
"Date To": "2022-03-29",
"Title": "Title 2"
}
]
}
]
},
"runAfter": {},
"type": "InitializeVariable"
},
"Initialize_variable": {
"inputs": {
"variables": [
{
"name": "Result",
"type": "array",
"value": "#variables('SQL Array')"
}
]
},
"runAfter": {
"For_each": [
"Succeeded"
]
},
"type": "InitializeVariable"
}
},
"contentVersion": "1.0.0.0",
"outputs": {},
"parameters": {},
"triggers": {
"Recurrence": {
"evaluatedRecurrence": {
"frequency": "Month",
"interval": 12
},
"recurrence": {
"frequency": "Month",
"interval": 12
},
"type": "Recurrence"
}
}
},
"parameters": {}
}

How to convert a JSON array to an Avro schema

Hey, I am new to the Avro schema space and need to convert a JSON array into an Avro schema.
The JSON below represents a client with a list of services, each with a serviceName and an enabled flag:
If enabled is true, that particular service is taken by the client.
If enabled is false, that particular service is not taken by the client.
{
"clientName": "Haven",
"serviceDetailsList": [
{
"serviceName": "Service1",
"enabled": true
},
{
"serviceName": "Service2",
"enabled": true
},
{
"serviceName": "Service3",
"enabled": true
},
{
"serviceName": "Service4",
"enabled": false
},
{
"serviceName": "Service5",
"enabled": false
},
{
"serviceName": "Service6",
"enabled": true
}
]
}
I tried the schema below but am not getting a proper response.
"fields":[
{"name": "serviceName", "type": [ "Boolean", "false" ] , "aliases":[
"service1" ]
},
{"name": "serviceName", "type": [ "Boolean", "false" ] , "aliases":[
"service2" ]
}
]
Any help would be appreciated.
Thank you, all of you. I tried again and was able to get the correct schema. The correct Avro schema is:
{
"name": "modelData",
"type": "record",
"namespace": "com.hi.model",
"fields": [
{
"name": "clientName",
"type": "string"
},
{
"name": "serviceDetailsList",
"type": {
"type": "array",
"items": {
"name": "serviceDetailsList_record",
"type": "record",
"fields": [
{
"name": "serviceName",
"type": "string"
},
{
"name": "enabled",
"type": "boolean"
}
]
}
}
}
]
}

Elastic - JSON Array nested in Array

I have to index a JSON document to Elasticsearch which looks like the format below. My problem is that the key "variables" is an array that contains JSON objects (I thought about the "nested" datatype of Elasticsearch), but it is possible for some of those objects to contain nested JSON arrays inside them (see the variable CUSTOMERS).
POST /example_data/data {
"process_name": "TEST_PROCESS",
"process_version ": 0,
"process_id": "1111",
"activity_id": "111",
"name": "update_data",
"username": "testUser",
"datetime": "2018-01-01 10:00:00",
"variables": [{
"name": "ΒΑΝΚ",
"data_type": "STRING",
"value": "EUROBANK"
},{
"name": "CITY",
"data_type": "STRING",
"value": "LONDON"
}, {
"name": "CUSTOMERS",
"data_type": "ENTITY",
"value": [{
"variables": [{
"name": "CUSTOMER_NAME",
"data_type": "STRING",
"value": "JOHN"
}, {
"name": " CUSTOMER_CITY",
"data_type": "STRING",
"value": "LONDON"
}
]
}
]
}, {
"name": "CUSTOMERS",
"data_type": "ENTITY",
"value": [{
"variables": [{
"name": "CUSTOMER_NAME",
"data_type": "STRING",
"value": "ΑΘΗΝΑ"
}, {
"name": " CUSTOMER_CITY ",
"data_type": "STRING",
"value": "LIVERPOOL"
}, {
"name": " CUSTOMER_NUMBER",
"data_type": "STRING",
"value": "1234567890"
}
]
}
]
}
] }
When I try to index it, I get the following error:
{ "error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "Can't merge a non object mapping [variables.value] with an object mapping [variables.value]"
}
],
"type": "illegal_argument_exception",
"reason": "Can't merge a non object mapping [variables.value] with an object mapping [variables.value]" }, "status": 400 }
Mapping
{ "example_data": {
"mappings": {
"data": {
"properties": {
"activity_id": {
"type": "text"
},
"name": {
"type": "text"
},
"process_name": {
"type": "text"
},
"process_version": {
"type": "integer"
},
"process_id": {
"type": "text"
},
"datetime": {
"type": "date",
"format": "yyyy-MM-dd HH:mm:ss"
},
"username": {
"type": "text",
"analyzer": "greek"
},
"variables": {
"type": "nested",
"properties": {
"data_type": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256
}
}
}
}
}
}
}
}}}
When I remove the variable CUSTOMERS that contains the array, it works properly because there are only JSON objects.
Is there a way to handle that? Thanks in advance
