I'm looking to convert JSON containing an array to CSV format. The number of elements inside the array is dynamic for each row. I tried the flow below (the flow file XML is attached to the post).
GetFile --> ConvertRecord --> UpdateAttribute --> PutFile
Are there any other alternatives?
JSON format:
{
"LogData":{
"Location":"APAC",
"product":"w1"
},
"Outcome":[
{
"limit":"0",
"pri":"3",
"result":"pass"
},
{
"limit":"1",
"pri":"2",
"result":"pass"
},
{
"limit":"5",
"priority":"1",
"result":"fail"
}
],
"attr":{
"vers":"1",
"datetime":"2018-01-10 00:36:00"
}
}
Expected output in CSV:
location, product, limit, pri, result, vers, datetime
APAC, w1, 0, 3, pass, 1, 2018-01-10 00:36:00
APAC, w1, 1, 2, pass, 1, 2018-01-10 00:36:00
APAC, w1, 5, 1, fail, 1, 2018-01-10 00:36:00
Output from the attached flow:
LogData,Outcome,attr
"MapRecord[{product=w1, Location=APAC}]","[MapRecord[{limit=0, result=pass, pri=3}], MapRecord[{limit=1, result=pass, pri=2}], MapRecord[{limit=5, result=fail}]]","MapRecord[{datetime=2018-01-10 00:36:00, vers=1}]"
ConvertRecord config:
JsonTreeReader Controller Service config:
CSVRecordSetWriter Controller Service config:
AvroSchemaRegistry Controller Service config:
Avro schema:
{ "name": "myschema", "type": "record", "namespace": "myschema", "fields": [{"name": "LogData","type": { "name": "LogData", "type": "record", "fields": [{ "name": "Location", "type": "string"},{ "name": "product", "type": "string"} ]}},{"name": "Outcome","type": { "type": "array", "items": {"name": "Outcome_record","type": "record","fields": [ {"name": "limit","type": "string" }, {"name": "pri","type": ["string","null"] }, {"name": "result","type": "string" }] }}},{"name": "attr","type": { "name": "attr", "type": "record", "fields": [{ "name": "vers", "type": "string"},{ "name": "datetime", "type": "string"} ]}} ]}
I have a JSON array of arbitrary length. Each item in the array is a nested block of JSON objects; they all have the same properties but different values.
I need a JSON schema that checks whether the last block in the array has the values defined in the schema.
How should the schema be defined so that it only considers the last block in the array and ignores all the blocks before it?
My current solution successfully validates the JSON objects if there is only one block in the array. As soon as I have more blocks, it fails because all the others are not valid against my schema - which, of course, corresponds to the expected behaviour.
In my example, the JSON array contains two nested blocks of JSON objects. These differ in the following items:
event.action = "[load|button]"
event.label = "[journey:device-only|submit,journey:device-only]"
type = "[page|track]"
An example of my data:
[
{
"page": {
"path": "order/checkout/summary",
"language": "en"
},
"cart": {
"ordercase": "neworder",
"product_list": [
{
"name": "Apple iPhone 14 Plus",
"quantity": 1,
"price": 1000
}
]
},
"event": {
"action": "load",
"label": "journey:device-only"
},
"type": "page"
},
{
"page": {
"path": "order/checkout/summary",
"language": "en"
},
"cart": {
"ordercase": "neworder",
"product_list": [
{
"name": "Apple iPhone 14 Plus",
"quantity": 1,
"price": 1000
}
]
},
"event": {
"action": "button",
"label": "submit,journey:device-only",
},
"type": "track"
}
]
And the schema I use, which works fine for the second block if it were the only one in the array:
{
"type": "array",
"$schema": "http://json-schema.org/draft-07/schema#",
"items": {
"type": "object",
"required": ["event", "page", "type"],
"properties": {
"page": {
"type": "object",
"properties": {
"path": {
"const": "order/checkout/summary"
},
"language": {
"enum": ["de", "fr", "it", "en"]
}
},
"required": ["path", "language"]
},
"event": {
"type": "object",
"additionalProperties": false,
"properties": {
"action": {
"const": "button"
},
"label": {
"type": "string",
"pattern": "^[-_:, a-z0-9]*$",
"allOf": [
{
"type": "string",
"pattern": "^\\S*(?:(submit,|,submit))\\S*$"
},
{
"type": "string",
"pattern": "^\\S*(journey:(?:(device-only|device-plus)))\\S*$"
}
]
}
},
"required": ["action", "label"]
},
"type": {
"enum": ["track", "string"]
}
}
}
}
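One pragmatic workaround, if the validation runs in application code rather than purely inside JSON Schema, is to pull out the last element of the array and validate only that against the per-item schema. A minimal Python sketch using the jsonschema package (the package choice and file names are assumptions; this is a workaround, not a pure-schema answer):

import json
from jsonschema import validate  # raises ValidationError on failure

data = json.load(open("data.json"))      # the array of blocks
schema = json.load(open("schema.json"))  # the array schema shown above

# Validate only the last block against the per-item schema and
# ignore every block before it.
validate(instance=data[-1], schema=schema["items"])
print("last block is valid")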
I am creating an indexer that takes a document, runs the KeyPhraseExtractionSkill and outputs it back to the index.
For many documents, this works out of the box. But for those records whose content is over 50,000 characters, this does not work. OK, no problem; this limit is clearly stated in the docs.
What the docs suggest is to use the Text Split skill. So I used the Text Split skill to split the original document into pages and passed all pages to the KeyPhraseExtractionSkill. Then the results need to be merged back, as we end up with an array of arrays of strings. Unfortunately, it seems that the Merge Skill does not accept an array of arrays, just an array.
https://i.imgur.com/dBD4qgb.png <- Link to the skillset hierarchy.
This is the error reported by Azure:
Required skill input was not of the expected type 'StringCollection'. Name: 'itemsToInsert', Source: '/document/content/pages/*/keyPhrases'. Expression language parsing issues:
What I want to achieve, at the end of the day, is to run the KeyPhraseExtractionSkill on text larger than 50,000 characters and eventually add the result back to the index.
JSON for the skillset:
{
"#odata.context": "https://-----------.search.windows.net/$metadata#skillsets/$entity",
"#odata.etag": "\"0x8D957466A2C1E47\"",
"name": "devalbertcollectionfilesskillset2",
"description": null,
"skills": [
{
"#odata.type": "#Microsoft.Skills.Text.SplitSkill",
"name": "SplitSkill",
"description": null,
"context": "/document/content",
"defaultLanguageCode": "en",
"textSplitMode": "pages",
"maximumPageLength": 1000,
"inputs": [
{
"name": "text",
"source": "/document/content"
}
],
"outputs": [
{
"name": "textItems",
"targetName": "pages"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.EntityRecognitionSkill",
"name": "EntityRecognitionSkill",
"description": null,
"context": "/document/content/pages/*",
"categories": [
"person",
"quantity",
"organization",
"url",
"email",
"location",
"datetime"
],
"defaultLanguageCode": "en",
"minimumPrecision": null,
"includeTypelessEntities": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "persons",
"targetName": "people"
},
{
"name": "organizations",
"targetName": "organizations"
},
{
"name": "entities",
"targetName": "entities"
},
{
"name": "locations",
"targetName": "locations"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.KeyPhraseExtractionSkill",
"name": "KeyPhraseExtractionSkill",
"description": null,
"context": "/document/content/pages/*",
"defaultLanguageCode": "en",
"maxKeyPhraseCount": null,
"modelVersion": null,
"inputs": [
{
"name": "text",
"source": "/document/content/pages/*"
}
],
"outputs": [
{
"name": "keyPhrases",
"targetName": "keyPhrases"
}
]
},
{
"#odata.type": "#Microsoft.Skills.Text.MergeSkill",
"name": "Merge Skill - keyPhrases",
"description": null,
"context": "/document",
"insertPreTag": " ",
"insertPostTag": " ",
"inputs": [
{
"name": "itemsToInsert",
"source": "/document/content/pages/*/keyPhrases"
}
],
"outputs": [
{
"name": "mergedText",
"targetName": "keyPhrases"
}
]
}
],
"cognitiveServices": {
"#odata.type": "#Microsoft.Azure.Search.CognitiveServicesByKey",
"key": "------",
"description": "/subscriptions/13abe1c6-d700-4f8f-916a-8d3bc17bb41e/resourceGroups/mde-dev-rg/providers/Microsoft.CognitiveServices/accounts/mde-dev-cognitive"
},
"knowledgeStore": null,
"encryptionKey": null
}
Please let me know if there is anything else that I can add to improve the question. Thanks!
You don't have to merge the key phrase outputs to insert them into the index.
Assuming your index already has a field called mykeyphrases of type Collection(Edm.String), to populate it with the key phrase outputs, add this indexer output field mapping:
"outputFieldMappings": [
...
{
"sourceFieldName": "/document/content/pages/*/keyPhrases/*",
"targetFieldName": "mykeyphrases"
},
...
]
The /* at the end of sourceFieldName is important for flattening the array of arrays of strings. This will also work as the skill input if you want to pass an array of strings to another skill for other enrichments.
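As a rough illustration of what the trailing /* does, in plain Python terms (names are illustrative only, not part of the skillset):

# Each page has its own keyPhrases list, i.e. /document/content/pages/*/keyPhrases
pages = [
    {"keyPhrases": ["indexer", "skillset"]},
    {"keyPhrases": ["key phrase extraction"]},
]

# /document/content/pages/*/keyPhrases   -> a list of lists
nested = [page["keyPhrases"] for page in pages]

# /document/content/pages/*/keyPhrases/* -> one flat list of strings,
# which is what a Collection(Edm.String) field expects
flat = [phrase for page in pages for phrase in page["keyPhrases"]]
print(flat)  # ['indexer', 'skillset', 'key phrase extraction']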
I have the following two JSON values in two rows.
{
"attributes": [{
"name": "text-1580797977710",
"value": "Nikesh Niroula"
}, {
"name": "email-1580797979166",
"value": "nikesh#gmail.com"
}]
}
{
"attributes": [{
"name": "text-1580797977720",
"value": "Denver"
}, {
"name": "text-1580797977723",
"value": "colarado"
},
{
"name": "text-1580797977727",
"value": "USA"
}
]
}
I need the above JSON to be aggregated into one single array using PostgreSQL; the expected result is shown below. I tried using json_agg, but this adds an inner array inside the main array. There might be multiple JSON values, not only two.
{
"attributes": [{
"name": "text-1580797977710",
"value": "Nikesh Niroula"
}, {
"name": "email-1580797979166",
"value": "nikesh#gmail.com"
}, {
"name": "text-1580797977720",
"value": "Denver"
}, {
"name": "text-1580797977723",
"value": "colarado"
},
{
"name": "text-1580797977727",
"value": "USA"
}
]
}
You need to unnest them before you feed them to the agg function. If the table is "j" and the column is "x", then:
select jsonb_build_object('attributes',jsonb_agg(value))
from j, jsonb_array_elements(x->'attributes');
You can use the concatenation operator (||) along with the jsonb_array_elements and jsonb_agg functions to solve your problem.
with t (id,
attr1,
attr2) as (
values (1,
'{
"attributes": [
{
"name": "one-1",
"value": "Nikesh Niroula"
}]}'::jsonb,
'{"attributes": [
{
"name": "one-2",
"value": "Nikesh Niroula"
}]}'::jsonb) ,
(1,
'{
"attributes": [
{
"name": "two-1",
"value": "Nikesh Niroula"
}]}'::jsonb,
'{"attributes": [
{
"name": "two-2",
"value": "Nikesh Niroula"
}]}'::jsonb) ,
(1,
'{
"attributes": [
{
"name": "three-1",
"value": "Nikesh Niroula"
}]}'::jsonb,
'{"attributes": [
{
"name": "three-2",
"value": "Nikesh Niroula"
}]}'::jsonb) ,
(2,
'{
"attributes": [
{
"name": "four-1",
"value": "Nikesh Niroula"
}]}'::jsonb,
'{"attributes": [
{
"name": "four-2",
"value": "Nikesh Niroula"
}]}'::jsonb) )
select
id ,
jsonb_agg(value)
from
t,
jsonb_array_elements(attr1->'attributes' ||(attr2->'attributes'))
group by
1;
Hey, I am new to the Avro schema space and need to convert a JSON array into an Avro schema.
The JSON below describes a client with a list of services, each with an enabled flag:
If enabled is true, that particular service is taken by the client.
If enabled is false, that particular service is not taken by the client.
{
"clientName": "Haven",
"serviceDetailsList": [
{
"serviceName": "Service1",
"enabled": true
},
{
"serviceName": "Service2",
"enabled": true
},
{
"serviceName": "Service3",
"enabled": true
},
{
"serviceName": "Service4",
"enabled": false
},
{
"serviceName": "Service5",
"enabled": false
},
{
"serviceName": "Service6",
"enabled": true
}
]
}
I tried the schema below but am not getting a proper response.
"fields":[
{"name": "serviceName", "type": [ "Boolean", "false" ] , "aliases":[
"service1" ]
},
{"name": "serviceName", "type": [ "Boolean", "false" ] , "aliases":[
"service2" ]
}
]
Any help would be appreciated.
Thank you, all of you. I tried again and was able to get the correct schema. The correct Avro schema is:
{
"name": "modelData",
"type": "record",
"namespace": "com.hi.model",
"fields": [
{
"name": "clientName",
"type": "string"
},
{
"name": "serviceDetailsList",
"type": {
"type": "array",
"items": {
"name": "serviceDetailsList_record",
"type": "record",
"fields": [
{
"name": "serviceName",
"type": "string"
},
{
"name": "enabled",
"type": "boolean"
}
]
}
}
}
]
}
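To sanity-check that a schema like this matches the sample payload, an Avro library can be used to serialize the example record against it. A small sketch with the fastavro package (the package choice is an assumption; any Avro binding works similarly):

import io
from fastavro import parse_schema, writer

schema = {
    "name": "modelData",
    "type": "record",
    "namespace": "com.hi.model",
    "fields": [
        {"name": "clientName", "type": "string"},
        {"name": "serviceDetailsList", "type": {
            "type": "array",
            "items": {
                "name": "serviceDetailsList_record",
                "type": "record",
                "fields": [
                    {"name": "serviceName", "type": "string"},
                    {"name": "enabled", "type": "boolean"},
                ],
            },
        }},
    ],
}

# Trimmed version of the sample payload above.
record = {
    "clientName": "Haven",
    "serviceDetailsList": [
        {"serviceName": "Service1", "enabled": True},
        {"serviceName": "Service4", "enabled": False},
    ],
}

# Writing the record into an in-memory Avro container confirms that the
# schema and the JSON shape agree; a mismatch raises an exception.
buf = io.BytesIO()
writer(buf, parse_schema(schema), [record])
print(f"wrote {buf.tell()} bytes")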
I'm looking to convert JSON containing an array to CSV format. The number of elements inside the array is dynamic for each row. I tried the flow below (the flow file XML is attached to the post).
GetFile --> ConvertRecord --> UpdateAttribute --> PutFile
Are there any other alternatives?
JSON format:
{ "LogData": {
"Location": "APAC",
"product": "w1" }, "Outcome": [
{
"limit": "0",
"pri": "3",
"result": "pass"
},
{
"limit": "1",
"pri": "2",
"result": "pass"
},
{
"limit": "5",
"priority": "1",
"result": "fail"
} ], "attr": {
"vers": "1",
"datetime": "2018-01-10 00:36:00" }}
Expected output in CSV:
location, product, limit, pri, result, vers, datetime
APAC, w1, 0, 3, pass, 1, 2018-01-10 00:36:00
APAC, w1, 1, 2, pass, 1, 2018-01-10 00:36:00
APAC, w1, 5, 1, fail, 1, 2018-01-10 00:36:00
Output from the attached flow:
LogData,Outcome,attr
"MapRecord[{product=w1, Location=APAC}]","[MapRecord[{limit=0, result=pass, pri=3}], MapRecord[{limit=1, result=pass, pri=2}], MapRecord[{limit=5, result=fail}]]","MapRecord[{datetime=2018-01-10 00:36:00, vers=1}]"
ConvertRecord -- I am using JsonTreeReader and CSVRecordSetWriter; the configurations are below:
JsonTreeReader Controller Service config:
CSVRecordSetWriter Controller Service config:
AvroSchemaRegistry Controller Service config:
Avro schema:
{ "name": "myschema", "type": "record", "namespace": "myschema", "fields": [{"name": "LogData","type": { "name": "LogData", "type": "record", "fields": [{ "name": "Location", "type": "string"},{ "name": "product", "type": "string"} ]}},{"name": "Outcome","type": { "type": "array", "items": {"name": "Outcome_record","type": "record","fields": [ {"name": "limit","type": "string" }, {"name": "pri","type": ["string","null"] }, {"name": "result","type": "string" }] }}},{"name": "attr","type": { "name": "attr", "type": "record", "fields": [{ "name": "vers", "type": "string"},{ "name": "datetime", "type": "string"} ]}} ]}
Try this spec in JoltTransformJSON before ConvertRecord:
[
{
"operation": "shift",
"spec": {
"Outcome": {
"*": {
"#(3,LogData.Location)": "[#2].location",
"#(3,LogData.product)": "[#2].product",
"#(3,attr.vers)": "[#2].vers",
"#(3,attr.datetime)": "[#2].datetime",
"*": "[#2].&"
}
}
}
}
]
It seems that you need to perform a Jolt transform before converting to CSV; otherwise it is not going to work.
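For clarity, the intent of the shift spec is to turn the nested document into a flat array of records, one record per Outcome element, roughly like this (illustrative Python literal showing the first row only, assuming the transform resolves the LogData and attr lookups as intended):

flattened = [
    {
        "location": "APAC",
        "product": "w1",
        "limit": "0",
        "pri": "3",
        "result": "pass",
        "vers": "1",
        "datetime": "2018-01-10 00:36:00",
    },
    # ... one record per remaining Outcome element
]

The CSVRecordSetWriter's schema would then need to list these flat field names instead of the nested LogData/Outcome/attr records.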