ADF: API With Nested JSON Arrays to Flat File

In ADF I am calling an API source that returns nested/complex JSON, which I need to flatten out into CSV files.
The Copy activity won't work for me: it only reads the first record from the nested data.
I need to be able to call the API and then flatten the aliases array.
Here is an example of the response from the API:
{
    "items": [
        {
            "title_no": 12345,
            "booking_xref_title_no": 45305,
            "edi_no": "2495",
            "title_global_id": "TTL-11",
            "aliases": [
                {
                    "source_name": "123A",
                    "title_alias_global_id": "ABC1234"
                },
                {
                    "source_name": "123B",
                    "title_alias_global_id": "ABC5678"
I need to get the following into my output CSV:
source_name,title_global_id
123A,ABC1234
123B,ABC5678

The JSON given in the question is incomplete, so I added a few braces to it.
The JSON response should be as given below:
{
    "items": [
        {
            "title_no": 12345,
            "booking_xref_title_no": 45305,
            "edi_no": "2495",
            "title_global_id": "TTL-11",
            "aliases": [
                {
                    "source_name": "123A",
                    "title_alias_global_id": "ABC1234"
                },
                {
                    "source_name": "123B",
                    "title_alias_global_id": "ABC5678"
                }
            ]
        }
    ]
}
With the above JSON response, you can flatten it and save the output to a CSV file using the steps below.
Use the JSON response as the source of a Data Flow activity in Azure Data Factory.
Select Single document as the document form.
Now add a Flatten transformation. Set Unroll by to items.aliases and map source_name and title_global_id to the output columns (a data flow script sketch of this step is shown below).
Check the data preview after the Flatten transformation.
Now add a CSV dataset as the sink.
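If you prefer to look at it as data flow script rather than the UI, a rough sketch of the Flatten transformation could look like this (apiSource is an assumed name for the source transformation, and the exact script ADF generates may differ):

    apiSource foldDown(unroll(items.aliases),
        mapColumn(
            source_name = items.aliases.source_name,
            title_global_id = items.title_global_id
        ),
        skipDuplicateMapInputs: false,
        skipDuplicateMapOutputs: false) ~> FlattenAliases

Here unroll(items.aliases) produces one output row per alias, and mapColumn keeps only the two columns needed in the CSV.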
Output:
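Given the sample payload and the mapping above, the sink file should contain the rows the question asks for:

    source_name,title_global_id
    123A,ABC1234
    123B,ABC5678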

Related

Parsing an array of JSON objects from logfile in Logstash

I have logs in the following type of format:
2021-10-12 14:41:23,903716 [{"Name":"A","Dimen":[{"Name":"in","Value":"348"},{"Name":"ses","Value":"asfju"}]},{"Name":"read","A":[{"Name":"ins","Value":"348"},{"Name":"ses","Value":"asf5u"}]}]
2021-10-12 14:41:23,903716 [{"Name":"B","Dimen":[{"Name":"in","Value":"348"},{"Name":"ses","Value":"a7hju"}]},{"Name":"read","B":[{"Name":"ins","Value":"348"},{"Name":"ses","Value":"ashju"}]}]
Each log entry is on a new line. The problem is that I want each object in the line's top-level array to become a separate document, parsed accordingly.
I need to parse this and send it to Elasticsearch. I have tried a number of filters (grok, json, split, etc.) and I cannot get it to work the way I need. I have little experience with these filters, so any help would be much appreciated.
The JSON codec is what I would need if I could remove the text/timestamp prefix from each line.
"If the data being sent is a JSON array at its root multiple events will be created (one per element)"
If there is a way to do that, it would also be helpful.
Here is a config example for your use case:
input { stdin {} }

filter {
  grok {
    match => { "message" => "%{DATA:date},%{DATA:some_field} %{GREEDYDATA:json_message}" }
  }

  # Use the json filter to parse the raw string into a JSON field
  json { source => "json_message" target => "json" }

  # and split the resulting array into dedicated events (one per element)
  split { field => "json" }
}

output {
  stdout {
    codec => rubydebug
  }
}
If you need to parse the start of the log line as a date, you can extend the grok pattern with a date format, or concatenate the two fields and set them as the source for the date plugin.
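A minimal sketch of the second option (the date and some_field names come from the grok pattern above; the timestamp field name and the exact date pattern are assumptions to adjust for your data):

filter {
  # Combine the grok captures "date" and "some_field" back into one string,
  # e.g. "2021-10-12 14:41:23,903716", then let the date filter set @timestamp from it.
  mutate { add_field => { "timestamp" => "%{date},%{some_field}" } }
  date {
    match => ["timestamp", "yyyy-MM-dd HH:mm:ss,SSSSSS"]
    target => "@timestamp"
    remove_field => ["timestamp"]
  }
}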

Iterating a JSON array in LogicApps

I'm using a Logic App triggered by an HTTP call. The call posts a JSON message that is a single-element array. I simply want to extract the single JSON object out of the array so that I can parse it, but I have spent several hours googling and trying various options to no avail. Here's an example of the array:
[{
    "id": "866ef906-5bd8-44d8-af34-0c6906d2dfd7",
    "subject": "Engagement-866ef906-5bd8-44d8-af34-0c6906d2dfd7",
    "data": {
        "$meta": {
            "traceparent": "00-dccfde4923181d4196f870385d99cb84-52b8333f100b844c-00"
        },
        "timestamp": "2021-10-19T17:01:06.334Z",
        "correlationId": "866ef906-5bd8-44d8-af34-0c6906d2dfd7",
        "fileName": "show.xlsx"
    },
    "eventType": "File.Uploaded",
    "eventTime": "2021-10-19T17:01:07.111Z",
    "metadataVersion": "1",
    "dataVersion": "1"
}]
Examples of what hasn't worked:
Parse JSON on the array: InvalidTemplate error when specifying an array as the schema.
For each directly against the HTTP output: "No dependent actions succeeded" error.
Any suggestions would be gratefully received.
You have to paste the example that you have provided into 'Use sample payload to generate schema' in the Parse JSON action, and then you will be able to retrieve each individual object from the sample payload.
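For reference, a schema generated from the sample payload above looks roughly like this (abbreviated; the nested data object is left loosely typed here):

    {
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "id": { "type": "string" },
                "subject": { "type": "string" },
                "data": { "type": "object" },
                "eventType": { "type": "string" },
                "eventTime": { "type": "string" },
                "metadataVersion": { "type": "string" },
                "dataVersion": { "type": "string" }
            }
        }
    }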
You can extract a single JSON object from your array by using its index in square brackets, e.g. triggerBody()?[0] instead of triggerBody(). 0 is the index of the first element in the array, 1 of the second, and so on.
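For example, with the sample payload above (field names taken from that payload):

    triggerBody()?[0]?['eventType']            evaluates to "File.Uploaded"
    triggerBody()?[0]?['data']?['fileName']    evaluates to "show.xlsx"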

Read JSON from rest API as is with Azure Data Factory

I'm trying to get Azure Data Factory to read my REST API and put it in SQL Server. The source is a REST API and the sink is a SQL Server table.
I tried to do something like:
"translator": {
"type": "TabularTranslator",
"schemaMapping": {
"$": "json"
},
"collectionReference": "$.tickets"
}
The source looks like:
{ "tickets": [ {... }, {...} ] }
Because of the poor mapping capabilities I'm choosing this path; I'll then split the data with a query. Preferably, I'd like to store each object inside tickets as a row containing the JSON of that object.
In short, how can I get the JSON output from the RestSource into a single text/nvarchar(max) column in the SqlSink?
I managed to solve the same issue by modifying the mapping manually.
ADF tries to parse the JSON anyway, but from the Advanced mode you can edit the JSON paths. For example, this is the original schema parsed automatically by ADF:
https://imgur.com/Y7QhcDI
Once opened in Advanced mode it will show the full paths, adding indexes of the elements, something similar to $tickets[0][] etc.
Try to delete all the other columns and keep only the highest-level one, $tickets (in my case it was $value): https://i.stack.imgur.com/WnAzC.jpg. As a result, the entire JSON will be written into the destination column.
If there are pagination rules in place, each page will be written as a single row.
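A minimal sketch of what the resulting Copy Activity mapping could look like (json_payload is an assumed destination column name, and the exact property layout depends on your ADF version, so treat this as illustrative):

    "translator": {
        "type": "TabularTranslator",
        "mappings": [
            {
                "source": { "path": "$.tickets" },
                "sink": { "name": "json_payload" }
            }
        ]
    }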

json array of object is useless (not efficient)

I use JSON to send data to clients through an API.
My data is a JSON array of objects; each object in the array has the same type, and the keys are the same for all of them.
70% of a request is consumed by repeating useless key names.
Is there a way to send the data without this overhead?
I know some formats like CSV exist, but I want a general solution to this problem.
For example, my array is 5 MB as JSON but only 500 KB as CSV.
A simple JSON array:
var people = [
    { firstname: "Micro", hasSocialNetworkSite: false, lastname: "Soft", site: "http://microsoft.com" },
    { firstname: "Face", hasSocialNetworkSite: true, lastname: "Book", site: "http://facebook.com" },
    { firstname: "Go", hasSocialNetworkSite: true, lastname: "ogle", site: "http://google.com" },
    { firstname: "Twit", hasSocialNetworkSite: true, lastname: "Ter", site: "http://twitter.com" },
    { firstname: "App", hasSocialNetworkSite: false, lastname: "Le", site: "http://apple.com" }
];
and the same array in CSV format:
"firstname","hasSocialNetworkSite","lastname","site"
"Micro","False","Soft","http://microsoft.com"
"Face","True","Book","http://facebook.com"
"Go","True","ogle","http://google.com"
"Twit","True","Ter","http://twitter.com"
"App","False","Le","http://apple.com"
You can see the overhead of the JSON array of objects in this example.
Why would using a CSV file not be a 'general solution'?
If your data is tabular, you don't really need a hierarchical format like JSON or XML.
You can even shrink your CSV file further by removing the double quotes (those are only needed when there is a separator inside a field):
firstname,hasSocialNetworkSite,lastname,site
Micro,False,Soft,http://microsoft.com
Face,True,Book,http://facebook.com
Go,True,ogle,http://google.com
Twit,True,Ter,http://twitter.com
App,False,Le,http://apple.com
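If you do want to stay in JSON, a common middle ground is to send the key names once and the values as arrays of rows. A minimal sketch in JavaScript (the pack/unpack helper names are made up for illustration):

// Pack an array of uniform objects into a compact header + rows structure
// so each key name is transmitted only once.
function pack(objects) {
    var cols = Object.keys(objects[0]);                     // column names from the first object
    var rows = objects.map(function (o) {
        return cols.map(function (c) { return o[c]; });     // one array of values per object
    });
    return { cols: cols, rows: rows };
}

// Rebuild the original array of objects on the client.
function unpack(packed) {
    return packed.rows.map(function (r) {
        var obj = {};
        packed.cols.forEach(function (c, i) { obj[c] = r[i]; });
        return obj;
    });
}

// pack(people) => { cols: ["firstname", "hasSocialNetworkSite", "lastname", "site"],
//                   rows: [["Micro", false, "Soft", "http://microsoft.com"], ...] }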

Angular alphabetizes GET response

I am currently trying to make an interactive table in Angular that reflects table information from a SQL database.
The stack I am using is MSSQL, Express.js, and AngularJS. When I log the response in Node, the data is in the desired order. However, when I log the data from .success(function(data)), the fields are alphabetized and the rows are put in random order.
I am sending a JSON object (an array of rows, e.g. {"b":"blah","a":"aye"}). However, the row is received in Angular as {"a":"aye","b":"blah"}.
Desired effect: use the column and row ordering from the SQL query in the client view, and remove whatever "magic" Angular is using to reorder the information.
In JavaScript, the properties of an object do not have guaranteed order. You need to send a JSON array instead:
["blah", "aye"]
If you need the column names as well you can send down an array of objects:
[{ "col":"b", "value":"blah" }, { "col":"a", "value":"aye" }]
Or alternatively, an object of arrays:
{ "col": ["b", "a"], "value": ["blah", "aye"] }
Edit: After some more thought, your ideal JSON structure would probably look like this:
{
    "col": ["b", "a"],
    "row": [
        ["blah", "aye"],
        ["second", "row"],
        ["and", "so on"]
    ]
}
Now, instead of getting "blah" by accessing table[0]['b'] like you would have before, you'll need to do something like table.row[0][table.col.indexOf('b')].
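To keep that lookup readable, a small helper can hide the indexOf call. A minimal sketch (the cell name is made up; the field names match the structure above):

// Read one cell by column name while preserving the server-defined
// column and row order.
function cell(table, rowIndex, colName) {
    return table.row[rowIndex][table.col.indexOf(colName)];
}

// cell(table, 0, 'b') === "blah"
// cell(table, 1, 'a') === "row"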
