Flatten a complex array from Facebook using Azure Data Factory

Using the REST connector in Azure Data Factory, I am trying to fetch the Facebook campaign details.
In the pipeline, I have a Web activity followed by a Copy activity. In the mapping section, I can see only the three columns (Id, name, status) from the first array; the columns inside the second array are not listed.
(screenshot: graph.facebook.com response)
(screenshot: Data Factory mapping)
Is there a way to get the columns listed inside the array? I also tried creating a data flow with the JSON file as the source and then used the flatten transformation, but I still cannot see the columns related to campaigns. Any help is appreciated. Thanks.

I tested this and found that Data Factory takes the first object of the JSON array as the JSON schema.
If you can adjust the JSON data so that "insights" appears in the first object, the column can be recognized:
(screenshot: imported schema)
If you can't, then the "insights" column will be missed:
In that case, there isn't a way to get all the columns listed inside the array.
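For illustration, here is a made-up sample of the shape that loses the column (the field names are placeholders, not the actual Facebook payload). Because "insights" first appears only in the second object, Data Factory infers the schema from the first object and never sees it:

{
  "data": [
    { "id": "1", "name": "Campaign A", "status": "PAUSED" },
    {
      "id": "2",
      "name": "Campaign B",
      "status": "ACTIVE",
      "insights": { "data": [ { "impressions": "100", "spend": "5.0" } ] }
    }
  ]
}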
HTH.

@Leon Yue, I found a way to do that.
Step 1: Copy the Facebook campaign data using the REST connector and save it as JSON in Azure Blob storage.
(screenshot: Copy activity that extracts the FB data as JSON and saves it in Blob)
Step 2: Create a data flow with the JSON file in Blob as the source.
(screenshot: data flow task)
Step 3: Create a sample JSON schema file and save it on your desktop, with the insights array in the first object (the one that has all the column values). As per your previous comments, I built the sample so that ADF takes that first object of the JSON array as the schema; a sketch of such a file is shown after the steps.
Step 4: In the data flow source dataset, import the JSON schema using the 'Import schema > From sample file' option.
(screenshot: Import schema)
Now you will be able to see all the columns from the array.
(screenshot: all columns)
(screenshot: flatten JSON)
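For reference, the sample schema file from step 3 looks roughly like this (the insight fields are illustrative, not the full Facebook payload). The first object carries every column, including the insights array, so importing it as the sample file makes ADF infer the complete schema:

{
  "data": [
    {
      "id": "1",
      "name": "Campaign A",
      "status": "ACTIVE",
      "insights": { "data": [ { "impressions": "100", "spend": "5.0", "clicks": "10" } ] }
    }
  ]
}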

Related

Unable to load XML data using the Copy data activity

I am unable to load XML data into a SQL Server DB using the Copy data activity. I can achieve this with data flows using flatten hierarchy, but with Copy data the corresponding array is not mapped properly; even though the pipeline succeeds, only partial data is loaded into the DB.
Auto-creation of the table is also not allowed when copying an XML file; I have to create the table script first and then load the data.
As we are using a SHIR, this has to be done with the Copy data activity.
Use the collection reference in the mapping tab of the Copy activity to unroll the array and extract the data. I reproduced this using a Copy activity with sample nested XML data.
img 1: source dataset preview.
In the mapping tab, select Import schemas.
Toggle on the Advanced editor.
Give the JSON path of the array from which the data needs to be iterated and extracted (a sketch of the resulting mapping is shown below).
img 2: mapping settings.
When the pipeline is run, the data is copied successfully to the database.
img 3: sink data after copying.
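For reference, the advanced editor produces a mapping along these lines (the XML element names here are hypothetical, not taken from the screenshots). collectionReference is the JSON path of the repeating array to unroll by, and each source path is relative to that array:

{
  "type": "TabularTranslator",
  "mappings": [
    { "source": { "path": "['OrderId']" }, "sink": { "name": "OrderId" } },
    { "source": { "path": "['Item']['Name']" }, "sink": { "name": "ItemName" } },
    { "source": { "path": "['Item']['Qty']" }, "sink": { "name": "Quantity" } }
  ],
  "collectionReference": "$['Orders']['Order']"
}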
Reference: MS documentation on mapping a hierarchical source to a tabular sink.

How to modify the projection of a dataset in an ADF Data Flow

I want to optimize my data flow by reading only the data I really need.
I created a dataset that maps a view on my database. This dataset is used by different data flows, so I need a generic projection.
Now I am creating a new data flow and I want to read just a subset of the dataset.
Here is how I created the dataset:
And this is the generic projection:
Here is how I created the data flow. These are the source settings:
But now I want just a subset of my dataset:
It works, but I think I am doing it wrong:
I want to read data from my dataset (as you can see in the Source settings tab), but when I modify the projection I end up reading from the underlying table (as you can see in the Source options). It seems inconsistent. What is the correct way to manage this kind of customization?
Thank you
EDIT
The proposed solution does not solve my problem. If I go into Monitor and analyze the executions, this is what I see...
Before, using the approach I wrote above, I got this:
As you can see, I read just 8 columns from the database.
With the proposed solution, I get this:
And then:
Just to be clear, the purpose of my question is:
How can I read only the data I really need instead of reading all the data and filtering it afterwards?
I found a way (explained in my question), but there is an inconsistency in the configuration of the data flow (I set a dataset as the input, but in the options I write a query that reads from the DB).
First, import the data as a source.
You can use the Select transformation in the Data Flow activity to select CustomerID from the imported dataset.
Here you can remove unwanted columns.
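Under the hood, the Select transformation ends up looking roughly like this in the data flow script (the source and column names here are just placeholders for your dataset):

source(allowSchemaDrift: true,
    validateSchema: false) ~> source1
source1 select(mapColumn(
    CustomerID
  ),
  skipDuplicateMapInputs: true,
  skipDuplicateMapOutputs: true) ~> SelectCustomerID

Only CustomerID flows to the downstream transformations; any other columns from the generic projection are dropped right after the source.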
Refer - https://learn.microsoft.com/en-us/azure/data-factory/data-flow-select

Parse DynamoDB List DataType

I am building an Angular 8 application and am storing JSON data in a list data type in DynamoDB. I can insert the records just fine and can query the table for the data, but I'm having issues grabbing the data in the list data type.
Here is how it looks in a console log
I don't have any issues grabbing the String data values, only the nested data in the List data type.
If your issue is related to parsing the objects returned from DynamoDB, you can use the DynamoDB Converter:
https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/Converter.html#unmarshall-property
This will convert the returned DynamoDB record into a plain JSON object.
Also, if you're using the SDK, consider using the DocumentClient (https://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/DynamoDB/DocumentClient.html), which automatically converts DynamoDB records into JSON objects.
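A minimal sketch of both approaches with the v2 JavaScript SDK (the table name and attributes are made up, not from your data):

import { DynamoDB } from 'aws-sdk';

// Raw item as returned by the low-level client: the list attribute
// stays in { L: [...] } attribute-value form until it is converted.
const raw = {
  id: { S: '123' },
  orders: { L: [{ M: { sku: { S: 'abc' }, qty: { N: '2' } } }] }
};

// Converter.unmarshall turns the attribute-value map into a plain object
const item = DynamoDB.Converter.unmarshall(raw);
console.log(item.orders); // [{ sku: 'abc', qty: 2 }]

// The DocumentClient does this conversion automatically on every call
const docClient = new DynamoDB.DocumentClient();
docClient
  .get({ TableName: 'MyTable', Key: { id: '123' } })
  .promise()
  .then(res => console.log(res.Item && res.Item.orders));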

Mule Salesforce connector query operation: List of Maps as input

I have a huge list of Ids and Names for Contacts defined in this simple DW script. Let's show just two for simplicity:
[
{id:'169057',Name:'John'},
{id:'169099',Name:'Mark'}
]
I need the Salesforce connector to execute a single query (not 1000 unitary queries) over these two fields.
I expect the connector to be able to use a List of Maps as input, as it does for update/insert operations. So I tried this connector config:
I am getting a ConsumerIterator as the response. If I add an Object to String transformer, I get an empty String as the response.
I guess there must be a way of executing a big query in just one API call, but I am not finding it. Keep in mind that I need to use two or more WHERE clauses.
Any idea? Thanks!
I think you need to create two separate lists for the id and Name values from your input and then use those lists in the query with IN.
Construct the id list to be used in the WHERE clause. This can be executed inside a For Each with batch size 1000 (a DataWeave sketch is shown after these steps).
Retrieve name and id from Contact for each batch:
select id, name from contact where id in (id list)
Get the desired ids by matching the names retrieved in step 2 within Mule.
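A rough sketch of steps 1 and 2, assuming Mule 4 / DataWeave 2.0 (field names follow the sample input above; the batching itself comes from the For Each scope):

%dw 2.0
output application/java
---
// payload is assumed to be the current batch (up to 1000) of {id, Name} maps
"SELECT Id, Name FROM Contact WHERE Id IN ('" ++ ((payload map ((contact) -> contact.id)) joinBy "','") ++ "')"

The resulting SOQL string can be passed to the Salesforce connector's query operation once per batch, and the names can then be matched back to the ids in Mule (step 3).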

Querying Twitter JSON File in HBase

I have successfully downloaded Twitter data through Flume directly into an HBase table containing one column family, and all of the data is stored in one column like this:
hbase(main):005:0> scan 'tweet'
ROW
default00fbf898-6f6e-4b41-aee8-646efadfba46
COLUMN+CELL
column=data:pCol, timestamp=1454394077534, value={"extended_entities":{"media":[{"display_url":"pic.twitter.com/a7Mjq2daKZ","source_user_id":2987221847,"type":"photo"....
Now I want to access structs and arrays through HBase like we can access them in Hive. I have tried googling the issue but am still clueless. Kindly help.
You can't query display_url, source_user_id, or other JSON fields in HBase directly. You should use a document-store NoSQL DB like MongoDB.
