Azure Data Factory - skipped rows with empty collection reference when flattening JSON - arrays

I am trying to serialize the following JSON to Parquet using ADF Copy Activity:
[
  {
    "mac": "06A1E8A75834",
    "timestamp": "2020-11-06T00:00:00+02:00",
    "floor_number": 2,
    "x": 300.00,
    "y": 350.00,
    "located_inside": true,
    "zones": [
      {
        "zone_map_name": "sections",
        "zone_name": "z_1_"
      }
    ]
  },
  {
    "mac": "06A1E8A75835",
    "timestamp": "2020-11-06T00:00:00+02:00",
    "floor_number": 2,
    "x": 300.00,
    "y": 300.00,
    "located_inside": true,
    "zones": []
  }
]
However, in some cases the zones array is empty, and then the whole record (mac, timestamp, and the other fields) is skipped.
The expected behavior is that when the array is empty, the record is still written, with zone_name and zone_map_name left empty.
In the mapping, zones is set as the collection reference.
Is there a way to tell ADF to treat an empty collection reference as empty values in the row?

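By default, ADF skips rows whose collection reference is empty when flattening JSON.
If you uncheck the corresponding option in the mapping settings, ADF copies all the items to the destination. The record with the empty zones array is written without the zone fields, like this: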
[{"mac":"06A1E8A75834","timestamp":"2020-11-06T00:00:00+02:00","floor_number":2,"x":300.0,"y":350.0,"located_inside":true,"zone_map_name":"sections","zone_name":"z_1_"}
,{"mac":"06A1E8A75835","timestamp":"2020-11-06T00:00:00+02:00","floor_number":2,"x":300.0,"y":300.0,"located_inside":true}
]
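A minimal plain-Python sketch of the two behaviors, flattening on a collection reference with and without skipping empty collections. The function `flatten` and its `skip_if_empty` flag are hypothetical stand-ins for the Copy Activity setting. The sketch only mimics the output shown above and is not how ADF implements it.

```python
from __future__ import annotations

from typing import Any


def flatten(records: list[dict[str, Any]], collection: str,
            skip_if_empty: bool = True) -> list[dict[str, Any]]:
    """Emit one row per item of `collection`, merged with the parent's other fields.

    `skip_if_empty` is a hypothetical stand-in for the ADF option: when True,
    a record whose collection is empty (or missing) produces no row at all.
    """
    rows: list[dict[str, Any]] = []
    for record in records:
        # Parent columns: every field except the collection being unrolled.
        parent = {k: v for k, v in record.items() if k != collection}
        items = record.get(collection) or []
        if not items:
            if not skip_if_empty:
                # Keep the parent row; the child columns are simply absent.
                rows.append(dict(parent))
            continue
        for item in items:
            # Child fields are merged into a copy of the parent columns.
            rows.append({**parent, **item})
    return rows


records = [
    {"mac": "06A1E8A75834", "floor_number": 2,
     "zones": [{"zone_map_name": "sections", "zone_name": "z_1_"}]},
    {"mac": "06A1E8A75835", "floor_number": 2, "zones": []},
]

print(flatten(records, "zones"))                       # only the first record survives
print(flatten(records, "zones", skip_if_empty=False))  # both records, second without zone fields
```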

Related

Encapsulate a JSON Array inside an object with JOLT?

I work on a project where the output of one of our APIs is a JSON array. I'd like to encapsulate this array inside an object.
I'm trying to use a JOLT transformation (this is my first time using this tool) to achieve this. I've already searched through a lot of examples, but I still can't figure out what my JOLT specification should be to perform the transformation. I can't find what I am looking for.
For example, if my input is like this:
[
  {
    "id": 1,
    "name": "foo"
  },
  {
    "id": 2,
    "name": "bar"
  }
]
I'd like the output to be:
{
  "list": [
    {
      "id": 1,
      "name": "foo"
    },
    {
      "id": 2,
      "name": "bar"
    }
  ]
}
In short, I just want to put my array inside a field of another object.
You can use a shift transformation spec such as
[
  {
    "operation": "shift",
    "spec": {
      "*": "list[]"
    }
  }
]
where the "*" wildcard matches each index of the top-level (wrapper) array, and "list[]" appends the object found at that index to an array named list in the output.
You can try it on the demo site http://jolt-demo.appspot.com/.
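For readers new to JOLT, the effect of that one shift spec can be sketched in plain Python. `shift_into_list` is a hypothetical helper written for illustration. It mirrors only this spec, not JOLT's general matching rules.

```python
import json


def shift_into_list(doc: list) -> dict:
    """Mirror the shift spec {"*": "list[]"} for a top-level array.

    "*" matches each index of the input array, and "list[]" appends the
    element found at that index to the output array named "list".
    """
    out: dict = {}
    for _index, element in enumerate(doc):
        out.setdefault("list", []).append(element)
    return out


source = [{"id": 1, "name": "foo"}, {"id": 2, "name": "bar"}]
print(json.dumps(shift_into_list(source), indent=2))
```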

Reading data from MongoDB that contains array using Talend

I have a collection in my MongoDB that contains one field that is an array: the field 'Courses'.
The JSON format of the data is like this:
{
  "_id": {
    "$oid": "60eb59b98a970a20865142e8"
  },
  "Name": "Sadia",
  "Age": 24,
  "Institute": "IBA",
  "Courses": [
    {
      "Name": "ITP",
      "Grade": "A-"
    },
    {
      "Name": "OOP",
      "Grade": "A-"
    }
  ]
}
I know there is a way to read this kind of data when the field is an object, but I could not find a way to read it with Talend when it contains an array.

MongoDB Array Query - Single out an array element

I am having trouble querying a MongoDB collection that has an array inside.
Here is the structure of the collection I am querying. This is one record:
{
  "_id": "abc123def4567890",
  "profile_id": "abc123def4567890",
  "image_count": 2,
  "images": [
    {
      "image_id": "ABC123456789",
      "image_url": "images/something.jpg",
      "geo_loc": "-0.1234,11.234567890",
      "title": "A Title",
      "shot_time": "01:23:33",
      "shot_date": "11/22/2222",
      "shot_type": "scenery",
      "conditions": "cloudy",
      "iso": 16,
      "f": 2.4,
      "ss": "1/545",
      "focal": 6.0,
      "equipment": "",
      "instructions": "",
      "upload_date": 1234567890,
      "update_date": 1234567890
    },
    {
      "image_id": "ABC123456789",
      "image_url": "images/something.jpg",
      "geo_loc": "-0.1234,11.234567890",
      "title": "A Title",
      "shot_time": "01:23:33",
      "shot_date": "11/22/2222",
      "shot_type": "portrait",
      "conditions": "cloudy",
      "iso": "16",
      "f": "2.4",
      "ss": "1/545",
      "focal": "6.0",
      "equipment": "",
      "instructions": "",
      "upload_date": 1234567890,
      "update_date": 1234567890
    }
  ]
}
Forgive the formatting, I didn't know how else to show this.
As you can see, it's a profile with a series of images in an array called 'images', and there are 2 images. Each item of the 'images' array contains an object of attributes for the image (url, title, type, etc.).
All I want to do is return the array element whose attributes match certain criteria:
Select the object from images which has shot_type = "scenery"
I tried to make it as simple as possible, so I started with:
find( { "images.shot_type": "scenery" } )
This returns the entire record and both images within it. So I tried projection, but I could not isolate the single object within the array (in this case the object at position 0) and return it.
I think the answer lies with projection, but I am unsure.
I have gone through the MongoDB documentation for hours now and can't find inspiration. I have read about $elemMatch, $, and the other array operators, but nothing seems to allow you to single out an array item based on the data within it. I have been through this page too: https://docs.mongodb.com/manual/tutorial/query-arrays/ and still can't work it out.
Can anyone provide help?
Have I made an error by using '$push' to populate my images field (making it an array) instead of using '$set', which would have made it into an embedded document? Would this have made a difference?
Using aggregation, $filter keeps every element of images that matches the condition:
db.collection.aggregate([
  {
    $project: {
      _id: 0,
      result: {
        $filter: {
          input: "$images",
          as: "img",
          cond: { $eq: ["$$img.shot_type", "scenery"] }
        }
      }
    }
  }
])
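A plain-Python sketch of what that $project/$filter stage computes for one document. The helper name `filter_images` and the trimmed sample data are illustrative only. With PyMongo, the pipeline itself would be passed as a list to `collection.aggregate(...)`.

```python
def filter_images(profile: dict, shot_type: str) -> dict:
    """Plain-Python equivalent of {$project: {_id: 0, result: {$filter: ...}}}.

    Every element of "images" whose shot_type equals the requested value is
    kept (not just the first); _id is dropped, as `_id: 0` does in the stage.
    """
    return {
        "result": [img for img in profile["images"]
                   if img.get("shot_type") == shot_type]
    }


profile = {
    "_id": "abc123def4567890",
    "profile_id": "abc123def4567890",
    "images": [
        {"image_id": "ABC123456789", "shot_type": "scenery", "conditions": "cloudy"},
        {"image_id": "ABC123456789", "shot_type": "portrait", "conditions": "cloudy"},
    ],
}

print(filter_images(profile, "scenery"))
```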
You can also use $elemMatch in a projection:
db.collection.find(
  { "profile_id": "abc123def4567890" },
  { "images": { "$elemMatch": { "shot_type": "scenery" } } }
)
You can pass two objects to find. The first is the query filter: it selects only the documents whose profile_id is "abc123def4567890". You can omit this condition and pass { } instead if you want to search the entire collection.
The second is the projection: $elemMatch returns only the first element of images whose shot_type is "scenery" (plus _id, which find() includes by default).
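Unlike $filter, an $elemMatch projection returns at most one element: the first match. A plain-Python sketch of that projection rule follows. The helper `elem_match_projection` is hypothetical and handles only simple equality criteria.

```python
def elem_match_projection(doc: dict, field: str, criteria: dict) -> dict:
    """Mimic the find() projection {field: {"$elemMatch": criteria}}.

    Only equality criteria are supported. _id is kept (find() includes it by
    default), only the FIRST matching array element is returned, and the
    field is left out entirely when no element matches.
    """
    result = {"_id": doc["_id"]} if "_id" in doc else {}
    for element in doc.get(field, []):
        if all(element.get(key) == value for key, value in criteria.items()):
            result[field] = [element]
            break
    return result


doc = {
    "_id": "abc123def4567890",
    "images": [
        {"title": "First", "shot_type": "scenery"},
        {"title": "Second", "shot_type": "portrait"},
        {"title": "Third", "shot_type": "scenery"},
    ],
}

print(elem_match_projection(doc, "images", {"shot_type": "scenery"}))
# only "First" is returned, even though "Third" also matches
```

If every matching image is needed, the $filter aggregation above is the better fit.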

JSON Schema Array must contain a specific string

There are several questions on the subject, but none of them seems to address this particular issue, and neither does the JSON Schema documentation, so maybe it cannot be done.
The issue is that I have an array that can contain any of 4 strings as values, which is easy enough to achieve with this schema:
...
"attributes": {
  "type": "array",
  "items": {
    "type": "string",
    "enum": [
      "controls",
      "autoplay",
      "muted",
      "loop"
    ]
  },
  "additionalItems": false
}
...
So the values in the array can only be one of those four. However, "controls" must always be part of the array, while the other three are optional. If it were an array of objects, I could make this required, but I'm not sure how to check that an array contains a specific value.
Thanks for any help!
You can use the contains keyword (available since JSON Schema draft 06):
"attributes": {
  "type": "array",
  "items": {
    "type": "string",
    "enum": [
      "controls",
      "autoplay",
      "muted",
      "loop"
    ]
  },
  "contains": {
    "const": "controls"
  },
  "additionalItems": false
}
From the specification:
6.4.6. contains
The value of this keyword MUST be a valid JSON Schema.
An array instance is valid against "contains" if at least one of its elements is valid against the given schema.
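The following hand-rolled Python check mirrors what the schema above enforces. `is_valid_attributes` is a hypothetical name, and this is not a general JSON Schema validator. In practice a library such as the Python `jsonschema` package (for example `Draft7Validator(schema).is_valid(instance)`) would evaluate the schema itself.

```python
ALLOWED = {"controls", "autoplay", "muted", "loop"}


def is_valid_attributes(value: object) -> bool:
    """Mirror the schema: "type": "array", every item one of the enum
    strings ("items"), and at least one item equal to "controls"
    ("contains" with "const")."""
    if not isinstance(value, list):
        return False
    # "items": each element must be one of the four allowed strings.
    if not all(isinstance(item, str) and item in ALLOWED for item in value):
        return False
    # "contains": at least one element must satisfy {"const": "controls"}.
    return any(item == "controls" for item in value)


for candidate in (["controls", "muted"], ["autoplay", "loop"], [], ["controls", "fullscreen"]):
    print(candidate, is_valid_attributes(candidate))
```

Because contains requires at least one matching element, an empty array is rejected as well.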

Mapping an array inside an array with a JSON Reader

My JSON looks as follows:
{
  "records": [
    {
      "_id": "5106f97bdcb713b818d7f1f1",
      "cn": "lsacco",
      "favorites": [
        {
          "fullName": "Friend One",
          "uid": "friend1"
        },
        {
          "fullName": "Friend Two",
          "uid": "friend2"
        }
      ]
    }
  ]
}
When I try to use records.favorites as the root for my JSON reader, no results are populated in my model. Is there a way to do this without resorting to an association? Note that in my case, records will only ever have one element, even though it is an array.
records.favorites isn't valid because that property doesn't exist: records is declared as an array, so records.favorites points to nothing in the JSON data.
You want:
records[0].favorites
Indexing into records should solve the problem.
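The same distinction in plain Python, using the JSON from the question: records is a list, so the favorites array is only reachable through an index. This illustrates the data shape only. It does not use the JSON reader's own configuration.

```python
import json

raw = """
{"records": [{"_id": "5106f97bdcb713b818d7f1f1", "cn": "lsacco",
              "favorites": [{"fullName": "Friend One", "uid": "friend1"},
                            {"fullName": "Friend Two", "uid": "friend2"}]}]}
"""
data = json.loads(raw)

# records is an array, so index into it before reading favorites.
favorites = data["records"][0]["favorites"]
print([friend["uid"] for friend in favorites])

# The path without an index fails: a list cannot be looked up by the key "favorites".
try:
    data["records"]["favorites"]
except TypeError as err:
    print("records.favorites ->", err)
```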
