jolt array transform using wildcard - arrays

I'm using JOLT to transform data from:
[{"a" : "a",
"b" : "b",
"c" : "c",
...},
{"a" : "a",
"b" : "b",
"c" : "c",
...}]
To:
[{"a1" : "a",
"b1" : "b",
"c1" : "c",
...},
{"a1" : "a",
"b1" : "b",
"c1" : "c",
...}]
I'm trying to figure out a wild card that would map all the attributes I don't need to change. Something like:
[{
  "operation": "shift",
  "spec": {
    "*": {
      "a": "[&1].a1",
      "b": "[&1].b1",
      "c": "[&1].c1",
      "*": {
        "#": "&"
      }
    }
  }
}]
Where:
"*": {
"#": "&"
}
Would work as a wildcard for all the fields I don't need to update.

Spec
[{
  "operation": "shift",
  "spec": {
    "*": {
      "a": "[&1].a1",
      "b": "[&1].b1",
      "c": "[&1].c1",
      "*": "[&1].&"
    }
  }
}]
Here "&" echoes whatever key the inner "*" matched, and "&1" references the array index matched one level up, so "*": "[&1].&" copies every remaining field through unchanged.
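For comparison, the intended transform can be sketched in plain JavaScript (the renames map below is illustrative, standing in for the a/b/c rules; every other key passes through unchanged):

```javascript
// Rename selected keys in every array element; copy the rest through as-is.
const renames = { a: "a1", b: "b1", c: "c1" };

function transform(arr) {
  return arr.map(obj =>
    Object.fromEntries(
      Object.entries(obj).map(([k, v]) => [renames[k] ?? k, v])
    )
  );
}

console.log(transform([{ a: "a", b: "b", c: "c", d: "d" }]));
// → [ { a1: 'a', b1: 'b', c1: 'c', d: 'd' } ]
```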

Related

How do I merge arrays from multiple documents without duplicates in MongoDB aggregation?

I have 3 documents:
{
"id": 1,
"user": "Brian1",
"configs": [
"a",
"b",
"c",
"d"
]
}
----
{
"id": 2,
"user": "max_en",
"configs": [
"a",
"h",
"i",
"j"
]
}
----
{
"id": 3,
"user": "userX",
"configs": [
"t",
"u",
"s",
"b"
]
}
I want to merge all the "configs" arrays into one array without duplicates, like this:
{
"configs": [
"a",
"b",
"c",
"d",
"h",
"i",
"j",
"t",
"u",
"s"
]
}
I've tried the following:
Aggregation.group("").addToSet("configs").as("configs") and { _id: "", 'configs': { $addToSet: '$configs' } }
The first one gives an error because I've left the field name empty (I don't know what to put there).
The second one returns a merged array but with duplicates.
When you want to group all the documents, you need to use {_id: null}, which means "group all documents into one".
Probably you need this:
db.collection.aggregate([
  {
    "$unwind": "$configs"
  },
  {
    $group: {
      _id: null,
      configs: {
        "$addToSet": "$configs"
      }
    }
  }
])
But be cautious when using this on a larger collection without a preceding $match stage: $unwind emits one document per array element, which can make the pipeline expensive.
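For reference, the same merge-without-duplicates can be sketched application-side in JavaScript. Here Set's insertion order happens to match the expected output, whereas $addToSet guarantees no particular order:

```javascript
const docs = [
  { id: 1, user: "Brian1", configs: ["a", "b", "c", "d"] },
  { id: 2, user: "max_en", configs: ["a", "h", "i", "j"] },
  { id: 3, user: "userX", configs: ["t", "u", "s", "b"] },
];

// flatMap concatenates all configs arrays; Set drops the duplicates.
const merged = { configs: [...new Set(docs.flatMap(d => d.configs))] };

console.log(merged.configs);
// → [ 'a', 'b', 'c', 'd', 'h', 'i', 'j', 't', 'u', 's' ]
```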

jq - remove duplicate entries within all arrays inside the JSON file

I have the following JSON file.
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"A",
"B",
"B",
"C",
"C",
"C",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"AA",
"BA",
"BB",
"CC",
"CC",
"CC",
"CC",
"DD",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"1",
"2",
"3",
"3",
"4",
"4",
"4",
"0",
"0",
"0"
]
}
}
]
I need to remove duplicates inside every array in the JSON file. So the result should look like following
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"B",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"BA",
"BB",
"CC",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"2",
"3",
"4",
"0"
]
}
}
]
Array key names are not known in advance. There might be multiple arrays inside the Arrays object.
I tried to use unique_by but it requires the key name.
This algorithm - search for every array inside the Arrays object, for every such array apply unique function, re-assign results back to the array - should be fairly easy to implement, but I am stuck.
Thanks.
walk( if type == "array" then unique else . end)
If the original order should be respected, then you can easily use "def uniques" as defined at How do I get jq to return unique results when json has multiple identical entries?
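The jq walk filter above can be mirrored in JavaScript (a sketch for arrays of primitives; jq's unique also sorts its output, which this reproduces with a default string sort):

```javascript
// Recursive analogue of jq's walk(if type == "array" then unique else . end):
// visit every value; de-duplicate and sort any array of primitives found.
function walkUnique(value) {
  if (Array.isArray(value)) {
    return [...new Set(value.map(walkUnique))].sort();
  }
  if (value !== null && typeof value === "object") {
    return Object.fromEntries(
      Object.entries(value).map(([k, v]) => [k, walkUnique(v)])
    );
  }
  return value;
}

console.log(walkUnique({ dddd0003: ["1", "1", "2", "3", "3", "4", "0"] }));
// → { dddd0003: [ '0', '1', '2', '3', '4' ] }
```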
You can use unique and |=:
$ jq '.[].Arrays[] |= unique' file.json
[
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"B",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"BA",
"BB",
"CC",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"0",
"1",
"2",
"3",
"4"
]
}
}
]
$
The only "problem" is that unique sorts the elements of the array, so for example the contents of the "dddd0003" array are not in the same order as your expected result. I don't know if this could be a problem for you.
If the "Arrays" property can also contain non-array values, extra care must be taken to filter out those non-array values so that unique doesn't complain. select(type == "array") can be used (output omitted):
$ jq '(.[].Arrays[] | select(type == "array")) |= unique' file.json
...
or arrays:
$ jq '(.[].Arrays[] | arrays) |= unique' file.json
...
These last two solutions better reflect your algorithm.
var jsonArr = [
{
"name": "first",
"Arrays": {
"dddd0001": [
"A",
"A",
"B",
"B",
"C",
"C",
"C",
"C",
"D",
"E",
"F"
]
}
},
{
"name": "second",
"Arrays": {
"dddd0002": [
"AA",
"AA",
"BA",
"BB",
"CC",
"CC",
"CC",
"CC",
"DD",
"DD",
"FF"
]
}
},
{
"name": "third",
"Arrays": {
"dddd0003": [
"1",
"1",
"2",
"3",
"3",
"4",
"4",
"4",
"0",
"0",
"0"
]
}
}
]
for (var i = 0; i < jsonArr.length; i++) {
  // "Arrays" is the second key of each element; its first key holds the array.
  var arrtemp = jsonArr[i][Object.keys(jsonArr[i])[1]];
  var innerKey = Object.keys(arrtemp)[0];
  var arrtmp2 = arrtemp[innerKey];
  // Keep only the first occurrence of each value and write it back
  // to the inner array.
  arrtemp[innerKey] = arrtmp2.filter((v, p) => arrtmp2.indexOf(v) == p);
  console.log(jsonArr[i]);
}
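A shorter variant using Set, which also handles multiple arrays inside the "Arrays" object (something the question allows for), can be sketched as:

```javascript
// De-duplicate every array found under each element's "Arrays" object,
// preserving first-occurrence order; handles any number of inner arrays.
function dedupeArrays(jsonArr) {
  for (const item of jsonArr) {
    for (const key of Object.keys(item.Arrays)) {
      item.Arrays[key] = [...new Set(item.Arrays[key])];
    }
  }
  return jsonArr;
}

console.log(dedupeArrays([{ name: "first", Arrays: { dddd0001: ["A", "A", "B"] } }]));
// → [ { name: 'first', Arrays: { dddd0001: [ 'A', 'B' ] } } ]
```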

How to filter a collection in Mongodb

Let's say I have three documents in a collection, like so:
[
{"_id": "101", parts: ["a", "b"]},
{"_id": "102", parts: ["a", "c"]},
{"_id": "103", parts: ["a", "z"]},
]
what is the query I have to write so that if I input ["a","b","c"]
(i.e. all items in parts field value in each doc should be present in ["a","b","c"]) will output:
[
{"_id": "101", parts: ["a", "b"]},
{"_id": "102", parts: ["a", "c"]}
]
is this even possible? any idea?
The solution below may not be the best, but it works. The idea is to find all documents that have no items in parts outside the input array. It can be done with a combination of $not, $elemMatch and $nin:
db.collection.find({
  parts: {
    $not: {
      "$elemMatch": {
        $nin: ["a", "b", "c"]
      }
    }
  }
})
Mongo Playground
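The same "no element outside the input" test can be sketched application-side in JavaScript:

```javascript
const input = new Set(["a", "b", "c"]);
const docs = [
  { _id: "101", parts: ["a", "b"] },
  { _id: "102", parts: ["a", "c"] },
  { _id: "103", parts: ["a", "z"] },
];

// Keep a document only if every parts element is inside the input set.
const result = docs.filter(d => d.parts.every(p => input.has(p)));

console.log(result.map(d => d._id));
// → [ '101', '102' ]
```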
Thanks to #prasad_. I have tried to come up with a solution which is similar to what I wanted. I have used $setDifference here.
db.collection.aggregate([
  {
    $project: {
      diff: {
        $setDifference: [
          "$parts",
          ["a", "b", "c"]
        ]
      },
      document: "$$ROOT"
    }
  },
  {
    $match: {
      "diff": { $eq: [] }
    }
  },
  {
    $project: { "diff": 0 }
  }
])
output:
[
{
"_id": "101",
"document": {
"_id": "101",
"parts": [
"a",
"b"
]
}
},
{
"_id": "102",
"document": {
"_id": "102",
"parts": [
"a",
"c"
]
}
}
]
Mongo Playground

How do I do a '$all $in' on mongodb?

I have the following collection
>db.prueba.find({})
{ "_id" : "A", "requi" : null }
{ "_id" : "B", "requi" : null }
{ "_id" : "C", "requi" : [ "A" ] }
{ "_id" : "D", "requi" : [ "A", "B" ] }
{ "_id" : "E", "requi" : [ "C" ] }
{ "_id" : "F", "requi" : [ "B", "D"] }
{ "_id" : "G", "requi" : [ "F" ] }
I need each element of the requi field to be in the following array. In this case, the array has two elements:
["A", "D"]
When I use $elemMatch with $in, it returns the following:
>db.prueba.find({requi:{$elemMatch:{$in:['A','D']}}})
{ "_id" : "C", "requi" : [ "A" ] }
{ "_id" : "D", "requi" : [ "A", "B" ] }
{ "_id" : "F", "requi" : [ "B", "D" ] }
The query must return only one document, because "B" does not exist in the array ["A", "D"]:
{ "_id" : "C", "requi" : [ "A" ] }
Please, help me.
You can use $setIsSubset to check whether the given array is set of the requi array and then $redact to eliminate the non-matched ones.
db.collection.aggregate([
{ "$match": { "requi": { "$ne": null } } },
{ "$redact": {
"$cond": {
"if": { "$eq": [{ "$setIsSubset": ["$requi", ["A", "D"]] }, true] },
"then": "$$DESCEND",
"else": "$$PRUNE"
}
}}
])

Find documents whose array field contain some subsets in MongoDB

The "users" collection has documents with an array field.
Example documents:
{
"_id" :1001,
"properties" : ["A", "B", "C", "D", "E", "F", "G", "H", "I"]
}
{
"_id" : 1002,
"properties" : ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
}
How can I build a query to get the documents which follow the next condition?
Get only the documents that have the properties:
[ "3" AND ("A" OR "1") AND ("B" OR "2") ]
or in other way:
"3" AND "A" AND "B"
OR
"3" AND "A" AND "2"
OR
"3" AND "1" AND "B"
OR
"3" AND "1" AND "2"
In the previous example, the query has to result only the document:
{
"_id" : 1002,
"properties" : ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"]
}
The collection has 4 million documents. The "properties" array field has an average length of 15 elements. The query I am looking for must perform well on this rather large collection.
Stephan's answer is ok. Other ways to achieve the result using $in and $all operators:
db.users.find({
  $and: [
    { "properties": "3" },
    { "properties": { $in: ["A", "1"] } },
    { "properties": { $in: ["B", "2"] } }
  ]
});
(translation of your first description of the subset)
And
db.users.find({
  $or: [
    { "properties": { $all: ["3", "A", "B"] } },
    { "properties": { $all: ["3", "A", "2"] } },
    { "properties": { $all: ["3", "1", "B"] } },
    { "properties": { $all: ["3", "1", "2"] } }
  ]
});
(translation of your second description of the subset)
I'm afraid I can't tell which one will ensure the best performance. I hope that you have an index on properties.
You may try the queries on a smaller collection with explain to see the execution plan.
try this:
db.users.find({
  $or: [
    { $and: [{ "properties": "3" }, { "properties": "A" }, { "properties": "B" }] },
    { $and: [{ "properties": "3" }, { "properties": "A" }, { "properties": "2" }] },
    { $and: [{ "properties": "3" }, { "properties": "1" }, { "properties": "B" }] },
    { $and: [{ "properties": "3" }, { "properties": "1" }, { "properties": "2" }] }
  ]
});
or
db.users.find({
  $and: [
    { "properties": "3" },
    { $or: [{ "properties": "A" }, { "properties": "1" }] },
    { $or: [{ "properties": "B" }, { "properties": "2" }] }
  ]
});
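To check the logic, the condition "3" AND ("A" OR "1") AND ("B" OR "2") can be evaluated directly in JavaScript over the two example documents:

```javascript
const docs = [
  { _id: 1001, properties: ["A", "B", "C", "D", "E", "F", "G", "H", "I"] },
  { _id: 1002, properties: ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10"] },
];

// Same boolean structure as the $and/$or query above.
const matches = docs.filter(({ properties: p }) =>
  p.includes("3") &&
  (p.includes("A") || p.includes("1")) &&
  (p.includes("B") || p.includes("2"))
);

console.log(matches.map(d => d._id));
// → [ 1002 ]
```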
