MongoDB collection data with multiple arrays:
{
"_id": ObjectId("61aa6bf1742b00f59b894eb7"),
"first": ["abc", "def", "ghi"],
"last": ["rst", "uvw", "xyz"],
"numb": ["12", "34", "56"]
}
Expected output, where the elements at each index of the arrays become one document:
{
"first": "abc",
"last": "rst",
"numb": "12"
},
{
"first": "def",
"last": "uvw",
"numb": "34"
},
{
"first": "ghi",
"last": "xyz",
"numb": "56"
}
You can use $zip to "transpose" multiple arrays (as many as you like, actually):
// {
// first: ["abc", "def", "ghi"],
// last: ["rst", "uvw", "xyz"],
// numb: ["12", "34", "56"]
// }
db.collection.aggregate([
{ $project: { x: { $zip: { inputs: ["$first", "$last", "$numb"] } } } },
// { x: [["abc", "rst", "12"], ["def", "uvw", "34"], ["ghi", "xyz", "56" ]] }
{ $unwind: "$x" },
// { x: [ "abc", "rst", "12" ] }
// { x: [ "def", "uvw", "34" ] }
// { x: [ "ghi", "xyz", "56" ] }
{ $replaceWith: {
$arrayToObject: { $zip: { inputs: [["first", "last", "numb"], "$x"] } }
}}
])
// { first: "abc", last: "rst", numb: "12" }
// { first: "def", last: "uvw", numb: "34" }
// { first: "ghi", last: "xyz", numb: "56" }
This:
- $zips the 3 arrays so that elements at the same index get grouped into the same sub-array;
- $unwinds (explodes/flattens) those sub-arrays;
- transforms each resulting array into an object matching your expected output format:
  - by $zipping (again!) the keys we want (["first", "last", "numb"]) with the array's values ("$x"), which gives [key, value] pairs,
  - then converting those pairs into an object with $arrayToObject, and using $replaceWith to replace the current document with that object.
Note that $replaceWith was introduced in MongoDB 4.2; on earlier versions, use { $replaceRoot: { newRoot: ... } } instead.
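To see what each stage does without a database, the same steps can be traced in plain JavaScript. This is only an illustration of the semantics, not MongoDB code: zip mimics $zip's default behaviour of stopping at the shortest input, and explode (a hypothetical name) chains the $project, $unwind and $replaceWith steps for one document.

```javascript
// Plain-JavaScript sketch of $zip / $unwind / $arrayToObject, not MongoDB code.
// zip() and explode() are hypothetical helpers.

// Like $zip without useLongestLength: output length = shortest input length.
function zip(inputs) {
  if (inputs.length === 0) return [];
  const len = Math.min(...inputs.map((a) => a.length));
  const out = [];
  for (let i = 0; i < len; i++) {
    out.push(inputs.map((a) => a[i])); // group elements sharing index i
  }
  return out;
}

// $project { x: $zip } -> $unwind "$x" -> $replaceWith $arrayToObject($zip(keys, x))
function explode(doc, keys) {
  const rows = zip(keys.map((k) => doc[k]));
  return rows.map((x) => Object.fromEntries(zip([keys, x])));
}

const doc = {
  first: ["abc", "def", "ghi"],
  last: ["rst", "uvw", "xyz"],
  numb: ["12", "34", "56"],
};

console.log(explode(doc, ["first", "last", "numb"]));
```

Because $zip stops at the shortest array by default, arrays of unequal length silently lose their trailing elements; $zip's useLongestLength: true option (optionally with defaults) keeps them instead.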
Query
- $map over the indexes, combining the members that share the same index into 1 document
- keep the _id too, to know which document they came from, and the index, to sort by later
- for each index, take the element from each array with $arrayElemAt
- $unwind, then $replaceRoot so each combined element becomes its own document
- $sort by _id and index, so the results come out in the same order as in the arrays
- $unset the helper index field ($unset as a stage needs MongoDB 4.2+; on older versions use {"$project": {"index": 0}})
*The indexes are computed from the biggest array, to be safe. If you already know that all the arrays have the same size, you can replace:
{"$max": [{"$size": "$first"}, {"$size": "$last"}, {"$size": "$numb"}]}
with the size of any one of them, for example (if the sizes can differ, we need the biggest):
{"$size": "$first"}
db.collection.aggregate(
[{"$project":
{"data":
{"$map":
{"input":
{"$range":
[0,
{"$max":
[{"$size": "$first"}, {"$size": "$last"}, {"$size": "$numb"}]}]},
"in":
{"_id": "$_id",
"index": "$$this",
"first": {"$arrayElemAt": ["$first", "$$this"]},
"last": {"$arrayElemAt": ["$last", "$$this"]},
"numb": {"$arrayElemAt": ["$numb", "$$this"]}}}}}},
{"$unwind": {"path": "$data"}},
{"$replaceRoot": {"newRoot": "$data"}},
{"$sort": {"_id": 1, "index": 1}},
{"$unset": ["index"]}])
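The index-based query can be mirrored in plain JavaScript to show how it treats arrays of different lengths. This is a minimal sketch, not MongoDB code; byIndex and runPipeline are hypothetical names, and _id values are assumed to be plain strings so they can be compared with < and >.

```javascript
// Plain-JavaScript sketch of the $range / $map / $arrayElemAt query, not MongoDB code.
// byIndex() and runPipeline() are hypothetical helpers.

function byIndex(doc, keys) {
  // $max of the array sizes, so no element of the longest array is lost.
  const size = Math.max(0, ...keys.map((k) => doc[k].length));
  const rows = [];
  for (let index = 0; index < size; index++) {
    const row = { _id: doc._id, index };
    for (const k of keys) {
      // $arrayElemAt past the end yields a missing field, so skip it here too.
      if (index < doc[k].length) row[k] = doc[k][index];
    }
    rows.push(row);
  }
  return rows;
}

function runPipeline(docs, keys) {
  return docs
    .flatMap((d) => byIndex(d, keys)) // $map, then $unwind + $replaceRoot
    .sort((a, b) => (a._id < b._id ? -1 : a._id > b._id ? 1 : a.index - b.index)) // $sort
    .map(({ index, ...rest }) => rest); // $unset: ["index"]
}
```

Unlike the $zip answer, which stops at the shortest array, this keeps every element of the longest array and simply omits the fields whose arrays have run out.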
I have documents like this (this is the result after a few pipeline stages):
[
{
"_id": ObjectId("5e9d5785e4c8343bb2b455cc"),
"name": "Jenny Adams",
"report": [
{ "category":"Beauty", "status":"submitted", "submitted_on": [{"_id": "xyz", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "abc", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Kitchen", "status":"submitted", "submitted_on": [{"_id": "mnp", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
},
{
"_id": ObjectId("5e9d5785e4c8343bb2b455db"),
"name": "Mathew Smith",
"report": [
{ "category":"Household", "status":"submitted", "submitted_on": [{"_id": "123", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "345", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Garden", "status":"submitted", "submitted_on": [{"_id": "567", "timestamp":"2022-05-08T06:10:06.432+00:00"}] },
{ "category":"BakingNeeds", "status":"submitted", "submitted_on": [{"_id": "891", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
}
]
I have user input for a time period:
from: 2021-02-23T06:10:05.832+00:00
to: 2022-02-23T06:10:05.832+00:00
Now I want to filter the objects in report by that time range: I want to keep an object only if its "submitted_on[-1]["timestamp"]" (the timestamp of the last element of submitted_on) lies between the from and to timestamps.
I am struggling with accessing the timestamp because of the nesting.
I tried this:
$project: {
"name": 1,
"report": {
"category": 1,
"status": 1,
"submitted_on": 1,
"timestamp": {
$arrayElemAt: ["$report.submitted_on", -1]
}
}
}
But this gets the same value for every item inside report, taken from the last object of the report array ({"_id": "bcd", "timestamp":"2022-05-08T06:10:06.432+00:00"}). How can I select the last timestamp of each object?
You can replace your stage in the aggregation pipeline with two stages, $unwind and $addFields, to get what I think you want:
{
$unwind: "$report"
},
{
"$addFields": {
"timestamp": {
$arrayElemAt: [
"$report.submitted_on",
-1
]
}
}
},
The $unwind stage breaks the outer report array into separate documents, since you want to act on each of its elements. Note that timestamp then holds the whole last submitted_on object ({"_id": ..., "timestamp": ...}), so the date itself is at "timestamp.timestamp". If you plan to continue the aggregation pipeline with more stages, you can probably skip the $addFields stage and put the condition directly in your next $match stage.
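The unwind-then-filter idea can be sketched in plain JavaScript (not MongoDB code) to check which report items survive. This is a minimal sketch under two assumptions: the from/to bounds are inclusive, and the timestamps are ISO-8601 strings sharing the same +00:00 offset, so plain string comparison orders them chronologically. filterReports and lastTimestamp are hypothetical names.

```javascript
// Plain-JavaScript sketch of { $unwind: "$report" } followed by a range filter
// on the last submitted_on timestamp. Not MongoDB code.
// Assumption: inclusive bounds, ISO-8601 strings with the same offset.

function lastTimestamp(item) {
  const subs = item.submitted_on ?? [];
  // Same idea as $arrayElemAt: [..., -1]; undefined when the array is empty.
  return subs.length > 0 ? subs[subs.length - 1].timestamp : undefined;
}

function filterReports(docs, from, to) {
  return docs
    // $unwind: one output document per report element, other fields kept.
    .flatMap((d) => d.report.map((r) => ({ _id: d._id, name: d.name, report: r })))
    // $match on the last timestamp with $gte / $lte.
    .filter((d) => {
      const ts = lastTimestamp(d.report);
      return ts !== undefined && ts >= from && ts <= to;
    });
}
```

With the sample data and the user's from/to values, only the Beauty and Household items remain, because their last submitted_on entries date from 2021-03-23.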
Consider this mongo collection:
[{
"_id": {
"s": "HU",
"k": 1
},
"boxed": {
"values": [{
"s": "NL",
"k": 2
},
{
"s": "BR",
"k": 3
},
{
"s": "NL",
"k": 2
}
]
}
},
{
"_id": {
"s": "FR",
"k": 2
},
"boxed": {
"values": [{
"s": "SE",
"k": 99
}]
}
},
{
"_id": {
"s": "UA",
"k": 14
},
"boxed": {}
}
]
I'm basically trying to find the records that have duplicated boxed.values. One such example is the first one, where NL*2 appears twice.
My first idea was to project the original size of the values array, use $map to turn that array of objects into an array of strings (such as $map: { input: "$boxed.values", in: { $concat: ["$$this.s", "*", { $toString: "$$this.k" }] } }, since $concat only accepts strings), then remove the duplicates from the array of strings, so I can compare the original size with the deduplicated one. If the sizes differ, that record has duplicates.
However, it seems that there's no easy way in Mongo (or at least I have not found one) to remove duplicated values from an array of strings.
Any ideas?
You can do something like this:
[
{
$unwind: "$boxed.values"
},
{
$group: {
_id: "$_id",
"values": {
$addToSet: "$boxed.values"
}
}
},
{
$addFields: {
"boxed.values": "$values"
}
}
]
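$addToSet is a $group accumulator that adds a value to the resulting array only if it is not already present, so the rebuilt boxed.values contains no duplicates. Note that $unwind drops documents whose boxed.values is missing or empty, such as the third one.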
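Combining this deduplication with the size comparison from the question can be sketched in plain JavaScript (not MongoDB code). distinct and findDuplicated are hypothetical names. Values are compared with JSON.stringify, which assumes each value's fields always appear in the same order; MongoDB's own document equality for $addToSet is field-order sensitive too. The sketch keeps first-seen order, whereas MongoDB does not guarantee the order of an $addToSet result.

```javascript
// Plain-JavaScript sketch: deduplicate like $addToSet, then compare sizes.
// Not MongoDB code; distinct() and findDuplicated() are hypothetical helpers.

function distinct(values) {
  const seen = new Map();
  for (const v of values) {
    const key = JSON.stringify(v); // field order matters, as in MongoDB
    if (!seen.has(key)) seen.set(key, v); // keep the first occurrence only
  }
  return [...seen.values()];
}

// Returns the _id of every document whose boxed.values shrinks once
// duplicates are removed, i.e. the documents that contain duplicates.
function findDuplicated(docs) {
  return docs
    .filter((d) => Array.isArray(d.boxed?.values) && d.boxed.values.length > 0) // $unwind drops the rest
    .filter((d) => distinct(d.boxed.values).length < d.boxed.values.length)
    .map((d) => d._id);
}
```

Inside the pipeline itself, the same check could be done by also accumulating a count ({ $sum: 1 }) in the $group stage and keeping only the groups where that count is greater than the $size of the $addToSet array.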
How to avoid empty arrays when filtering results while querying a collection in MongoDB
[
{
"_id": ObjectId("5d429786bd7b5f4ae4a64790"),
"extensions": {
"outcome": "success",
"docType": "ABC",
"Roll No": "1"
},
"data": [
{
"Page1": [
{
"heading": "LIST",
"content": [
{
"text": "<b>12345</b>"
},
],
}
],
"highlights": [
{
"name": "ABCD",
"text": "EFGH",
}
],
"marks": [
{
"revision": "revision 1",
"Score": [
{
"maths": "100",
"science": "40",
"history": "90"
},
{
"lab1": "25",
"lab2": "25"
}
],
"Result": "Pass"
},
{
"revision": "revision 1",
"Score": [
{
"maths": "100",
"science": "40"
},
{
"lab1": "25",
"lab2": "25"
}
],
"Result": "Pass"
}
]
}
]
}
]
I am looking for results that contain only the "history" marks from the Score array.
I tried the following query (in MongoDB 3.6.10), but it returns the empty Score arrays as well as the array that has history:
db.getCollection('student_scores').find({
"data.marks.score.history": {
$not: {
$type: 10
},
$exists: true
}
},
{
"extensions.rollNo": 1,
"data.marks.score.history": 1
})
Desired output:
{
"extensions": {
"rollNo": "1"
},
"data": [
{
"marks": [
{
"Score": [
{
"history": "90"
}
]
}
]
}
]
}
I used something like the following:
db.getCollection('student_scores').aggregate([
{
$unwind: "$data"
},
{
$unwind: "$data.marks"
},
{
$unwind: "$data.marks.Score"
},
{
$match: {
"data.marks.Score.history": {
$exists: true,
$not: {
$type: 10
}
}
}
},
{
$project: {
"extensions.Roll No": 1,
"data.marks.Score.history": 1
}
},
{
$group: {
_id: "$extensions.Roll No",
history_grades: {
$push: "$data.marks.Score.history"
}
}
}
])
which gives the following result with your input (I think it is more readable than your expected output):
[
{
"_id": "1",
"history_grades": [
"90"
]
}
]
where _id is the "extensions.Roll No" value of each document.
What do you think?
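The three $unwind stages, the $match and the $group can be followed step by step in a plain-JavaScript sketch (not MongoDB code). historyGrades is a hypothetical name. It assumes data, marks and Score are arrays when present; a missing array yields no rows, as $unwind drops such documents. The $type 10 check corresponds to null, which is why both missing and null history values are skipped.

```javascript
// Plain-JavaScript sketch of the $unwind x3 / $match / $group pipeline.
// Not MongoDB code; historyGrades() is a hypothetical helper.

function historyGrades(docs) {
  const groups = new Map(); // $group keyed by extensions["Roll No"]
  for (const doc of docs) {
    for (const d of doc.data ?? []) {          // $unwind: "$data"
      for (const m of d.marks ?? []) {         // $unwind: "$data.marks"
        for (const s of m.Score ?? []) {       // $unwind: "$data.marks.Score"
          // $match: history $exists and is not $type 10 (null)
          if (s.history === undefined || s.history === null) continue;
          const key = doc.extensions?.["Roll No"] ?? null;
          if (!groups.has(key)) groups.set(key, []);
          groups.get(key).push(s.history);     // $push
        }
      }
    }
  }
  return [...groups].map(([_id, history_grades]) => ({ _id, history_grades }));
}
```

The Map preserves insertion order here; a real $group makes no ordering promise, so add a $sort if the order matters.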
OK, I still think the data design here (the Score array) is a little off, but here is a solution that ensures a Score array contains only 1 entry and that this entry has a history key. We use dot-path array traversal as a trick to get to the value of history.
c = db.foo.aggregate([
{$unwind: "$data"}
,{$unwind: "$data.marks"}
,{$project: {
result: {$cond: [
{$and: [ // if
{$eq: [1, {$size: "$data.marks.Score"}]}, // Only 1 item...
// A little trick! $data.marks.Score.history will resolve to an *array*
// of the values associated with each object in $data.marks.Score (the parent
// array) having a key of history. BUT: As it resolves, if there is no
// field for that key, nothing is added to resolution vector -- not even a null.
// This means the resolved array could
// be **shorter** than the input. For example:
// > db.foo.insert({"x":[ {b:2}, {a:3,b:4}, {b:7}, {a:99} ]});
// WriteResult({ "nInserted" : 1 })
// > db.foo.aggregate([ {$project: {z: "$x.b", n: {$size: "$x.b"}} } ]);
// { "z" : [ 2, 4, 7 ], "n" : 3 }
// > db.foo.aggregate([ {$project: {z: "$x.a", n: {$size: "$x.a"}} } ]);
// { "z" : [ 3, 99 ], "n" : 2 }
//
// You must be careful about this.
// But we also know this resolved vector is of size 1 (see above) so we can go ahead and grab
// the 0th item and that becomes our output.
// Note that if we did not have the requirement of ONLY history, then we would not
// need the fancy $cond thing.
{$arrayElemAt: ["$data.marks.Score.history",0]}
]},
{$arrayElemAt: ["$data.marks.Score.history",0]}, // then (use value of history)
null ] } // else set null
,extensions: "$extensions" // just carry over extensions
}}
,{$match: {"result": {$ne: null} }} // only take good ones.
]);
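The dot-path "trick" described in the comments can be reproduced in plain JavaScript (not MongoDB code) to show why the resolved array can be shorter than its input. resolvePath and historyIfOnlyEntry are hypothetical names; the sketch handles one level of nesting only and skips nested arrays. It also leaves out one detail of the real $and, which treats a falsy value such as 0 or null as failing.

```javascript
// Plain-JavaScript sketch of how "$x.b" resolves when x is an array of
// documents: one value per element that has the field, nothing (not even
// null) for elements without it. Not MongoDB code.

function resolvePath(arr, field) {
  return arr
    .filter((el) => el !== null && typeof el === "object" && !Array.isArray(el)
      && Object.prototype.hasOwnProperty.call(el, field))
    .map((el) => el[field]);
}

const x = [{ b: 2 }, { a: 3, b: 4 }, { b: 7 }, { a: 99 }];
console.log(resolvePath(x, "b")); // [ 2, 4, 7 ]
console.log(resolvePath(x, "a")); // [ 3, 99 ]

// The $cond in miniature: exactly one Score entry, and that entry
// resolves a history value; otherwise null.
function historyIfOnlyEntry(score) {
  const h = resolvePath(score, "history");
  return score.length === 1 && h.length > 0 ? h[0] : null;
}
```

Because both Score arrays in the sample document have two entries, this condition yields null for each of them, so the pipeline returns nothing for that sample.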
This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 4 years ago.
I have MongoDB documents like this:
{
"_id": "5b14679e592baa493e0bc208",
"productCode": "ABC",
"corridors": [
{
"countryNameEn": "Sweden",
"countryNameFr": "Suède",
"countryCode": "SE",
"currencyNameEn": "Swedish Krona",
"currencyNameFr": "Couronne suédoise",
"currencyCode": "SEK",
"corridorLimit": "abc"
},
{
"countryNameEn": "USA",
"countryNameFr": "Suède",
"countryCode": "US",
"currencyNameEn": "USA",
"currencyNameFr": "Couronne suédoise",
"currencyCode": "USD",
"corridorLimit": "abc"
}
]
},
{
"_id": "5b14679e592baa493e0bc208",
"productCode": "XYZ",
"corridors": [
{
"countryNameEn": "Sweden",
"countryNameFr": "Suède",
"countryCode": "SE",
"currencyNameEn": "Swedish Krona",
"currencyNameFr": "Couronne suédoise",
"currencyCode": "SEK",
"corridorLimit": "abc"
},
{
"countryNameEn": "USA",
"countryNameFr": "Suède",
"countryCode": "US",
"currencyNameEn": "USA",
"currencyNameFr": "Couronne suédoise",
"currencyCode": "USD",
"corridorLimit": "abc"
}
]
}
I want to find the document whose productCode is ABC and which has currencyCode USD in the corridors array. How can I return only the matching object from the corridors array, so that only the object with currencyCode USD appears in the result rather than the whole array?
I tried running this query: { productCode: 'ABC', corridors: { $elemMatch: { currencyCode: "USD"}}}, but it gives me the whole array, not only the matched element.
I don't think you can use $elemMatch in an aggregation $project stage. (In find(), $elemMatch can be used in the projection, but it returns only the first matching element.)
Try the following query:
db.collection.aggregate([
{
$match : {"productCode" : "ABC"}
},
{
$unwind : "$corridors"
},
{
$match : { "corridors.currencyCode" : "USD"}
},
{
$group : {
_id : "$productCode",
corridors : {$addToSet : "$corridors"}
}
}]);
Outputs:
{
"_id" : "ABC",
"corridors" : [
{
"countryNameEn" : "USA",
"countryNameFr" : "Suède",
"countryCode" : "US",
"currencyNameEn" : "USA",
"currencyNameFr" : "Couronne suédoise",
"currencyCode" : "USD",
"corridorLimit" : "abc"
}
]
}
In the result you'll have _id instead of productCode. If you still want productCode, you can add a $project stage at the end.
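What each stage of that pipeline keeps can be traced in a plain-JavaScript sketch (not MongoDB code). matchingCorridors is a hypothetical name, and corridor objects are compared with JSON.stringify to imitate $addToSet, which assumes their fields always appear in the same order.

```javascript
// Plain-JavaScript sketch of $match / $unwind / $match / $group.
// Not MongoDB code; matchingCorridors() is a hypothetical helper.

function matchingCorridors(docs, productCode, currencyCode) {
  const groups = new Map(); // $group: one entry per productCode
  for (const doc of docs) {
    if (doc.productCode !== productCode) continue;   // first $match
    for (const c of doc.corridors ?? []) {           // $unwind: "$corridors"
      if (c.currencyCode !== currencyCode) continue; // second $match
      if (!groups.has(doc.productCode)) groups.set(doc.productCode, new Map());
      const set = groups.get(doc.productCode);
      const key = JSON.stringify(c);                 // $addToSet: skip exact duplicates
      if (!set.has(key)) set.set(key, c);
    }
  }
  return [...groups].map(([_id, set]) => ({ _id, corridors: [...set.values()] }));
}
```

Unlike $push, $addToSet also collapses identical corridors that come from different documents sharing the same productCode.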
Hope this helps!
Here is my data structure. I want to get the documents where there is a value in moderators that is not in members:
{
"_id" : "10",
"members" : [
"10",
"20",
"30"
],
"moderators" : [
"50",
"60",
"70"
]
}
You can use $setDifference to compute the relative complement, i.e. the values in the moderators array that are not in the members array, followed by a $match that keeps only the entries where foundInModerator is non-empty.
db.collection.aggregate(
[
{ $project: { members: 1, moderators: 1, foundInModerator: { $setDifference: [ "$moderators", "$members" ] }, _id: 0 } },
{ $match:{foundInModerator:{$ne:[] } } }
]
)
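The relative complement computed by $setDifference, and the $match that follows it, can be sketched in plain JavaScript (not MongoDB code). setDifference and onlyInModerators are hypothetical names, and the sketch only handles scalar values such as the string IDs in the example. Like $setDifference, it returns each remaining value once; unlike MongoDB, which leaves the output order unspecified, it keeps the order of the first array.

```javascript
// Plain-JavaScript sketch of the $project / $match pipeline, not MongoDB code.
// Handles scalar values (e.g. string IDs) only.

function setDifference(a, b) {
  const exclude = new Set(b);
  // Values of a that are not in b, each kept once.
  return [...new Set(a.filter((v) => !exclude.has(v)))];
}

function onlyInModerators(docs) {
  return docs
    .map((d) => ({
      members: d.members,
      moderators: d.moderators,
      foundInModerator: setDifference(d.moderators, d.members), // $project (_id: 0)
    }))
    .filter((d) => d.foundInModerator.length > 0); // $match: { foundInModerator: { $ne: [] } }
}
```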
To return documents where a specific value is in one array but not in the other, simply use the $ne operator:
db.collection.find({ "moderators": "50", "members": { "$ne": "50" } })
So the match condition is only true where "50" is present in the "moderators" array but not in the "members" array.
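How that find() filter treats array fields can be sketched in plain JavaScript (not MongoDB code): an equality condition on an array field matches if any element is equal, while $ne matches only if no element is equal, including when the field is missing. matchesFilter and asArray are hypothetical names, and only scalar values are handled.

```javascript
// Plain-JavaScript sketch of { moderators: value, members: { $ne: value } }.
// Not MongoDB code; handles scalar values only.

// Treat a missing field as an empty array and a scalar as a one-element array.
const asArray = (v) => (v === undefined ? [] : Array.isArray(v) ? v : [v]);

function matchesFilter(doc, value) {
  const inModerators = asArray(doc.moderators).includes(value); // { moderators: value }
  const notInMembers = !asArray(doc.members).includes(value);   // { members: { $ne: value } }
  return inModerators && notInMembers;
}
```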