How to access a nested array of objects in a MongoDB aggregation pipeline?

I have documents like these (this is the result after a few pipeline stages):
[
{
"_id": ObjectId("5e9d5785e4c8343bb2b455cc"),
"name": "Jenny Adams",
"report": [
{ "category":"Beauty", "status":"submitted", "submitted_on": [{"_id": "xyz", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "abc", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Kitchen", "status":"submitted", "submitted_on": [{"_id": "mnp", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
},
{
"_id": ObjectId("5e9d5785e4c8343bb2b455db"),
"name": "Mathew Smith",
"report": [
{ "category":"Household", "status":"submitted", "submitted_on": [{"_id": "123", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "345", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Garden", "status":"submitted", "submitted_on": [{"_id": "567", "timestamp":"2022-05-08T06:10:06.432+00:00"}] },
{ "category":"BakingNeeds", "status":"submitted", "submitted_on": [{"_id": "891", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
}
]
I have user input for a time period:
from - 2021-02-23T06:10:05.832+00:00
to - 2022-02-23T06:10:05.832+00:00
I want to filter the objects in report by that range: an object should be kept only if the timestamp of its last submitted_on element (submitted_on[-1]["timestamp"]) lies between the from and to timestamps.
I am struggling to access the timestamp because of the nesting.
I tried this:
$project: {
"name": 1,
"report": {
"category": 1,
"status": 1,
"submitted_on": 1,
"timestamp": {
$arrayElemAt: ["$report.cataloger_submitted_on", -1]
}
}
}
But this gets the last object of the report array, {"_id": "bcd", "timestamp":"2022-05-08T06:10:06.432+00:00"}, for every item inside report. How can I select the last timestamp of each object instead?

You can replace your stage in the aggregation pipeline with two stages, $unwind and $addFields, to get what I think you want:
{
$unwind: "$report"
},
{
"$addFields": {
"timestamp": {
$arrayElemAt: [
"$report.submitted_on",
-1
]
}
}
},
The $unwind stage breaks the outer report array into one document per element, since you want to perform an action on each of them. See the playground here with your example. If you plan to continue the aggregation pipeline with more steps, you can probably skip the $addFields stage and include the condition inside your next $match stage.
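Continuing that pipeline, a minimal sketch of the range filter could look like the following. It assumes the timestamps are stored as ISO-8601 strings in one consistent format (as in the sample), so that string comparison follows chronological order; with real Date values you would pass Date objects instead. The names from, to, pipeline and lastSubmission are illustrative, and regrouping with $group is only one possible way to restore the original shape.

```javascript
// User-supplied range (assumption: same string format as the stored timestamps).
const from = "2021-02-23T06:10:05.832+00:00";
const to = "2022-02-23T06:10:05.832+00:00";

const pipeline = [
  // One output document per report entry.
  { $unwind: "$report" },
  // Last element of this entry's submitted_on array.
  { $addFields: { lastSubmission: { $arrayElemAt: ["$report.submitted_on", -1] } } },
  // Keep only entries whose last submission falls inside the range (inclusive).
  { $match: { "lastSubmission.timestamp": { $gte: from, $lte: to } } },
  // Rebuild one document per user from the surviving report entries.
  { $group: { _id: "$_id", name: { $first: "$name" }, report: { $push: "$report" } } }
];

// In mongosh (collection name is a placeholder): db.yourCollection.aggregate(pipeline)
```

Note that a user whose report entries all fall outside the range disappears from the result, because no unwound document survives the $match.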

Related

Sorting an array of objects in MongoDB collection

I have a collection in a MongoDB with a document like below:
{
"_id": "63269e0f85bfd011e989d0f7",
"name": "Aravind Krishna",
"mobile": 7309454620,
"email": "akaravindk59#gmail.com",
"password": "$2b$12$0dOE/0wj6uX604h3DZpGxuO/L.fZg7KCm7mOGsNMkarSaeG2C/Wvq",
"orders": [
{
"paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
"cart": [array],
"amount": 3007,
"created": 1663475717,
"_id": "6326a01344f26617fc1a65d6"
},
{
"paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
"cart": [array],
"amount": 389,
"created": 1663475816,
"_id": "6326a07744f26617fc1a65d8"
}
],
"__v": 0
}
I want to get only the orders array, sorted by the created property, in both ascending and descending order. As you can see, the orders field inside this document is an array of objects. I tried the sortArray method but it gives an error. Please help.
To illustrate, the output should look something like this:
{
"_id": "63269e0f85bfd011e989d0f7",
"orders": [
{
"paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
"cart": [array],
"amount": 3007,
"created": 1663475717,
"_id": "6326a01344f26617fc1a65d6"
},
{
"paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
"cart": [array],
"amount": 389,
"created": 1663475816,
"_id": "6326a07744f26617fc1a65d8"
}
]
}
Here I get only the orders field, but I want it sorted by the created value, both ascending and descending.
As you mentioned, the best way to achieve this is $sortArray. I'm assuming the error you're getting comes from a version mismatch, as this operator was only added in version 5.2.
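For a server on 5.2 or later, a minimal sketch of the $sortArray approach might look like this. The helper name sortedOrders is made up for illustration; the field names come from the question.

```javascript
// Builds a one-stage pipeline that returns only the orders array,
// sorted by "created". direction: 1 = ascending, -1 = descending.
// Requires MongoDB 5.2+ because of $sortArray.
function sortedOrders(direction) {
  return [
    {
      $project: {
        orders: {
          $sortArray: { input: "$orders", sortBy: { created: direction } }
        }
      }
    }
  ];
}

const ascending = sortedOrders(1);
const descending = sortedOrders(-1);

// In mongosh: db.collection.aggregate(ascending)
```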
The other way to achieve the same result is not as pleasant: you need to $unwind the array, $sort the results, and then $group to restore the structure. After that it's easy to add the "other" order with $reverseArray, although I recommend that, instead of duplicating your data, you handle the "reverse" requirement in code. Overall the pipeline looks like this:
db.collection.aggregate([
{
$unwind: "$orders"
},
{
$sort: {
"orders.created": 1
}
},
{
$group: {
_id: "$_id",
asc_orders: {
$push: "$orders"
}
}
},
{
$addFields: {
desc_orders: {
"$reverseArray": "$asc_orders"
}
}
}
])
Mongo Playground

MongoDB - Pipeline $lookup with $group losing fields

I only have 2 years of experience with SQL databases and none with NoSQL databases. I am trying to write a pipeline with the MongoDB Compass aggregation pipeline tool that performs a lookup, group, sum, and sort. Also, please share any resources that make learning this easier; I've not had much luck finding good, easy-to-understand examples online of using Compass for these tasks. Thank you.
An example question I am trying to solve is:
What customer placed the highest number of orders?
Example Data is:
Customer Collection:
[
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f53" },
"Id": "1",
"FirstName": "Maria",
"LastName": "Anders",
"City": "Berlin",
"Country": "Germany",
"Phone": "030-0074321"},
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f54" },
"Id": "2",
"FirstName": "Ana",
"LastName": "Trujillo",
"City": "México D.F.",
"Country": "Mexico",
"Phone": "(5) 555-4729" }
]
Order Collection:
[
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b501f" },
"Id": "1",
"OrderDate": "2012-07-04 00:00:00.000",
"OrderNumber": "542378",
"CustomerId": "85",
"TotalAmount": "440.00" },
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b5020" },
"Id": "2",
"OrderDate": "2012-07-05 00:00:00.000",
"OrderNumber": "542379",
"CustomerId": "79",
"TotalAmount": "1863.40" }
]
I have spent all day looking at YouTube videos and the MongoDB documentation, but I am failing to comprehend a few things. For one, when I apply a $group stage I lose all the fields not associated with the group, and I would like to keep a few of them. I would like the result to include the name of the customer with the highest number of orders.
The pipeline I was using that gets me part of the way is the following:
[{
$lookup: {
from: 'Customer',
localField: 'CustomerId',
foreignField: 'Id',
as: 'CustomerInfo'
}}, {
$project: {
CustomerId: 1,
CustomerInfo: 1
}}, {
$group: {
_id: '$CustomerInfo.Id',
CustomerOrderNumber: {
$sum: 1
}
}}, {
$sort: {
CustomerOrderNumber: -1
}}]
Example data this returns in order:
Apologies for the bad formatting, still trying to get the hang of posting questions that are easy to understand and useful.
The $group stage only returns documents with the _id and CustomerOrderNumber fields, which is why the CustomerInfo field went missing. The stages used below:
$lookup - Join each order with its matching Customer document(s), stored in the CustomerInfo array.
$project - The 1st stage returns CustomerInfo as an array, so take its first element to get a document field instead of an array field.
$group - Group by CustomerInfo.Id, count the documents as CustomerOrderNumber, and take the first document as CustomerInfo.
$project - Decorate the output documents.
$setWindowFields - Use $denseRank to rank the documents by CustomerOrderNumber (descending). Documents with the same CustomerOrderNumber share the same rank/position.
$match - Select the documents whose denseRankHighestOrder is 1 (highest).
db.Order.aggregate([
{
$lookup: {
from: "Customer",
localField: "CustomerId",
foreignField: "Id",
as: "CustomerInfo"
}
},
{
$project: {
CustomerId: 1,
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$group: {
_id: "$CustomerInfo.Id",
CustomerOrderNumber: {
$sum: 1
},
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$project: {
_id: 0,
CustomerId: "$_id",
CustomerOrderNumber: 1,
CustomerName: {
$concat: [
"$CustomerInfo.FirstName",
" ",
"$CustomerInfo.LastName"
]
}
}
},
{
$setWindowFields: {
sortBy: {
CustomerOrderNumber: -1
},
output: {
denseRankHighestOrder: {
$denseRank: {}
}
}
}
},
{
$match: {
denseRankHighestOrder: 1
}
}
])
Sample Mongo Playground
Note:
A $sort stage can order the documents by CustomerOrderNumber. But if you then limit the result, as in "SELECT TOP n", the output may be incomplete when several documents share the same CustomerOrderNumber/rank.
Example: SELECT TOP 1 returns a single customer with the highest CustomerOrderNumber even when 3 customers are tied for it.
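The difference can be replayed in plain JavaScript, outside MongoDB. This is only an illustration of the ranking logic, with made-up customer ids and counts; withDenseRank mimics what $denseRank computes and is not a MongoDB API.

```javascript
// Hypothetical $group output: three customers tie for the highest count.
const grouped = [
  { CustomerId: "A", CustomerOrderNumber: 7 },
  { CustomerId: "B", CustomerOrderNumber: 3 },
  { CustomerId: "C", CustomerOrderNumber: 7 },
  { CustomerId: "D", CustomerOrderNumber: 7 }
];

const byCountDesc = (a, b) => b.CustomerOrderNumber - a.CustomerOrderNumber;

// Equivalent of $sort followed by $limit: 1.
const topOne = [...grouped].sort(byCountDesc).slice(0, 1);

// Equivalent of $denseRank: equal counts share a rank, ranks have no gaps.
function withDenseRank(docs) {
  let rank = 0;
  let previous;
  return [...docs].sort(byCountDesc).map((doc) => {
    if (doc.CustomerOrderNumber !== previous) {
      rank += 1;
      previous = doc.CustomerOrderNumber;
    }
    return { ...doc, denseRankHighestOrder: rank };
  });
}

const rankOne = withDenseRank(grouped).filter((d) => d.denseRankHighestOrder === 1);

console.log(topOne.length);  // 1
console.log(rankOne.length); // 3
```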

Multikey partial index not used with elemMatch

Consider the following document format which has an array field tasks holding embedded documents
{
"foo": "bar",
"tasks": [
{
"status": "sleep",
"id": "1"
},
{
"status": "active",
"id": "2"
}
]
}
There exists a partial index on key tasks.id
{
"v": 2,
"unique": true,
"key": {
"tasks.id": 1
},
"name": "tasks.id_1",
"partialFilterExpression": {
"tasks.id": {
"$exists": true
}
},
"ns": "zardb.quxcollection"
}
The following $elemMatch query with multiple conditions on the same array element
db.quxcollection.find(
{
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
does not seem to use the index
"winningPlan": {
"stage": "COLLSCAN",
"filter": {
"tasks": {
"$elemMatch": {
"$and": [{
"id": {
"$eq": "1"
}
},
{
"status": {
"$not": {
"$eq": "active"
}
}
}
]
}
}
},
"direction": "forward"
}
How can I make the above query use the index? The index does seem to be used via dot notation
db.quxcollection.find({"tasks.id": "1"})
however I need the same array element to match multiple conditions which includes the status field, and the following does not seem to be equivalent to the above $elemMatch based query
db.quxcollection.find({
"tasks.id": "1",
"tasks.status": { "$nin": ["active"] }
})
Partial indexes are matched by field path. With $elemMatch alone, the path tasks.id does not appear explicitly in the query, so the planner does not recognise the query as covered by the partialFilterExpression. If you check with .explain("allPlansExecution"), the index is not even considered by the query planner.
To benefit from the index you can specify the path in the query:
db.quxcollection.find(
{
"tasks.id": "1",
"tasks": {
"$elemMatch": {
"id": {
"$eq": "1"
},
"status": {
"$nin": ["active"]
}
}
}
}).explain()
This duplicates part of the $elemMatch condition: the index is used to find all documents containing a task with the given id, and the $elemMatch is then applied as a filter at the fetch stage, discarding documents where that task is "active". I must admit the query doesn't look nice, so maybe add a comment in the code explaining it.
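To see why the plain dot-notation query from the question is not a substitute for $elemMatch, the two matching rules can be replayed in plain JavaScript against the sample document. The function names are illustrative only; they model the query semantics and are not part of any MongoDB API.

```javascript
// Sample document from the question.
const doc = {
  foo: "bar",
  tasks: [
    { status: "sleep", id: "1" },
    { status: "active", id: "2" }
  ]
};

// $elemMatch: one single array element must satisfy both conditions.
const matchesElemMatch = (d) =>
  d.tasks.some((t) => t.id === "1" && t.status !== "active");

// Dot notation: each condition is evaluated against the array as a whole.
// "tasks.id": "1" needs some element with id "1";
// "tasks.status": { $nin: ["active"] } needs no element with status "active".
const matchesDotted = (d) =>
  d.tasks.some((t) => t.id === "1") &&
  d.tasks.every((t) => t.status !== "active");

console.log(matchesElemMatch(doc)); // true
console.log(matchesDotted(doc));    // false
```

The combined query in the answer keeps the $elemMatch semantics and only adds "tasks.id": "1", which every $elemMatch match satisfies anyway.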

MongoDB - How to get all documents not being referenced by any document in a different collection

We have two collections, Teams and Matches. Every time a match is reported, a new document is saved in the Matches collection and it is added to an array in the Team document (teams[i].matches).
A now-fixed bug in our system caused new Matches documents not to be referenced in their respective Teams documents.
Is there a query for MongoDB 3.6.9 that can help us find the Matches not referenced in Teams?
An aggregation pipeline using $lookup may help you.
$lookup fetches the documents from "Teams" that pass the inner pipeline's $match.
let: { match_id: "$_id" } creates a variable match_id holding the Match's _id.
The $match expression keeps only the Teams whose matches array contains match_id.
as: "matches" stores the Teams that passed the previous $match.
The last $match, after the $lookup step, keeps the documents whose matches array is empty (Matches with no Teams).
db.Matches.aggregate([
{
$lookup: {
from: "Teams",
let: { match_id: "$_id" },
pipeline: [{
$match: {
$expr: {
$in: [ "$$match_id", "$matches" ]
}
}
}],
as: "matches"
},
},
{
$match: {
$expr: { $eq: [{ $size: "$matches" }, 0] }
}
}
]);
This has been tested with the following collection template and Mongo playground online editor :
db={
"Matches": [
{ "_id": 0 },
{ "_id": 1 },
{ "_id": 2 },
{ "_id": 3 },
{ "_id": 4 },
],
"Teams": [
{
"_id": 0,
matches: [ 0, 3 ],
},
{
"_id": 1,
matches: [],
},
{
"_id": 2,
matches: [ 0 ],
},
{
"_id": 3,
matches: [ 2 ],
}
]
}
The resulting output is :
[
{
"_id": 1,
"matches": []
},
{
"_id": 4,
"matches": []
}
]
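As a sanity check, the same anti-join can be replayed in plain JavaScript on the sample data. This is only a sketch of the logic the pipeline implements, not a replacement for running it in MongoDB.

```javascript
// Sample data from the playground template above.
const matches = [0, 1, 2, 3, 4].map((_id) => ({ _id }));
const teams = [
  { _id: 0, matches: [0, 3] },
  { _id: 1, matches: [] },
  { _id: 2, matches: [0] },
  { _id: 3, matches: [2] }
];

// Every match id referenced by at least one team.
const referenced = new Set(teams.flatMap((team) => team.matches));

// Matches that no team references (the empty-lookup case).
const unreferenced = matches.filter((match) => !referenced.has(match._id));

console.log(unreferenced.map((match) => match._id)); // [ 1, 4 ]
```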

Comparing an element's field in array with a field in MongoDB

I have a collection Group like this:
{
"_id" : ObjectId("5822dd5cb6a69ca404e0d93c"),
"name" : "GROUP 1",
"member": [
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93d"),
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93f"),
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("573d5ff8d1b7b3b32e165599"),
"name" : "GROUP 2",
"member": [
{
"_id": ObjectId("574802e031e70b503eabe195"),
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5775f1a74b41037e246a51d1"),
},
{
"_id": ObjectId("574802e031e70b503eabe198"),
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("576cfa042c0a4054794dd242"),
}
],
curTask: {
"_id": ObjectId("577249a2f9dba0c750ef705b"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("574802e031e70b503eabe194"),
"name" : "GROUP 3",
"member": [
{
"_id": ObjectId("574be0a2bf16234f5a752f83"),
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("574d397d6e9f07d64d1e4e40"),
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
I want to find all groups where the user with ObjectId 573ac820eb3ed3ea156905f6 (the 1st user in group 1) is not doing the same task as curTask. So far I've written this query:
db.getCollection('groups').find({"member":{ "$elemMatch": {"user": ObjectId("573ac820eb3ed3ea156905f6")
, "task": { "$ne":"this.curTask._id"}}}})
But this doesn't seem to work: it still returns the group where user 573ac820eb3ed3ea156905f6 has task === curTask._id. The first half of the $elemMatch seems to work fine (it only finds groups with user 573ac820eb3ed3ea156905f6 in member; the query returns only groups 1 and 2, since group 3 doesn't have that user), but I can't seem to make MongoDB compare a field of an array element with another field of the document. Does anyone have an idea how to make this comparison?
There are two solutions to the problem:
First - Using $where. With $where you can run JavaScript code inside MongoDB queries. This makes the query flexible, but the drawback is that it runs slowly, since JavaScript has to be executed instead of MongoDB's more optimized native C++ code.
db.getCollection('groups').find({
$where: function () {
var flag = 0;
for(var i=0; i<obj.member.length;i++) {
if(obj.member[i].user.str == ObjectId("573ac820eb3ed3ea156905f6").str && obj.member[i].task.str != obj.curTask._id.str ){flag = 1; break;}
}
return flag;
}
})
Second - Using an aggregation pipeline. Here I unwind the array, do the matches as described, and finally rebuild the array as it was. If the non-matching elements of the member array are not needed, you can omit the final grouping part.
[
{$match: {'member.user': ObjectId("573ac820eb3ed3ea156905f6")}},
{$unwind: '$member'},
{$project: {
name: 1,
member: 1,
curTask: 1,
ne: {$and: [{$ne: ['$member.task', '$curTask._id']}, {$eq: ['$member.user', ObjectId("573ac820eb3ed3ea156905f6")]}]}
}},
{$group: {
_id: '$_id',
member: {$push: '$member'},
curTask: {$first: '$curTask'},
name: {$first: '$name'},
check: {$sum: {$cond: ['$ne', 1, 0]}}
}},
{$match: {check: {$gt: 0}}}
]
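On MongoDB 3.6 or later there is a third option that neither answer above uses: $expr lets a find filter compare an array element's field with another field of the same document. The following is a sketch under that version assumption; the helper name groupsWhereTaskDiffers is made up for illustration.

```javascript
// Builds a find() filter for groups in which the given user has a task
// different from curTask._id. Assumes MongoDB 3.6+ ($expr in find).
function groupsWhereTaskDiffers(userId) {
  return {
    // Plain condition first, mirroring the initial $match of the pipeline above.
    "member.user": userId,
    $expr: {
      $gt: [
        {
          $size: {
            // Members that are this user AND whose task differs from curTask._id.
            $filter: {
              input: "$member",
              as: "m",
              cond: {
                $and: [
                  { $eq: ["$$m.user", userId] },
                  { $ne: ["$$m.task", "$curTask._id"] }
                ]
              }
            }
          }
        },
        0
      ]
    }
  };
}

// In mongosh:
// db.getCollection('groups').find(
//   groupsWhereTaskDiffers(ObjectId("573ac820eb3ed3ea156905f6")))
```

Unlike the $unwind/$group pipeline, this returns the matching documents unchanged, so there is nothing to rebuild afterwards.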

Resources