MongoDB - Pipeline $lookup with $group losing fields

I only have 2 years of experience with SQL databases and none with NoSQL databases. I am trying to write a pipeline in the MongoDB Compass aggregation pipeline builder that performs a lookup, group, sum, and sort. Also, please share any resources that make learning this easier; I haven't had much luck finding good, easy-to-understand examples online that use Compass for these tasks. Thank you.
An example question I am trying to solve is:
What customer placed the highest number of orders?
Example data:
Customer Collection:
[
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f53" },
"Id": "1",
"FirstName": "Maria",
"LastName": "Anders",
"City": "Berlin",
"Country": "Germany",
"Phone": "030-0074321"},
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f54" },
"Id": "2",
"FirstName": "Ana",
"LastName": "Trujillo",
"City": "México D.F.",
"Country": "Mexico",
"Phone": "(5) 555-4729" }
]
Order Collection:
[
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b501f" },
"Id": "1",
"OrderDate": "2012-07-04 00:00:00.000",
"OrderNumber": "542378",
"CustomerId": "85",
"TotalAmount": "440.00" },
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b5020" },
"Id": "2",
"OrderDate": "2012-07-05 00:00:00.000",
"OrderNumber": "542379",
"CustomerId": "79",
"TotalAmount": "1863.40" }
]
I have spent all day watching YouTube videos and reading the MongoDB documentation, but I am failing to understand a few things. First, when I run a $group stage, I lose all the fields that are not part of the group, and I would like to keep a few of them. I would like the result to include the name of the customer with the most orders.
The pipeline I was using that gets me part of the way is the following:
[{
$lookup: {
from: 'Customer',
localField: 'CustomerId',
foreignField: 'Id',
as: 'CustomerInfo'
}}, {
$project: {
CustomerId: 1,
CustomerInfo: 1
}}, {
$group: {
_id: '$CustomerInfo.Id',
CustomerOrderNumber: {
$sum: 1
}
}}, {
$sort: {
CustomerOrderNumber: -1
}}]
Example data this returns in order:
Apologies for the bad formatting; I'm still getting the hang of posting questions that are easy to understand and useful.

The $group stage only outputs _id and the accumulator fields you define (here CustomerOrderNumber), so the CustomerInfo field is dropped. To keep it, add it as an accumulator as well, for example with $first. The pipeline below works as follows:
$lookup - Join each order with its customer.
$project - $lookup returns CustomerInfo as an array, so take its first element with $first to get an embedded document instead of an array (the $first array operator requires MongoDB 4.4 or later).
$group - Group by the customer's Id, count the orders with $sum as CustomerOrderNumber, and keep the first CustomerInfo document with $first.
$project - Shape the output documents (CustomerId, CustomerOrderNumber, and a CustomerName built with $concat).
$setWindowFields - Use $denseRank to rank the documents by CustomerOrderNumber (descending). Documents with the same CustomerOrderNumber get the same rank ($setWindowFields requires MongoDB 5.0 or later).
$match - Keep only the documents whose denseRankHighestOrder is 1 (the highest).
db.Order.aggregate([
{
$lookup: {
from: "Customer",
localField: "CustomerId",
foreignField: "Id",
as: "CustomerInfo"
}
},
{
$project: {
CustomerId: 1,
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$group: {
_id: "$CustomerInfo.Id",
CustomerOrderNumber: {
$sum: 1
},
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$project: {
_id: 0,
CustomerId: "$_id",
CustomerOrderNumber: 1,
CustomerName: {
$concat: [
"$CustomerInfo.FirstName",
" ",
"$CustomerInfo.LastName"
]
}
}
},
{
$setWindowFields: {
sortBy: {
CustomerOrderNumber: -1
},
output: {
denseRankHighestOrder: {
$denseRank: {}
}
}
}
},
{
$match: {
denseRankHighestOrder: 1
}
}
])
Sample Mongo Playground
Note:
A $sort stage can order the documents by CustomerOrderNumber, but if you then limit the output (like SELECT TOP n in SQL), the result may be wrong when several documents share the same CustomerOrderNumber.
Example: SELECT TOP 1 returns only one customer even when three customers are tied for the highest CustomerOrderNumber.
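To see why $first rescues the dropped fields and why dense rank handles ties, here is a minimal plain-JavaScript model of the same steps. It is a sketch for illustration, not MongoDB code: the helper names (lookupFirst, groupByCustomer, topCustomers) are hypothetical, and the input arrays are assumed to look like the question's Customer and Order documents.

```javascript
// Plain-JS model of the answer's pipeline (illustration only, not MongoDB).
// Hypothetical helpers; field names follow the question's sample data.

// $lookup + $project/$first: attach the first matching customer (or undefined).
function lookupFirst(orders, customers) {
  return orders.map((o) => ({
    CustomerId: o.CustomerId,
    CustomerInfo: customers.find((c) => c.Id === o.CustomerId),
  }));
}

// $group with $sum: 1 and $first: count orders per customer and keep the
// first CustomerInfo seen, so the name survives the grouping.
function groupByCustomer(rows) {
  const groups = new Map();
  for (const r of rows) {
    const key = r.CustomerInfo ? r.CustomerInfo.Id : null; // no match -> null group
    if (!groups.has(key)) {
      groups.set(key, { CustomerId: key, CustomerOrderNumber: 0, CustomerInfo: r.CustomerInfo });
    }
    groups.get(key).CustomerOrderNumber += 1;
  }
  return [...groups.values()];
}

// $setWindowFields/$denseRank + $match { rank: 1 }: keep EVERY customer tied
// for the highest order count, unlike sort + limit(1).
function topCustomers(orders, customers) {
  const grouped = groupByCustomer(lookupFirst(orders, customers));
  if (grouped.length === 0) return [];
  const max = Math.max(...grouped.map((g) => g.CustomerOrderNumber));
  return grouped
    .filter((g) => g.CustomerOrderNumber === max)
    .map((g) => ({
      CustomerId: g.CustomerId,
      CustomerOrderNumber: g.CustomerOrderNumber,
      // like $concat, yields null when there is no customer document
      CustomerName: g.CustomerInfo ? `${g.CustomerInfo.FirstName} ${g.CustomerInfo.LastName}` : null,
    }));
}
```

With two customers who each placed one order, topCustomers returns both of them, which is exactly the case where a sort followed by a limit of 1 would silently drop one.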

Related

How to access nested array of objects in mongodb aggregation pipeline?

I have documents like this (the result after a few pipeline stages):
[
{
"_id": ObjectId("5e9d5785e4c8343bb2b455cc"),
"name": "Jenny Adams",
"report": [
{ "category":"Beauty", "status":"submitted", "submitted_on": [{"_id": "xyz", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "abc", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Kitchen", "status":"submitted", "submitted_on": [{"_id": "mnp", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
},
{
"_id": ObjectId("5e9d5785e4c8343bb2b455db"),
"name": "Mathew Smith",
"report": [
{ "category":"Household", "status":"submitted", "submitted_on": [{"_id": "123", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "345", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Garden", "status":"submitted", "submitted_on": [{"_id": "567", "timestamp":"2022-05-08T06:10:06.432+00:00"}] },
{ "category":"BakingNeeds", "status":"submitted", "submitted_on": [{"_id": "891", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
}
]
I have user input for a time period:
from - 2021-02-23T06:10:05.832+00:00
to - 2022-02-23T06:10:05.832+00:00
Now I want to filter the objects in report by time: keep an object only if the timestamp of the last element of its submitted_on array (submitted_on[-1]["timestamp"]) lies between the from and to timestamps.
I am struggling to access that timestamp because of the nesting.
I tried this:
$project: {
"name": 1,
"report": {
"category": 1,
"status": 1,
"submitted_on": 1,
"timestamp": {
$arrayElemAt: ["$report.submitted_on", -1]
}
}
}
But this puts the last element of the whole report array, {"_id": "bcd", "timestamp":"2022-05-08T06:10:06.432+00:00"}, on every item inside report. How can I select the last timestamp of each object instead?
You can replace your stage in the aggregation pipeline with two stages, $unwind and $addFields, to get what I think you want:
{
$unwind: "$report"
},
{
"$addFields": {
"timestamp": {
$arrayElemAt: [
"$report.submitted_on",
-1
]
}
}
},
The $unwind stage splits the outer report array into one document per element, since you want to act on each element separately. See the playground here with your example. If you plan to continue the pipeline with more stages, you can probably skip the $addFields stage and put the condition directly into your next $match stage.
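The filtering rule itself ("keep a report entry only if its last submitted_on timestamp is inside the range") can be sketched in plain JavaScript. This is an illustration, not a pipeline stage; filterReportsByLastSubmission is a hypothetical helper, and the inclusive bounds are an assumption. Inside MongoDB, the same rule could likely be written without $unwind as a $filter over report whose condition uses $arrayElemAt on "$$this.submitted_on", but that variant is untested here.

```javascript
// Plain-JS model of the requested filter (illustration only, not MongoDB).
// Keeps report entries whose LAST submitted_on timestamp lies in [from, to].
function filterReportsByLastSubmission(docs, from, to) {
  const lo = Date.parse(from); // Date.parse accepts the "+00:00" offset form
  const hi = Date.parse(to);
  return docs.map((doc) => ({
    ...doc,
    report: doc.report.filter((r) => {
      // same idea as $arrayElemAt: ["$report.submitted_on", -1] after $unwind
      const last = r.submitted_on[r.submitted_on.length - 1];
      if (last === undefined) return false; // empty submitted_on: nothing to compare
      const t = Date.parse(last.timestamp);
      return t >= lo && t <= hi; // assumption: inclusive bounds
    }),
  }));
}
```

The sketch parses the timestamps instead of comparing the raw strings. Comparing strings only works reliably if every stored timestamp uses exactly the same format and offset, and storing real dates in MongoDB avoids that issue.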

Get frequency for multiple elements in all documents inside a collection mongodb

So here's my problem.
I am new to MongoDB and have a collection whose documents are saved like this:
{
"_id": {
"$oid": "60626db173b4ca321c02ee3e"
},
"year": "2021",
"name": "Book 1",
"authors": ["Joe, B", "Jessica, K"],
"createdAt": {
"$date": "2021-03-30T00:15:45.859Z"
}
},
{
"_id": {
"$oid": "60626db173b4ca321c02ee4e"
},
"year": "2021",
"authors": ["Carl, B", "Jessica, K"],
"name": "Book 2",
"createdAt": {
"$date": "2021-03-30T00:15:45.859Z"
}
},
I need to get both the frequency of all authors and the years of the books.
The expected result would be something like this (as long as I can get each element's frequency, it doesn't really matter how the results are returned):
{
"authors": {
"Joe, B": 1,
"Carl, B": 1,
"Jessica, K": 2
},
"year": {
"2021": 2
}
}
I've seen the thread How to count occurence of each value in array?, which does the job for one array, but I have no idea whether it's possible to adapt it to get the frequency of multiple fields (year, authors) at the same time, or how to do it.
I appreciate any help. Thank you.
Demo - https://mongoplayground.net/p/95JtQEThxvV
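$group by year, $push the authors arrays into one array, and $sum the count of each year; then $unwind twice (the pushed value is an array of arrays) to get one document per author.
$group by author, $sum the count of each author, and carry year and yearCount along with $first.
$group by null to combine all documents, use $addToSet to collect unique k/v pairs, and convert them with $arrayToObject in the final $project.
Caveat: because year is taken with $first after grouping by author, a year whose authors all also appear under another year can be missing from years. Counting years and authors in two independent $group branches (for example inside $facet) avoids this.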
db.collection.aggregate([
{
$group: {
_id: { year: "$year" },
authors: { $push: "$authors" },
yearCount: { $sum: 1 }
}
},
{ $unwind: "$authors" },
{ $unwind: "$authors"},
{
$group: {
_id: { author: "$authors" },
year: { $first: "$_id.year" },
yearCount: { $first: "$yearCount" },
authors: { $push: "$authors" },
authorCount: { $sum: 1 }
}
},
{
"$group": {
_id: null,
years: {
$addToSet: { k: "$year", v: "$yearCount" }
},
authors: {
$addToSet: { k: "$_id.author", v: "$authorCount" }
}
}
},
{
$project: {
_id: 0,
years: { $arrayToObject: "$years" },
authors: { $arrayToObject: "$authors" }
}
}
])
Demo 2 - For author count grouped by year - https://mongoplayground.net/p/_elnjmknroF
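For reference, here is a minimal plain-JavaScript sketch of the expected output: years and authors counted independently, the way two separate $group branches (for example under $facet, available since MongoDB 3.4) would count them. It is an illustration rather than MongoDB code, and countValues and frequencies are hypothetical helper names. It can be handy for checking a pipeline's output on small test data.

```javascript
// Plain-JS reference for the expected output (illustration only, not MongoDB).
// Count how often each value occurs; Map avoids prototype-key surprises.
function countValues(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) || 0) + 1);
  return Object.fromEntries(counts); // { value: count, ... }
}

// Years and authors are counted independently, so no year can be lost
// the way it can when year is carried through an author-level $first.
function frequencies(books) {
  return {
    authors: countValues(books.flatMap((b) => b.authors || [])),
    year: countValues(books.map((b) => b.year)),
  };
}
```

For the two sample books this gives { authors: { "Joe, B": 1, "Jessica, K": 2, "Carl, B": 1 }, year: { "2021": 2 } }. A single author spread across two years still produces both year entries.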

MongoDB - How to get all documents not being referenced by any document in a different collection

We have two collections, Teams and Matches. Every time a match is reported, a new document is saved in Matches and its _id is added to the matches array of the Team documents (teams[i].matches).
A now-fixed bug in our system caused new Matches documents not to be referenced in their respective Teams documents.
Is there a query for MongoDB 3.6.9 that can help us find the Matches not referenced in Teams?
An aggregation pipeline using $lookup can help here (the let/pipeline form of $lookup is available from MongoDB 3.6):
$lookup fetches the documents from "Teams" that pass the inner pipeline's $match.
let: { match_id: "$_id" } creates a variable match_id holding the Match's _id.
The $match expression keeps only the Teams whose matches array contains match_id.
as: "matches" stores the Teams that pass that $match.
The final $match after the $lookup keeps only documents whose matches array is empty (Matches referenced by no Team).
db.Matches.aggregate([
{
$lookup: {
from: "Teams",
let: { match_id: "$_id" },
pipeline: [{
$match: {
$expr: {
$in: [ "$$match_id", "$matches" ]
}
}
}],
as: "matches"
},
},
{
$match: {
$expr: { $eq: [{ $size: "$matches" }, 0] }
}
}
]);
This has been tested with the following collection template in the Mongo Playground online editor:
db={
"Matches": [
{ "_id": 0 },
{ "_id": 1 },
{ "_id": 2 },
{ "_id": 3 },
{ "_id": 4 },
],
"Teams": [
{
"_id": 0,
matches: [ 0, 3 ],
},
{
"_id": 1,
matches: [],
},
{
"_id": 2,
matches: [ 0 ],
},
{
"_id": 3,
matches: [ 2 ],
}
]
}
The resulting output is:
[
{
"_id": 1,
"matches": []
},
{
"_id": 4,
"matches": []
}
]
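The $lookup pipeline above is an anti-join: it keeps the Matches whose _id appears in no Team's matches array. A minimal plain-JavaScript model of that logic is shown below, for example to check the result on a small exported sample. It is an illustration, not MongoDB code, and unreferencedMatches is a hypothetical helper. It assumes numeric or string ids like the playground data. With real ObjectId values you would compare their string forms (for example via toHexString()), because a Set compares objects by reference.

```javascript
// Plain-JS model of the anti-join (illustration only, not MongoDB).
// Returns the Matches whose _id is referenced by no Team.
function unreferencedMatches(matches, teams) {
  const referenced = new Set();
  for (const team of teams) {
    // a Team without a matches array references nothing
    for (const id of team.matches || []) referenced.add(id);
  }
  return matches.filter((m) => !referenced.has(m._id));
}
```

With the playground data above, this returns the Matches with _id 1 and 4, the same two documents as the pipeline output.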

How to join and group user list results with values from array in another collection? [duplicate]

This question already has answers here:
How do I perform the SQL Join equivalent in MongoDB?
$lookup on ObjectId's in an array
I am new to MongoDB and am trying to retrieve a list of users that includes their user roles, which are stored in a separate collection from the users collection.
I have a users collection, sample document below:
/* 1 */
{
"_id": "cf67e695-ea52-47a8-8e42-b95b863a2b69",
"DateCreated": ISODate("2018-11-11T21:41:37.125Z"),
"Email": "user#email.com",
"FirstName": "John",
"LastName": "Does",
"Language": "en",
"TimeZoneId": null,
"Roles": [
"c2ee344f-48b7-4c46-9392-853c6debd631",
"ada94631-af8c-43e9-a031-de62ffae1d20"
],
"Status": 0
}
I also have a Roles collection, sample document below:
{
"_id": "c2ee344f-48b7-4c46-9392-853c6debd631",
"DateCreated": ISODate("2018-11-14T10:58:27.053Z"),
"Name": "Administrator 2",
"Description": " View only but able to manage their own users (credentials). They do have update capabilities"
}
What I want is to retrieve a list that shows the user friendly name of the role and not the _id value of each role that is currently stored against the User Profile. I'm close but not quite there.
Using the following, I am able to get the results of the Roles list but ONLY the role Names. What I want is the full User profiles with the user friendly names attached with each document. Here is my syntax so far:
db.Users.aggregate([
{
$match: {
"Email": /#email.com/
}
},
{
$project: {
_id: 1,
Email: 1,
FirstName: 1,
Roles: 1
}
},
{
$unwind: {
path: "$Roles"
}
},
{
$lookup: {
from: "Roles",
localField: "Roles",
foreignField: "_id",
as: "Roles"
}
},
{
$group: {
_id: "$_id",
Roles: {
$push: "$Roles.Name"
}
}
}
])
For whatever reason I am not getting the Email, FirstName and LastName fields returned in the results. All I get currently is:
{
"_id": "639ba2b6-fc80-44f4-8ac0-0a92d61099c4",
"Roles": [
[
"Administrator 2"
],
[
"Administrators"
]
]
}
I would like to get something like:
{
"_id": "cf67e695-ea52-47a8-8e42-b95b863a2b69",
"DateCreated": ISODate("2018-11-11T21:41:37.125Z"),
"Email": "user#email.com",
"FirstName": "John",
"LastName": "Does",
"Language": "en",
"TimeZoneId": null,
"Roles": [
"Administrators",
"Administrator 2"
],
"Status": 0
}
Any help, much appreciated!
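You have to include those fields in your $group stage. Update your $group as follows:
$group: {
_id: "$_id",
FirstName: { $first: "$FirstName" },
LastName: { $first: "$LastName" },
Email: { $first: "$Email" },
// $lookup produced an array, so take its first Name to get a flat list
Roles: {
$push: { $arrayElemAt: ["$Roles.Name", 0] }
}
}
You can add any other fields you need with $first in $group; each one must also be kept by the earlier $project stage (LastName, for example, is not included there yet). I have tried this with the given sample document and it works for me.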
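Alternatively, unwind the role ids, look each one up, and group back into one document per user:
db.Users.aggregate([
// one document per role id
{
$unwind: "$Roles"
},
{
$lookup:
{
from: "Roles", // collection names are case-sensitive
localField: "Roles", // the role id on the unwound user document
foreignField: "_id", // matched against Roles._id
as: "Role"
}
},
// rebuild one document per user with the role names
{
$group: {
_id: "$_id",
Email: { $first: "$Email" },
FirstName: { $first: "$FirstName" },
LastName: { $first: "$LastName" },
Roles: { $push: { $arrayElemAt: ["$Role.Name", 0] } }
}
}
])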
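The result both answers aim for, the full user document with each role id replaced by its role name, can be modeled in a few lines of plain JavaScript. This is an illustration, not MongoDB code; attachRoleNames is a hypothetical helper, and it assumes role ids with no matching role are simply dropped. On the server, a $lookup whose localField is the Roles array should match each element directly (MongoDB 3.4+), which avoids the $unwind/$group round trip. The order of the joined role documents may not follow the original Roles order, though.

```javascript
// Plain-JS model of the desired join (illustration only, not MongoDB).
// Replaces each role id in user.Roles with the role's Name, keeping all
// other user fields; ids without a matching role are dropped (assumption).
function attachRoleNames(users, roles) {
  const nameById = new Map(roles.map((r) => [r._id, r.Name]));
  return users.map((user) => ({
    ...user,
    Roles: (user.Roles || [])
      .filter((id) => nameById.has(id))
      .map((id) => nameById.get(id)), // keeps the user's original role order
  }));
}
```

Because the rest of the user document is spread through unchanged, Email, FirstName, LastName and the other fields survive. That is the same effect as listing them with $first in the $group stage.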

Comparing an element's field in array with a field in MongoDB

I have a collection Group like this:
{
"_id" : ObjectId("5822dd5cb6a69ca404e0d93c"),
"name" : "GROUP 1",
"member": [
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93d"),
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93f"),
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("573d5ff8d1b7b3b32e165599"),
"name" : "GROUP 2",
"member": [
{
"_id": ObjectId("574802e031e70b503eabe195"),
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5775f1a74b41037e246a51d1"),
},
{
"_id": ObjectId("574802e031e70b503eabe198"),
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("576cfa042c0a4054794dd242"),
}
],
curTask: {
"_id": ObjectId("577249a2f9dba0c750ef705b"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("574802e031e70b503eabe194"),
"name" : "GROUP 3",
"member": [
{
"_id": ObjectId("574be0a2bf16234f5a752f83"),
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("574d397d6e9f07d64d1e4e40"),
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
I want to find all groups where the user with ObjectId 573ac820eb3ed3ea156905f6 (the first user in group 1) is not doing the same task as curTask. So far I've written this query:
db.getCollection('groups').find({"member":{ "$elemMatch": {"user": ObjectId("573ac820eb3ed3ea156905f6")
, "task": { "$ne":"this.curTask._id"}}}})
But this doesn't seem to work: it still returns the group where user 573ac820eb3ed3ea156905f6 has task === curTask._id. The first half of $elemMatch seems to work fine (it only finds groups that have user 573ac820eb3ed3ea156905f6 in member, so the query returns only groups 1 and 2, since group 3 doesn't have that user), but I can't get MongoDB to compare a field of an object in the array with another field of the same document. Does anyone have an idea how to make this comparison?
There are two solutions to the problem.
First, using $where. $where lets you run JavaScript inside MongoDB queries. This makes the query flexible, but it is slow, because the JavaScript must be evaluated for every document instead of using MongoDB's native query engine, and it cannot use indexes.
db.getCollection('groups').find({
$where: function () {
var flag = 0;
for(var i=0; i<obj.member.length;i++) {
if(obj.member[i].user.str == ObjectId("573ac820eb3ed3ea156905f6").str && obj.member[i].task.str != obj.curTask._id.str ){flag = 1; break;}
}
return flag;
}
})
Second, using an aggregation pipeline. It unwinds the member array, flags each element that matches the condition, and finally regroups the elements to rebuild the array. If you don't need the non-matching elements of member, you can skip the regrouping and simply $match on the ne flag instead.
[
{$match: {'member.user': ObjectId("573ac820eb3ed3ea156905f6")}},
{$unwind: '$member'},
{$project: {
name: 1,
member: 1,
curTask: 1,
ne: {$and: [{$ne: ['$member.task', '$curTask._id']}, {$eq: ['$member.user', ObjectId("573ac820eb3ed3ea156905f6")]}]}
}},
{$group: {
_id: '$_id',
member: {$push: '$member'},
curTask: {$first: '$curTask'},
name: {$first: '$name'},
check: {$sum: {$cond: ['$ne', 1, 0]}}
}},
{$match: {check: {$gt: 0}}}
]
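The question's query fails because "this.curTask._id" inside $ne is just a literal string, not a field reference, so every task is "not equal" to it. The predicate both answers implement can be stated in plain JavaScript as shown below. This is an illustration, not MongoDB code. userOnOtherTask and findGroups are hypothetical helpers, and ObjectIds are modeled as hex strings for simplicity. On MongoDB 3.6+, the same predicate could probably also be expressed in find() with $expr and $filter over "$member", avoiding $where, but that variant is not tested here.

```javascript
// Plain-JS model of the predicate (illustration only, not MongoDB).
// ObjectIds are modeled as hex strings (assumption).

// True if some member has the given user AND a task different from the
// group's own curTask._id, i.e. a comparison between two fields of the
// same document, which a plain $ne cannot express.
function userOnOtherTask(group, userId) {
  return group.member.some(
    (m) => m.user === userId && m.task !== group.curTask._id
  );
}

function findGroups(groups, userId) {
  return groups.filter((g) => userOnOtherTask(g, userId));
}
```

On the three sample groups, only GROUP 2 qualifies. In GROUP 1 the user's task equals curTask._id, and GROUP 3 does not contain the user at all.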

Resources