Comparing an element's field in array with a field in MongoDB - arrays

I have a collection Group like this:
{
"_id" : ObjectId("5822dd5cb6a69ca404e0d93c"),
"name" : "GROUP 1",
"member": [
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93d")
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("5822dd5cb6a69ca404e0d93f")
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("573d5ff8d1b7b3b32e165599"),
"name" : "GROUP 2",
"member": [
{
"_id": ObjectId("574802e031e70b503eabe195")
"user": ObjectId("573ac820eb3ed3ea156905f6"),
"task": ObjectId("5775f1a74b41037e246a51d1"),
},
{
"_id": ObjectId("574802e031e70b503eabe198")
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("576cfa042c0a4054794dd242"),
}
],
curTask: {
"_id": ObjectId("577249a2f9dba0c750ef705b"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
{
"_id" : ObjectId("574802e031e70b503eabe194"),
"name" : "GROUP 3",
"member": [
{
"_id": ObjectId("574be0a2bf16234f5a752f83")
"user": ObjectId("573ac79beb3ed3ea156905f4"),
"task": ObjectId("5822ddecb6a69ca404e0d942"),
},
{
"_id": ObjectId("574d397d6e9f07d64d1e4e40")
"user": ObjectId("57762fce5ece6a5d04457bf9"),
"task": ObjectId("5822ddecb6a69ca404e0d943"),
}
],
curTask: {
"_id": ObjectId("5822ddecb6a69ca404e0d942"),
"time": ISODate("2016-01-01T01:01:01.000Z")
}
}
And I want to be able to find all group where user with objectId 573ac820eb3ed3ea156905f6 (1st user in group 1) do not do the same task as currentTask. So far I've wrote this query:
db.getCollection('groups').find({"member":{ "$elemMatch": {"user": ObjectId("573ac820eb3ed3ea156905f6")
, "task": { "$ne":"this.curTask._id"}}}})
But this didn't seem to work as it still return the group where user 573ac820eb3ed3ea156905f6 having his task === curTask._id. The first half of elemMatch seem to work fine (only find group with user with objectid 573ac820eb3ed3ea156905f6 in member, the query only return group 1 and 2 since group 3 don't have that user.) but I cant seem to make mongodb compare a field in the object of the array with another field of the document. Anyone have any idea how do I make this comparison?

There are two solutions to the problem -
First - Using $where. By using $where you can use Javascript code inside mongodb queries. Makes the code flexible, but the shortcoming is that it runs slow since Javascript code has to run rather than more optimized mongoDB C++ code.
db.getCollection('groups').find({
$where: function () {
var flag = 0;
for(var i=0; i<obj.member.length;i++) {
if(obj.member[i].user.str == ObjectId("573ac820eb3ed3ea156905f6").str && obj.member[i].task.str != obj.curTask._id.str ){flag = 1; break;}
}
return flag;
}
})
Second - Using an aggregation pipeline. Here I am unwinding the array, doing matches as described, and finally recreating the array as it was needed. If the not matching elements in the member array are not needed, one can omit the last grouping part.
[
{$match: {'member.user': ObjectId("573ac820eb3ed3ea156905f6")}},
{$unwind: '$member'},
{$project: {
name: 1,
member: 1,
curTask: 1,
ne: {$and: [{$ne: ['$member.task', '$curTask._id']}, {$eq: ['$member.user', ObjectId("573ac820eb3ed3ea156905f6")]}]}
}},
{$group: {
_id: '$_id',
member: {$push: '$member'},
curTask: {$first: '$curTask'},
name: {$first: '$name'},
check: {$sum: {$cond: ['$ne', 1, 0]}}
}},
{$match: {check: {$gt: 0}}}
]

Related

Sorting an array of objects in MongoDB collection

I have a collection in a MongoDB with a document like below:
{
"_id": "63269e0f85bfd011e989d0f7",
"name": "Aravind Krishna",
"mobile": 7309454620,
"email": "akaravindk59#gmail.com",
"password": "$2b$12$0dOE/0wj6uX604h3DZpGxuO/L.fZg7KCm7mOGsNMkarSaeG2C/Wvq",
"orders": [
{
"paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
"cart": [array],
"amount": 3007,
"created": 1663475717,
"_id": "6326a01344f26617fc1a65d6"
},
{
"paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
"cart": [array],
"amount": 389,
"created": 1663475816,
"_id": "6326a07744f26617fc1a65d8"
}
],
"__v": 0
}
I wanted to get only the orders array sorted by the created property in both ascending and descending manner. As we can see here that orders field inside this document is an array of objects. Please give a solution if you have one. I tried the sortArray method but it is giving an error. Please help.
For some assistance the output should look something like this:
{
"_id": "63269e0f85bfd011e989d0f7",
"orders": [
{
"paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
"cart": [array],
"amount": 3007,
"created": 1663475717,
"_id": "6326a01344f26617fc1a65d6"
},
{
"paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
"cart": [array],
"amount": 389,
"created": 1663475816,
"_id": "6326a07744f26617fc1a65d8"
}
]
}
See, I got only the orders field but I want it sorted by the created property value both ascending and descending.
As you mentioned the best way to achieve this is to use $sortArray, I'm assuming the error you're getting is a version mismatch as this operator was only recently added at version 5.2.
The other way to achieve the same result as not as pleasant, you need to $unwind the array, $sort the results and then $group to restore structure, then it's easy to add the "other" order by using $reverseArray although I recommend instead of duplicating your data you just handle the "reverse" requirement in code, overall the pipeline will look like so:
db.collection.aggregate([
{
$unwind: "$orders"
},
{
$sort: {
"orders.created": 1
}
},
{
$group: {
_id: "$_id",
asc_orders: {
$push: "$orders"
}
}
},
{
$addFields: {
desc_orders: {
"$reverseArray": "$asc_orders"
}
}
}
])
Mongo Playground

MongoDB - Pipeline $lookup with $group losing fields

I only have 2 years exp with SQL databases and 0 with NoSQL database. I am trying to write a pipeline using MongoDB Compass aggregate pipeline tool that performs a lookup, group, sum, and sort. I am using MongoDB compass to try and accomplish this. Also, please share any resources that make learning this easier, I've not had much like finding good and easy-to-understand examples online with using the compass to accomplish these tasks. Thank you.
An example question I am trying to solve is:
What customer placed the highest number of orders?
Example Data is:
Customer Collection:
[
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f53" },
"Id": "1",
"FirstName": "Maria",
"LastName": "Anders",
"City": "Berlin",
"Country": "Germany",
"Phone": "030-0074321"},
{ "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f54" },
"Id": "2",
"FirstName": "Ana",
"LastName": "Trujillo",
"City": "México D.F.",
"Country": "Mexico",
"Phone": "(5) 555-4729" }
]
Order Collection:
[
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b501f" },
"Id": "1",
"OrderDate": "2012-07-04 00:00:00.000",
"OrderNumber": "542378",
"CustomerId": "85",
"TotalAmount": "440.00" },
{ "_id": { "$oid": "6276ba9dd1dfd6f5bf4b5020" },
"Id": "2",
"OrderDate": "2012-07-05 00:00:00.000",
"OrderNumber": "542379",
"CustomerId": "79",
"TotalAmount": "1863.40" }
]
I have spent all day looking at YouTube videos and MongoDB documentation but I am failing to comprehend a few things. One, at the time I do a $group function I lose all the fields not associated with the group and I would like to keep a few fields. I would like to have it returned the name of the customer with the highest order.
The pipeline I was using that gets me part of the way is the following:
[{
$lookup: {
from: 'Customer',
localField: 'CustomerId',
foreignField: 'Id',
as: 'CustomerInfo'
}}, {
$project: {
CustomerId: 1,
CustomerInfo: 1
}}, {
$group: {
_id: '$CustomerInfo.Id',
CustomerOrderNumber: {
$sum: 1
}
}}, {
$sort: {
CustomerOrderNumber: -1
}}]
Example data this returns in order:
Apologies for the bad formatting, still trying to get the hang of posting questions that are easy to understand and useful.
In $group stage, it only returns documents with _id and CustomerOrderNumber fields, so CustomerInfo field was missing.
$lookup
$project - From 1st stage, CustomerInfo returns as an array, hence getting the first document as a document field instead of an array field.
$group - Group by CustomerId, sum the documents as CustomerOrderNumber, and take the first document as CustomerInfo.
$project - Decorate the output documents.
$setWindowsFields - With $denseRank to rank the document position by CustomerOrderNumber (DESC). If there are documents with same CustomerOrderNumber, the ranking will treat them as same rank/position.
$match - Select documents with denseRankHighestOrder is 1 (highest).
db.Order.aggregate([
{
$lookup: {
from: "Customer",
localField: "CustomerId",
foreignField: "Id",
as: "CustomerInfo"
}
},
{
$project: {
CustomerId: 1,
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$group: {
_id: "$CustomerInfo.Id",
CustomerOrderNumber: {
$sum: 1
},
CustomerInfo: {
$first: "$CustomerInfo"
}
}
},
{
$project: {
_id: 0,
CustomerId: "$_id",
CustomerOrderNumber: 1,
CustomerName: {
$concat: [
"$CustomerInfo.FirstName",
" ",
"$CustomerInfo.LastName"
]
}
}
},
{
$setWindowFields: {
sortBy: {
CustomerOrderNumber: -1
},
output: {
denseRankHighestOrder: {
$denseRank: {}
}
}
}
},
{
$match: {
denseRankHighestOrder: 1
}
}
])
Sample Mongo Playground
Note:
$sort stage able to sort the document by CustomerOrderNumber. But if you try to limit the documents such as "SELECT TOP n", the output result may be incorrect when there are multiple documents with the same CustomerOrderNumber/rank.
Example: SELECT TOP 1 Customer who has the highest CustomerOrderNumber but there are 3 customers who have the highest CustomerOrderNumber.

How to access nested array of objects in mongodb aggregation pipeline?

I have a document like this(this is the result after few pipeline stages)
[
{
"_id": ObjectId("5e9d5785e4c8343bb2b455cc"),
"name": "Jenny Adams",
"report": [
{ "category":"Beauty", "status":"submitted", "submitted_on": [{"_id": "xyz", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "abc", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Kitchen", "status":"submitted", "submitted_on": [{"_id": "mnp", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
},
{
"_id": ObjectId("5e9d5785e4c8343bb2b455db"),
"name": "Mathew Smith",
"report": [
{ "category":"Household", "status":"submitted", "submitted_on": [{"_id": "123", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "345", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
{ "category":"Garden", "status":"submitted", "submitted_on": [{"_id": "567", "timestamp":"2022-05-08T06:10:06.432+00:00"}] },
{ "category":"BakingNeeds", "status":"submitted", "submitted_on": [{"_id": "891", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
]
}
]
I have user input for time period -
from - 2021-02-23T06:10:05.832+00:00
to - 2022-02-23T06:10:05.832+00:00
Now I wanted to filter the objects from the report which lie in a certain range of time, I want to only keep the object if the "submitted_on[-1]["timestamp"]" is in range of from and to date timestamp.
I am struggling with accessing the timestamp because of the nesting
I tried this
$project: {
"name": 1,
"report": {
"category": 1,
"status": 1,
"submitted_on": 1,
"timestamp": {
$arrayElemAt: ["$report.cataloger_submitted_on", -1]
}
}
}
But this gets the last object of the report array {"_id": "bcd", "timestamp":"2022-05-08T06:10:06.432+00:00"} for all the items inside the report. How can I do this to select the last timestamp of each obj.
You can replace your phase in the aggregation pipeline with two phases: $unwind and $addFields in order to get what I think you want:
{
$unwind: "$report"
},
{
"$addFields": {
"timestamp": {
$arrayElemAt: [
"$report.submitted_on",
-1
]
}
}
},
The $unwind phase is breaking the external array into documents since you want to perform an action on each one of them. See the playground here with your example. If you plan to continue the aggregation pipeline with more steps, you can probably skip the $addFields phase and include the condition inside your next $match phase.

MongoDB: List the usernames and the amount of replies they received

I have a collection set up in mongoDB with sample comments made by users from a made up social media platform, in this form:
{
"_id" : ObjectId("5aa58936c4214f42f4c666b8"),
"id" : "85",
"user_name": "Alex4Ever",
"text" : "This is a comment",
"in_reply_to_user_name": "SamLad"
},
{
"_id" : ObjectId("5aa58935c4214f42f4c66608"),
"id" : "86",
"user_name": "SamLad",
"text" : "I am inevitable",
"in_reply_to_user_name": null
},
{
"_id" : ObjectId("5aa588e4c4214f42f4c63caa"),
"id" : "87",
"user_name": "HewwoKitty",
"text" : "What is grief, if not love persevering?",
"in_reply_to_user_name": "Alex4Ever"
} //There are more, but for testing purposes, I only use these 3 for now.
I have to come up with a query in MongoDB to list all the users in the file along with the amount of replies they received. So in the above sample bit of file, the output should be like:
"_id": "Alex4Ever", "replyCount" : 1, //HewwoKitty replied to Alex4Ever
"_id": "SamLad", "replyCount" : 1, //Alex4Ever replied to SamLad
"_id": "HewwoKitty", "replyCount" : 0, //No one replied to HewwoKitty
My attempt at doing this:
db.comments.aggregate([
{$match:{"in_reply_to_user_name":{"$exists":true, "$ne":null}}},
{$group: { _id: "$in_reply_to_user_name", replyCount:{$sum: 1}}},
{$sort:{replyCount: -1}}
]).pretty()
However, I only get the non-zero values, i.e. I do not get HewwoKitty with a replyCount of 0. Is there any way to print all 3 lines, including the lines with 0 replies?
Demo - https://mongoplayground.net/p/JA9YasEYuVV
Use $lookup and create self join to get all replies for a user and use $size to get the count of replies, after that $group them on user_name.
Extract the replyCount, take $first value from the group
db.collection.aggregate([
{
"$match": {
"in_reply_to_user_name": { "$exists": true }
}
},
{
"$lookup": {
"from": "collection",
"localField": "user_name",
"foreignField": "in_reply_to_user_name",
"as": "replies"
}
},
{
"$project": {
"user_name": 1,
"replyCount": { "$size": "$replies" }
}
},
{
"$group": {
"_id": "$user_name",
"replyCount": { "$first": "$replyCount" }
}
}
])

MongoDB query to find document with duplicate value in array

tldr; I'm struggling to construct a query to
Make an aggregation to get a count of values on a certain key ("original_text_source"), which
Is in a sub-document that is in an array
Full description
I have embedded documents with arrays that are structured like this:
{
"_id" : ObjectId("0123456789"),
"type" : "some_object",
"relationships" : {
"x" : [ ObjectId("0123456789") ],
"y" : [ ObjectId("0123456789") ],
},
"properties" : [
{
"a" : "1"
},
{
"b" : "1"
},
{
"original_text_source" : "foo.txt"
},
]
}
The docs were created from exactly 10k text files, sorted in various folders. During inserting documents into the MongoDB (in batches) I messed up and moved a few files around, causing one file to be imported twice (my database has a count of exactly 10001 docs), but obviously I don't know which one it is. Since one of the "original_text_source" values has to have a count of 2, I was planning on just deleting one.
I read up on solutions with $elemMatch, but since my array element is a document, I'm not sure how to proceed. Maybe with mapReduce? But I can't transfer the logic to my doc structure.
I also could just create a new collection and reupload all, but in case I mess up again, I'd rather like to learn how to query for duplicates. It seems more elegant :-)
You can find duplicates with a simple aggregation like this:
db.collection.aggregate(
{ $group: { _id: "$properties.original_text_source", docIds: { $push: "$_id" }, docCount: { $sum: 1 } } },
{ $match: { "docCount": { $gt: 1 } } }
)
which gives you something like this:
{
"_id" : [
"foo.txt"
],
"docIds" : [
ObjectId("59d6323613940a78ba1d5ffa"),
ObjectId("59d6324213940a78ba1d5ffc")
],
"docCount" : 2.0
}
Run the following:
db.collection.aggregate([
{ $group: {
_id: { name: "$properties.original_text_source" },
idsForDuplicatedDocs: { $addToSet: "$_id" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} }
]);
Given a collection which contains two copies of the document you showed in your question, the above command will return:
{
"_id" : {
"name" : [
"foo.txt"
]
},
"idsForDuplicatedDocs" : [
ObjectId("59d631d2c26584cd8b7b3337"),
ObjectId("59d631cbc26584cd8b7b3333")
],
"count" : 2
}
Where ...
The attribute _id.name is the value of the duplicated properties.original_text_source
The attribute idsForDuplicatedDocs contains the _id values for each of the documents which have a duplicated properties.original_text_source
"reviewAndRating": [
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd9745e2d66162a6dd1f0ef",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7518",
"_id": "5bd974612d66162a6dd1f0f0",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd974622d66162a6dd1f0f1",
"rating": "5"
}
]

Resources