I have a MongoDB collection with a document like the one below:
{
  "_id": "63269e0f85bfd011e989d0f7",
  "name": "Aravind Krishna",
  "mobile": 7309454620,
  "email": "akaravindk59#gmail.com",
  "password": "$2b$12$0dOE/0wj6uX604h3DZpGxuO/L.fZg7KCm7mOGsNMkarSaeG2C/Wvq",
  "orders": [
    {
      "paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
      "cart": [array],
      "amount": 3007,
      "created": 1663475717,
      "_id": "6326a01344f26617fc1a65d6"
    },
    {
      "paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
      "cart": [array],
      "amount": 389,
      "created": 1663475816,
      "_id": "6326a07744f26617fc1a65d8"
    }
  ],
  "__v": 0
}
I want to get only the orders array, sorted by the created property in both ascending and descending order. As you can see, the orders field in this document is an array of objects. I tried the $sortArray operator, but it gives an error. Please give a solution if you have one.
For reference, the output should look something like this:
{
  "_id": "63269e0f85bfd011e989d0f7",
  "orders": [
    {
      "paymentIntentId": "pi_3LjFDtSHVloG65Ul0exLkzsO",
      "cart": [array],
      "amount": 3007,
      "created": 1663475717,
      "_id": "6326a01344f26617fc1a65d6"
    },
    {
      "paymentIntentId": "pi_3LjFFUSHVloG65Ul1FQHlZ9H",
      "cart": [array],
      "amount": 389,
      "created": 1663475816,
      "_id": "6326a07744f26617fc1a65d8"
    }
  ]
}
Here I got only the orders field, but I want it sorted by the created value in both ascending and descending order.
As you mentioned, the best way to achieve this is $sortArray. I assume the error you're getting is due to a version mismatch, as this operator was only added in MongoDB 5.2.
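For MongoDB 5.2 or later, a minimal sketch of the $sortArray approach is below. The pipeline is written as a plain JavaScript array (the name sortedOrdersPipeline is hypothetical) that you would pass to db.collection.aggregate(...). The sortByCreated helper is only a local, in-memory stand-in that mimics what $sortArray with sortBy: { created: 1 } or { created: -1 } returns, so the expected order can be checked without a server. The sample orders are the two from the question, deliberately listed newest first.

```javascript
// Hypothetical pipeline for MongoDB 5.2+; pass it to db.collection.aggregate(sortedOrdersPipeline).
// $sortArray sorts an array of documents by the fields in sortBy (1 = ascending, -1 = descending).
const sortedOrdersPipeline = [
  {
    $project: {
      orders_asc: { $sortArray: { input: "$orders", sortBy: { created: 1 } } },
      orders_desc: { $sortArray: { input: "$orders", sortBy: { created: -1 } } },
    },
  },
];

// In-memory stand-in (not a MongoDB API): sorts a copy of the array by `created`,
// mirroring what the $sortArray expressions above return.
function sortByCreated(orders, direction) {
  return [...orders].sort((a, b) => direction * (a.created - b.created));
}

// Orders from the question (cart omitted), listed newest first on purpose.
const orders = [
  { paymentIntentId: "pi_3LjFFUSHVloG65Ul1FQHlZ9H", amount: 389, created: 1663475816 },
  { paymentIntentId: "pi_3LjFDtSHVloG65Ul0exLkzsO", amount: 3007, created: 1663475717 },
];

const asc = sortByCreated(orders, 1);
const desc = sortByCreated(orders, -1);
console.log(asc.map((o) => o.created)); // [ 1663475717, 1663475816 ]
console.log(desc.map((o) => o.created)); // [ 1663475816, 1663475717 ]
```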
The other way to achieve the same result is less pleasant: you need to $unwind the array, $sort the results, and then $group to restore the structure. After that it's easy to add the "other" order with $reverseArray, although instead of duplicating your data I recommend handling the "reverse" requirement in code. Overall, the pipeline looks like this:
db.collection.aggregate([
  {
    $unwind: "$orders"
  },
  {
    $sort: {
      "orders.created": 1
    }
  },
  {
    $group: {
      _id: "$_id",
      asc_orders: {
        $push: "$orders"
      }
    }
  },
  {
    $addFields: {
      desc_orders: {
        "$reverseArray": "$asc_orders"
      }
    }
  }
])
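To make it clearer what those four stages compute, here is a rough in-memory JavaScript equivalent. It assumes the documents are already loaded into an array; the names unwindSortGroup and docs are hypothetical. The last step, reversing a copy of asc_orders, is also roughly what handling the "reverse" requirement in application code would look like.

```javascript
// Rough in-memory equivalent of: $unwind -> $sort -> $group ($push) -> $addFields ($reverseArray).
function unwindSortGroup(docs) {
  // $unwind: one entry per (document, order) pair
  const unwound = docs.flatMap((doc) => doc.orders.map((order) => ({ _id: doc._id, order })));

  // $sort: { "orders.created": 1 }
  unwound.sort((a, b) => a.order.created - b.order.created);

  // $group: { _id: "$_id", asc_orders: { $push: "$orders" } }
  // Entries arrive already sorted, so each pushed array stays in ascending order.
  const groups = new Map();
  for (const { _id, order } of unwound) {
    if (!groups.has(_id)) groups.set(_id, []);
    groups.get(_id).push(order);
  }

  // $addFields: { desc_orders: { $reverseArray: "$asc_orders" } }
  return [...groups].map(([_id, asc_orders]) => ({
    _id,
    asc_orders,
    desc_orders: [...asc_orders].reverse(),
  }));
}

const docs = [
  {
    _id: "63269e0f85bfd011e989d0f7",
    orders: [
      { _id: "6326a07744f26617fc1a65d8", amount: 389, created: 1663475816 },
      { _id: "6326a01344f26617fc1a65d6", amount: 3007, created: 1663475717 },
    ],
  },
];

const [result] = unwindSortGroup(docs);
console.log(result.asc_orders.map((o) => o.amount)); // [ 3007, 389 ]
console.log(result.desc_orders.map((o) => o.amount)); // [ 389, 3007 ]
```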
Related
I have only 2 years of experience with SQL databases and none with NoSQL databases. I am trying to write a pipeline with the MongoDB Compass aggregation pipeline tool that performs a lookup, group, sum, and sort. Please also share any resources that make learning this easier; I haven't had much luck finding good, easy-to-understand examples online of using Compass for these tasks. Thank you.
An example question I am trying to solve is:
What customer placed the highest number of orders?
Example data:
Customer Collection:
[
  { "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f53" },
    "Id": "1",
    "FirstName": "Maria",
    "LastName": "Anders",
    "City": "Berlin",
    "Country": "Germany",
    "Phone": "030-0074321" },
  { "_id": { "$oid": "6276ba2dd1dfd6f5bf4b4f54" },
    "Id": "2",
    "FirstName": "Ana",
    "LastName": "Trujillo",
    "City": "México D.F.",
    "Country": "Mexico",
    "Phone": "(5) 555-4729" }
]
Order Collection:
[
  { "_id": { "$oid": "6276ba9dd1dfd6f5bf4b501f" },
    "Id": "1",
    "OrderDate": "2012-07-04 00:00:00.000",
    "OrderNumber": "542378",
    "CustomerId": "85",
    "TotalAmount": "440.00" },
  { "_id": { "$oid": "6276ba9dd1dfd6f5bf4b5020" },
    "Id": "2",
    "OrderDate": "2012-07-05 00:00:00.000",
    "OrderNumber": "542379",
    "CustomerId": "79",
    "TotalAmount": "1863.40" }
]
I have spent all day watching YouTube videos and reading the MongoDB documentation, but I am failing to understand a few things. First, when I use a $group stage I lose all the fields that are not part of the grouping, and I would like to keep a few of them. I would like the pipeline to return the name of the customer with the most orders.
The pipeline I was using that gets me part of the way is the following:
[{
  $lookup: {
    from: 'Customer',
    localField: 'CustomerId',
    foreignField: 'Id',
    as: 'CustomerInfo'
  }
}, {
  $project: {
    CustomerId: 1,
    CustomerInfo: 1
  }
}, {
  $group: {
    _id: '$CustomerInfo.Id',
    CustomerOrderNumber: {
      $sum: 1
    }
  }
}, {
  $sort: {
    CustomerOrderNumber: -1
  }
}]
Example data this returns in order:
Apologies for the bad formatting; I'm still getting the hang of posting questions that are easy to understand and useful.
In the $group stage, only the _id and CustomerOrderNumber fields are returned, which is why the CustomerInfo field is missing.
$lookup - Join each order with the matching Customer document(s).
$project - The 1st stage returns CustomerInfo as an array, so take its first element to get a document field instead of an array field.
$group - Group by the customer's Id, count the documents as CustomerOrderNumber, and keep the first CustomerInfo document with $first.
$project - Shape the output documents (CustomerId, CustomerOrderNumber, and CustomerName built with $concat).
$setWindowFields - Use $denseRank to rank the documents by CustomerOrderNumber (descending). Documents with the same CustomerOrderNumber receive the same rank. This stage requires MongoDB 5.0 or later.
$match - Select the documents whose denseRankHighestOrder is 1 (the highest).
db.Order.aggregate([
  {
    $lookup: {
      from: "Customer",
      localField: "CustomerId",
      foreignField: "Id",
      as: "CustomerInfo"
    }
  },
  {
    $project: {
      CustomerId: 1,
      CustomerInfo: {
        $first: "$CustomerInfo"
      }
    }
  },
  {
    $group: {
      _id: "$CustomerInfo.Id",
      CustomerOrderNumber: {
        $sum: 1
      },
      CustomerInfo: {
        $first: "$CustomerInfo"
      }
    }
  },
  {
    $project: {
      _id: 0,
      CustomerId: "$_id",
      CustomerOrderNumber: 1,
      CustomerName: {
        $concat: [
          "$CustomerInfo.FirstName",
          " ",
          "$CustomerInfo.LastName"
        ]
      }
    }
  },
  {
    $setWindowFields: {
      sortBy: {
        CustomerOrderNumber: -1
      },
      output: {
        denseRankHighestOrder: {
          $denseRank: {}
        }
      }
    }
  },
  {
    $match: {
      denseRankHighestOrder: 1
    }
  }
])
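For readers coming from SQL, a rough in-memory JavaScript sketch of the first four stages may help show why fields disappear after $group: the group output contains only _id and the accumulators you declare, so CustomerInfo survives only because it is carried through a $first accumulator. The function name ordersPerCustomer and the sample orders (two for customer "1", one for customer "2") are made up for illustration; note that the question's own sample orders reference CustomerId "85" and "79", which have no matching customer and would end up in a null group.

```javascript
// Rough in-memory sketch of $lookup, $project, $group and $project (not MongoDB itself).
function ordersPerCustomer(orders, customers) {
  const groups = new Map();
  for (const order of orders) {
    // $lookup + $project { CustomerInfo: { $first: "$CustomerInfo" } }:
    // the first matching customer, or undefined when nothing matches.
    const customerInfo = customers.find((c) => c.Id === order.CustomerId);
    // $group _id: "$CustomerInfo.Id" (null when the lookup found nothing)
    const key = customerInfo ? customerInfo.Id : null;
    if (!groups.has(key)) {
      // Only _id and the declared accumulators exist in the group output;
      // CustomerInfo is kept from the first document seen ($first).
      groups.set(key, { _id: key, CustomerOrderNumber: 0, CustomerInfo: customerInfo });
    }
    groups.get(key).CustomerOrderNumber += 1; // $sum: 1
  }
  // Final $project: rename _id and build CustomerName ($concat yields null if info is missing).
  return [...groups.values()].map((g) => ({
    CustomerId: g._id,
    CustomerOrderNumber: g.CustomerOrderNumber,
    CustomerName: g.CustomerInfo ? g.CustomerInfo.FirstName + " " + g.CustomerInfo.LastName : null,
  }));
}

const customers = [
  { Id: "1", FirstName: "Maria", LastName: "Anders" },
  { Id: "2", FirstName: "Ana", LastName: "Trujillo" },
];

// Hypothetical orders whose CustomerId values match existing customers.
const orders = [
  { Id: "1", CustomerId: "1" },
  { Id: "2", CustomerId: "2" },
  { Id: "3", CustomerId: "1" },
];

const summary = ordersPerCustomer(orders, customers);
console.log(summary);
```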
Note:
A $sort stage can sort the documents by CustomerOrderNumber, but if you then limit the results (the equivalent of "SELECT TOP n"), the output may be incorrect when multiple documents share the same CustomerOrderNumber/rank.
Example: SELECT TOP 1 returns a single customer with the highest CustomerOrderNumber, even when 3 customers are tied for the highest CustomerOrderNumber.
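The tie problem can be illustrated with a small JavaScript sketch. denseRank below is a hypothetical helper that mimics what $denseRank produces over documents sorted by CustomerOrderNumber descending: equal values share a rank, and the next distinct value gets the next integer. The customer names and counts are invented.

```javascript
// Mimics $setWindowFields { sortBy: { CustomerOrderNumber: -1 }, output: { ...: { $denseRank: {} } } }.
function denseRank(docs) {
  const sorted = [...docs].sort((a, b) => b.CustomerOrderNumber - a.CustomerOrderNumber);
  let rank = 0;
  let previous;
  return sorted.map((doc) => {
    if (doc.CustomerOrderNumber !== previous) {
      rank += 1; // a new distinct value gets the next dense rank
      previous = doc.CustomerOrderNumber;
    }
    return { ...doc, denseRankHighestOrder: rank };
  });
}

const counts = [
  { CustomerName: "A", CustomerOrderNumber: 5 },
  { CustomerName: "B", CustomerOrderNumber: 7 },
  { CustomerName: "C", CustomerOrderNumber: 7 },
  { CustomerName: "D", CustomerOrderNumber: 7 },
];

// $sort + $limit: 1 ("SELECT TOP 1") keeps only one of the three tied customers.
const top1 = [...counts].sort((a, b) => b.CustomerOrderNumber - a.CustomerOrderNumber).slice(0, 1);

// $match: { denseRankHighestOrder: 1 } keeps every tied customer.
const highest = denseRank(counts).filter((d) => d.denseRankHighestOrder === 1);

console.log(top1.length); // 1
console.log(highest.map((d) => d.CustomerName)); // [ 'B', 'C', 'D' ]
```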
I have documents like this (the result after a few pipeline stages):
[
  {
    "_id": ObjectId("5e9d5785e4c8343bb2b455cc"),
    "name": "Jenny Adams",
    "report": [
      { "category":"Beauty", "status":"submitted", "submitted_on": [{"_id": "xyz", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "abc", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
      { "category":"Kitchen", "status":"submitted", "submitted_on": [{"_id": "mnp", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
    ]
  },
  {
    "_id": ObjectId("5e9d5785e4c8343bb2b455db"),
    "name": "Mathew Smith",
    "report": [
      { "category":"Household", "status":"submitted", "submitted_on": [{"_id": "123", "timestamp":"2022-02-23T06:10:05.832+00:00"}, {"_id": "345", "timestamp":"2021-03-23T06:10:05.832+00:00"}] },
      { "category":"Garden", "status":"submitted", "submitted_on": [{"_id": "567", "timestamp":"2022-05-08T06:10:06.432+00:00"}] },
      { "category":"BakingNeeds", "status":"submitted", "submitted_on": [{"_id": "891", "timestamp":"2022-05-08T06:10:06.432+00:00"}] }
    ]
  }
]
I have user input for a time period:
from - 2021-02-23T06:10:05.832+00:00
to - 2022-02-23T06:10:05.832+00:00
Now I want to filter the objects in report by time range: I only want to keep an object if its submitted_on[-1]["timestamp"] lies between the from and to timestamps.
I am struggling to access the timestamp because of the nesting.
I tried this:
$project: {
  "name": 1,
  "report": {
    "category": 1,
    "status": 1,
    "submitted_on": 1,
    "timestamp": {
      $arrayElemAt: ["$report.cataloger_submitted_on", -1]
    }
  }
}
But this gets the last object of the report array, {"_id": "bcd", "timestamp":"2022-05-08T06:10:06.432+00:00"}, for every item inside report. How can I select the last timestamp of each object?
You can replace that stage in your aggregation pipeline with two stages, $unwind and $addFields, to get what I think you want:
{
  $unwind: "$report"
},
{
  "$addFields": {
    "timestamp": {
      $arrayElemAt: [
        "$report.submitted_on",
        -1
      ]
    }
  }
},
The $unwind stage breaks the outer report array into separate documents, since you want to perform an action on each element. If you plan to continue the aggregation pipeline with more stages, you can probably skip the $addFields stage and include the condition inside your next $match stage.
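Putting the two stages together with the range condition, a sketch might look like the following. After $addFields, timestamp holds the whole last submitted_on object, so the string to compare sits at timestamp.timestamp. This assumes the timestamps are stored as ISO-8601 strings in one consistent format, so plain string comparison orders them chronologically (with real Date values you would compare against dates instead), and it assumes inclusive bounds. The names lastSubmittedStages and filterLastSubmitted are hypothetical; the latter is only an in-memory stand-in for checking the logic against the question's data.

```javascript
// Bounds from the question; inclusive comparison ($gte/$lte) is an assumption.
const from = "2021-02-23T06:10:05.832+00:00";
const to = "2022-02-23T06:10:05.832+00:00";

// Stages to use in place of the original $project (assumes string timestamps).
const lastSubmittedStages = [
  { $unwind: "$report" },
  { $addFields: { timestamp: { $arrayElemAt: ["$report.submitted_on", -1] } } },
  // timestamp now holds the whole last submitted_on object, hence "timestamp.timestamp".
  { $match: { "timestamp.timestamp": { $gte: from, $lte: to } } },
];

// In-memory stand-in for the three stages above.
function filterLastSubmitted(docs, from, to) {
  return docs
    .flatMap((doc) => doc.report.map((report) => ({ ...doc, report }))) // $unwind
    .map((doc) => {
      const submitted = doc.report.submitted_on;
      return { ...doc, timestamp: submitted[submitted.length - 1] }; // $arrayElemAt: [..., -1]
    })
    .filter((doc) => doc.timestamp.timestamp >= from && doc.timestamp.timestamp <= to); // $match
}

const docs = [
  {
    _id: "5e9d5785e4c8343bb2b455cc",
    name: "Jenny Adams",
    report: [
      { category: "Beauty", status: "submitted", submitted_on: [
        { _id: "xyz", timestamp: "2022-02-23T06:10:05.832+00:00" },
        { _id: "abc", timestamp: "2021-03-23T06:10:05.832+00:00" }] },
      { category: "Kitchen", status: "submitted", submitted_on: [
        { _id: "mnp", timestamp: "2022-05-08T06:10:06.432+00:00" }] },
    ],
  },
  {
    _id: "5e9d5785e4c8343bb2b455db",
    name: "Mathew Smith",
    report: [
      { category: "Household", status: "submitted", submitted_on: [
        { _id: "123", timestamp: "2022-02-23T06:10:05.832+00:00" },
        { _id: "345", timestamp: "2021-03-23T06:10:05.832+00:00" }] },
      { category: "Garden", status: "submitted", submitted_on: [
        { _id: "567", timestamp: "2022-05-08T06:10:06.432+00:00" }] },
      { category: "BakingNeeds", status: "submitted", submitted_on: [
        { _id: "891", timestamp: "2022-05-08T06:10:06.432+00:00" }] },
    ],
  },
];

const kept = filterLastSubmitted(docs, from, to);
console.log(kept.map((d) => `${d.name}: ${d.report.category}`));
```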
I have a collection Group like this:
{
  "_id" : ObjectId("5822dd5cb6a69ca404e0d93c"),
  "name" : "GROUP 1",
  "member": [
    {
      "_id": ObjectId("5822dd5cb6a69ca404e0d93d"),
      "user": ObjectId("573ac820eb3ed3ea156905f6"),
      "task": ObjectId("5822ddecb6a69ca404e0d942")
    },
    {
      "_id": ObjectId("5822dd5cb6a69ca404e0d93f"),
      "user": ObjectId("57762fce5ece6a5d04457bf9"),
      "task": ObjectId("5822ddecb6a69ca404e0d943")
    }
  ],
  curTask: {
    "_id": ObjectId("5822ddecb6a69ca404e0d942"),
    "time": ISODate("2016-01-01T01:01:01.000Z")
  }
}
{
  "_id" : ObjectId("573d5ff8d1b7b3b32e165599"),
  "name" : "GROUP 2",
  "member": [
    {
      "_id": ObjectId("574802e031e70b503eabe195"),
      "user": ObjectId("573ac820eb3ed3ea156905f6"),
      "task": ObjectId("5775f1a74b41037e246a51d1")
    },
    {
      "_id": ObjectId("574802e031e70b503eabe198"),
      "user": ObjectId("573ac79beb3ed3ea156905f4"),
      "task": ObjectId("576cfa042c0a4054794dd242")
    }
  ],
  curTask: {
    "_id": ObjectId("577249a2f9dba0c750ef705b"),
    "time": ISODate("2016-01-01T01:01:01.000Z")
  }
}
{
  "_id" : ObjectId("574802e031e70b503eabe194"),
  "name" : "GROUP 3",
  "member": [
    {
      "_id": ObjectId("574be0a2bf16234f5a752f83"),
      "user": ObjectId("573ac79beb3ed3ea156905f4"),
      "task": ObjectId("5822ddecb6a69ca404e0d942")
    },
    {
      "_id": ObjectId("574d397d6e9f07d64d1e4e40"),
      "user": ObjectId("57762fce5ece6a5d04457bf9"),
      "task": ObjectId("5822ddecb6a69ca404e0d943")
    }
  ],
  curTask: {
    "_id": ObjectId("5822ddecb6a69ca404e0d942"),
    "time": ISODate("2016-01-01T01:01:01.000Z")
  }
}
I want to find all groups where the user with ObjectId 573ac820eb3ed3ea156905f6 (the 1st user in group 1) is not doing the same task as curTask. So far I've written this query:
db.getCollection('groups').find({
  "member": { "$elemMatch": {
    "user": ObjectId("573ac820eb3ed3ea156905f6"),
    "task": { "$ne": "this.curTask._id" }
  } }
})
But this doesn't seem to work, as it still returns the group where user 573ac820eb3ed3ea156905f6 has task === curTask._id. The first half of the $elemMatch seems to work fine (it only finds groups with user 573ac820eb3ed3ea156905f6 in member; the query returns only groups 1 and 2, since group 3 doesn't have that user), but I can't make MongoDB compare a field of an array element with another field of the same document. Does anyone know how to make this comparison?
There are two solutions to this problem.
First, using $where. With $where you can use JavaScript code inside MongoDB queries. This makes the query flexible, but the drawback is that it runs slowly, because JavaScript has to be evaluated for each document instead of MongoDB's more optimized native C++ query code.
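db.getCollection('groups').find({
  $where: function () {
    // Inside $where, obj (like this) refers to the document being tested.
    var target = ObjectId("573ac820eb3ed3ea156905f6").str;
    for (var i = 0; i < obj.member.length; i++) {
      // Match when the target user's task differs from the group's current task.
      if (obj.member[i].user.str == target && obj.member[i].task.str != obj.curTask._id.str) {
        return true;
      }
    }
    return false;
  }
})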
Second, using an aggregation pipeline. Here I unwind the array, evaluate the conditions described above for each member, and finally rebuild the array as needed. If the non-matching elements of the member array are not needed, you can omit the final grouping part.
[
  { $match: { 'member.user': ObjectId("573ac820eb3ed3ea156905f6") } },
  { $unwind: '$member' },
  { $project: {
      name: 1,
      member: 1,
      curTask: 1,
      ne: { $and: [
        { $ne: ['$member.task', '$curTask._id'] },
        { $eq: ['$member.user', ObjectId("573ac820eb3ed3ea156905f6")] }
      ] }
  } },
  { $group: {
      _id: '$_id',
      member: { $push: '$member' },
      curTask: { $first: '$curTask' },
      name: { $first: '$name' },
      check: { $sum: { $cond: ['$ne', 1, 0] } }
  } },
  { $match: { check: { $gt: 0 } } }
]
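As a sanity check on the expected result, here is a small in-memory JavaScript version of the same condition (some member is the given user, and that member's task differs from curTask._id), run against the three groups from the question with ObjectIds written as plain hex strings. The helper name userOnDifferentTask is hypothetical. Only GROUP 2 should match: in GROUP 1 the user's task equals curTask._id, and GROUP 3 has no such member.

```javascript
// Hex strings stand in for ObjectId values in this sketch.
const USER = "573ac820eb3ed3ea156905f6";

// Same condition as the $where loop and the aggregation's `check` counter.
function userOnDifferentTask(group, userId) {
  return group.member.some((m) => m.user === userId && m.task !== group.curTask._id);
}

const groups = [
  {
    name: "GROUP 1",
    member: [
      { user: "573ac820eb3ed3ea156905f6", task: "5822ddecb6a69ca404e0d942" },
      { user: "57762fce5ece6a5d04457bf9", task: "5822ddecb6a69ca404e0d943" },
    ],
    curTask: { _id: "5822ddecb6a69ca404e0d942" },
  },
  {
    name: "GROUP 2",
    member: [
      { user: "573ac820eb3ed3ea156905f6", task: "5775f1a74b41037e246a51d1" },
      { user: "573ac79beb3ed3ea156905f4", task: "576cfa042c0a4054794dd242" },
    ],
    curTask: { _id: "577249a2f9dba0c750ef705b" },
  },
  {
    name: "GROUP 3",
    member: [
      { user: "573ac79beb3ed3ea156905f4", task: "5822ddecb6a69ca404e0d942" },
      { user: "57762fce5ece6a5d04457bf9", task: "5822ddecb6a69ca404e0d943" },
    ],
    curTask: { _id: "5822ddecb6a69ca404e0d942" },
  },
];

const matching = groups.filter((g) => userOnDifferentTask(g, USER)).map((g) => g.name);
console.log(matching); // [ 'GROUP 2' ]
```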
In my data, I have two fields that I want to use as an index together. They are sensorid (any string) and timestamp (yyyy-mm-dd hh:mm:ss).
So I made an index for these two using the Cloudant index generator. This was created successfully and it appears as a design document.
{
  "index": {
    "fields": [
      {
        "name": "sensorid",
        "type": "string"
      },
      {
        "name": "timestamp",
        "type": "string"
      }
    ]
  },
  "type": "text"
}
However, when I try to make the following query to find all documents with a timestamp newer than some value, I am told there is no index available for the selector:
{
  "selector": {
    "timestamp": {
      "$gt": "2015-10-13 16:00:00"
    }
  },
  "fields": [
    "_id",
    "_rev"
  ],
  "sort": [
    {
      "_id": "asc"
    }
  ]
}
What have I done wrong?
It seems to me that Cloudant Query only allows sorting on fields that are part of the selector.
Therefore your selector should include the _id field and look like:
"selector": {
  "_id": {
    "$gt": 0
  },
  "timestamp": {
    "$gt": "2015-10-13 16:00:00"
  }
}
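Combined with the rest of the original query, the full request body would then look roughly like this, written as a JavaScript object you would serialize and POST to the database's _find endpoint. The helper sortFieldsMissingFromSelector is hypothetical; it only encodes the answer's hunch that every sort field should also appear in the selector, and is not part of any Cloudant client. Whether Cloudant then picks your text index is still up to its index selection, so treat this as an untested sketch.

```javascript
// Full query body following the answer's suggestion: the sort field (_id) is also in the selector.
const findQuery = {
  selector: {
    _id: { $gt: 0 },
    timestamp: { $gt: "2015-10-13 16:00:00" },
  },
  fields: ["_id", "_rev"],
  sort: [{ _id: "asc" }],
};

// Hypothetical check: returns the sort fields that do not appear in the selector.
function sortFieldsMissingFromSelector(query) {
  return query.sort.flatMap((entry) => Object.keys(entry)).filter((field) => !(field in query.selector));
}

console.log(sortFieldsMissingFromSelector(findQuery)); // []
console.log(JSON.stringify(findQuery));
```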
I hope this works for you!
I'm trying to figure out the best schema for a dating-site-like app. Users have a listing (possibly many), and they can view other users' listings to 'like' and 'dislike' them.
Currently I'm just storing the other person's listing id in a likedBy and dislikedBy array. When a user 'likes' a listing, their listing id is put into the liked listing's likedBy array. However, I would now like to track the timestamp at which a user likes a listing. This would be used for a user's 'history list' or for data analysis.
I would need to do two separate queries:
find all active listings that this user has not liked or disliked before
and for a user's history of 'liked'/'disliked' choices
find all the listings user X has liked in chronological order
My current schema is:
listings
_id: 'sdf3f'
likedBy: ['12ac', 'as3vd', 'sadf3']
dislikedBy: ['asdf', 'sdsdf', 'asdfas']
active: bool
Could I do something like this?
listings
_id: 'sdf3f'
likedBy: [{'12ac', date: Date}, {'ds3d', date: Date}]
dislikedBy: [{'s12ac', date: Date}, {'6fs3d', date: Date}]
active: bool
I was also thinking of making a new collection for choices.
choices
Id
userId // id of current user making the choice
userlistId // listing of the user making the choice
listingChoseId // the listing they chose yes/no
type
date
I'm not sure of the performance implications of keeping these choices in a separate collection when running the "find all active listings that this user has not liked or disliked before" query.
Any insight would be greatly appreciated!
Well, you obviously thought it was a good idea to embed these in the "listings" documents so that your other usage patterns, beyond the cases presented here, work properly. With that in mind, there is no reason to throw that away.
To clarify though, the structure you seem to want is something like this:
{
  "_id": "sdf3f",
  "likedBy": [
    { "userId": "12ac", "date": ISODate("2014-04-09T07:30:47.091Z") },
    { "userId": "as3vd", "date": ISODate("2014-04-09T07:30:47.091Z") },
    { "userId": "sadf3", "date": ISODate("2014-04-09T07:30:47.091Z") }
  ],
  "dislikedBy": [
    { "userId": "asdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
    { "userId": "sdsdf", "date": ISODate("2014-04-09T07:30:47.091Z") },
    { "userId": "asdfas", "date": ISODate("2014-04-09T07:30:47.091Z") }
  ],
  "active": true
}
This is all well and good, except for one catch. Because this content is in two array fields, you would not be able to create an index over both of those fields. There is a restriction that only one array (multikey) field can be included in a compound index.
So to solve the obvious problem with your first query not being able to use an index, you would structure like this instead:
{
  "_id": "sdf3f",
  "votes": [
    {
      "userId": "12ac",
      "type": "like",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    },
    {
      "userId": "as3vd",
      "type": "like",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    },
    {
      "userId": "sadf3",
      "type": "like",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    },
    {
      "userId": "asdf",
      "type": "dislike",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    },
    {
      "userId": "sdsdf",
      "type": "dislike",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    },
    {
      "userId": "asdfas",
      "type": "dislike",
      "date": ISODate("2014-04-09T07:30:47.091Z")
    }
  ],
  "active": true
}
This allows an index that covers this form:
// createIndex replaces the deprecated ensureIndex shell helper.
db.post.createIndex({
  "active": 1,
  "votes.userId": 1,
  "votes.date": 1,
  "votes.type": 1
})
Actually, you will probably want a few indexes to suit your usage patterns, but the point is that you can now have indexes that are actually usable.
For the first case, you have this form of query:
db.post.find({ "active": true, "votes.userId": { "$ne": "12ac" } })
That makes sense, considering that you are clearly not going to have both a like and a dislike from the same user. Given the order of that index, at least active can be used to filter, because your negated condition needs to scan everything else. There is no way around that with any structure.
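The key point of that query is how $ne behaves on an array field: it matches only documents where no element of votes has that userId. A minimal in-memory sketch of the same semantics follows; the helper name and the two extra listings ("k9x2", "p0q1") are hypothetical.

```javascript
// In-memory equivalent of db.post.find({ active: true, "votes.userId": { $ne: userId } }).
// On an array field, $ne matches only when NO element has that userId (a missing array also matches).
function unvotedActiveListings(listings, userId) {
  return listings.filter((l) => l.active && !(l.votes || []).some((v) => v.userId === userId));
}

const listings = [
  {
    _id: "sdf3f",
    active: true,
    votes: [
      { userId: "12ac", type: "like", date: new Date("2014-04-09T07:30:47.091Z") },
      { userId: "asdf", type: "dislike", date: new Date("2014-04-09T07:30:47.091Z") },
    ],
  },
  { _id: "k9x2", active: true, votes: [] }, // hypothetical: active, nobody has voted yet
  { _id: "p0q1", active: false, votes: [] }, // hypothetical: inactive listing
];

console.log(unvotedActiveListings(listings, "12ac").map((l) => l._id)); // [ 'k9x2' ]
```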
For the other case, you probably want an index with userId as its first element, before date. Then your query is quite simple:
db.post.find({ "votes.userId": "12ac" })
  .sort({ "votes.userId": 1, "votes.date": 1 })
You may be thinking that you have suddenly lost something: getting the count of "likes" and "dislikes" used to be as easy as checking the size of an array, and now it's a little different. That is easily solved with aggregate:
db.post.aggregate([
  { "$unwind": "$votes" },
  { "$group": {
    "_id": {
      "_id": "$_id",
      "active": "$active"
    },
    "likes": { "$sum": { "$cond": [
      { "$eq": [ "$votes.type", "like" ] },
      1,
      0
    ]}},
    "dislikes": { "$sum": { "$cond": [
      { "$eq": [ "$votes.type", "dislike" ] },
      1,
      0
    ]}}
  }}
])
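To check what that aggregation returns, here is a rough in-memory JavaScript equivalent using the example listing above. The function name countVotes is hypothetical, and the empty listing "k9x2" is invented to show that $unwind drops documents with an empty votes array, so they produce no output row at all rather than zero counts.

```javascript
// Rough in-memory equivalent of the $unwind + $group aggregation above.
function countVotes(listings) {
  return listings
    .filter((l) => Array.isArray(l.votes) && l.votes.length > 0) // $unwind drops listings without votes
    .map((l) => ({
      _id: { _id: l._id, active: l.active },
      likes: l.votes.filter((v) => v.type === "like").length, // $sum of $cond(type == "like")
      dislikes: l.votes.filter((v) => v.type === "dislike").length, // $sum of $cond(type == "dislike")
    }));
}

const date = new Date("2014-04-09T07:30:47.091Z");
const listings = [
  {
    _id: "sdf3f",
    active: true,
    votes: [
      { userId: "12ac", type: "like", date },
      { userId: "as3vd", type: "like", date },
      { userId: "sadf3", type: "like", date },
      { userId: "asdf", type: "dislike", date },
      { userId: "sdsdf", type: "dislike", date },
      { userId: "asdfas", type: "dislike", date },
    ],
  },
  { _id: "k9x2", active: true, votes: [] }, // hypothetical listing with no votes
];

console.log(countVotes(listings));
```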
So whatever your actual usage, you can keep any important parts of the document in the grouping _id and then easily evaluate the count of "likes" and "dislikes".
You may also note that changing an entry from like to dislike can be done in a single atomic update.
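One way to express that atomic switch is the positional $ update operator. The filter and update documents below are a sketch you would pass to db.post.updateOne(filter, update); the helper applyPositionalSet is a hypothetical in-memory stand-in that mimics what the server does, namely rewriting only the first votes element matched by the filter.

```javascript
// Filter and update documents for db.post.updateOne(filter, update).
// "votes.$" refers to the first votes element matched by the "votes.userId" condition.
const filter = { _id: "sdf3f", "votes.userId": "12ac" };
const update = { $set: { "votes.$.type": "dislike", "votes.$.date": new Date() } };

// In-memory stand-in for the server-side positional update.
function applyPositionalSet(listing, userId, changes) {
  const index = listing.votes.findIndex((v) => v.userId === userId); // what "$" resolves to
  if (index === -1) return false; // the filter matched nothing, so nothing is updated
  Object.assign(listing.votes[index], changes);
  return true;
}

const listing = {
  _id: "sdf3f",
  votes: [
    { userId: "12ac", type: "like", date: new Date("2014-04-09T07:30:47.091Z") },
    { userId: "asdf", type: "dislike", date: new Date("2014-04-09T07:30:47.091Z") },
  ],
  active: true,
};

const changed = applyPositionalSet(listing, filter["votes.userId"], {
  type: update.$set["votes.$.type"],
  date: update.$set["votes.$.date"],
});
console.log(changed, listing.votes[0].type); // true dislike
```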
There is much more you can do, but I would prefer this structure for the reasons as given.