MongoDB: Get top documents

I am unable to find an answer to a problem either on Google or SO.
I have a collection such as :
{ "_id" : "0", "timestamp" : 160000 }
{ "_id" : "00", "timestamp" : 160000 }
{ "_id" : "000", "timestamp" : 150000 }
And I want to get the top rows based on timestamp, not only the top one.
This for example:
{ "_id" : "0", "timestamp" : 160000 }
{ "_id" : "00", "timestamp" : 160000 }
The obvious solution would be to sort in descending order and take the first n rows, but that doesn't do what I need, because I would have to know in advance how many rows share the top timestamp.
I'd like to get the timestamp of the top row and then match all rows that have that timestamp. Or is there a better approach?
Thank you in advance!

Use a self lookup: find the max value first, then perform a $lookup on the same collection to fetch every document that has that value.
db.collection.aggregate([
{
"$sort": {
"timestamp": -1
}
},
{
"$limit": 1
},
{
"$lookup": {
"from": "collection",
"localField": "timestamp",
"foreignField": "timestamp",
"as": "topTimeStamp"
}
},
{
"$project": {
"_id": 0,
"result": "$topTimeStamp"
}
},
])
Index the timestamp key in descending order to improve query performance.
If the collection holds only a small number of documents, I recommend replacing the $sort and $limit stages with a $group stage that finds the max value using the $max accumulator.
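The $group and $max variant mentioned above could look roughly like the sketch below. It is not taken from the original answer: the field name maxTimestamp and the helper topByTimestamp are illustrative assumptions. The helper only mirrors, in plain JavaScript, what the pipeline is expected to return, so the logic can be checked without a server.

```javascript
// Sketch of the $group/$max variant (assumed shape, not from the original answer).
// "maxTimestamp" is a hypothetical field name chosen for illustration.
const pipeline = [
  // Collapse the whole collection into one document holding the largest timestamp.
  { $group: { _id: null, maxTimestamp: { $max: "$timestamp" } } },
  // Self lookup: fetch every document whose timestamp equals that maximum.
  {
    $lookup: {
      from: "collection",
      localField: "maxTimestamp",
      foreignField: "timestamp",
      as: "topTimeStamp",
    },
  },
  { $project: { _id: 0, result: "$topTimeStamp" } },
];

// In the mongo shell this would be run as: db.collection.aggregate(pipeline)

// Plain JavaScript mirror of the same logic, for checking without a database.
function topByTimestamp(docs) {
  if (docs.length === 0) return [];
  const maxTimestamp = Math.max(...docs.map((d) => d.timestamp));
  return docs.filter((d) => d.timestamp === maxTimestamp);
}

// Sample documents from the question.
const sample = [
  { _id: "0", timestamp: 160000 },
  { _id: "00", timestamp: 160000 },
  { _id: "000", timestamp: 150000 },
];
console.log(topByTimestamp(sample).map((d) => d._id)); // [ '0', '00' ]
```

Unlike $sort with $limit, the $group stage reads every document, which is why it is only suggested for small collections.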

Related

MongoDB: List the usernames and the amount of replies they received

I have a collection set up in mongoDB with sample comments made by users from a made up social media platform, in this form:
{
"_id" : ObjectId("5aa58936c4214f42f4c666b8"),
"id" : "85",
"user_name": "Alex4Ever",
"text" : "This is a comment",
"in_reply_to_user_name": "SamLad"
},
{
"_id" : ObjectId("5aa58935c4214f42f4c66608"),
"id" : "86",
"user_name": "SamLad",
"text" : "I am inevitable",
"in_reply_to_user_name": null
},
{
"_id" : ObjectId("5aa588e4c4214f42f4c63caa"),
"id" : "87",
"user_name": "HewwoKitty",
"text" : "What is grief, if not love persevering?",
"in_reply_to_user_name": "Alex4Ever"
} //There are more, but for testing purposes, I only use these 3 for now.
I have to come up with a MongoDB query that lists all the users in the file along with the number of replies each of them received. For the sample above, the output should look like:
"_id": "Alex4Ever", "replyCount" : 1, //HewwoKitty replied to Alex4Ever
"_id": "SamLad", "replyCount" : 1, //Alex4Ever replied to SamLad
"_id": "HewwoKitty", "replyCount" : 0, //No one replied to HewwoKitty
My attempt at doing this:
db.comments.aggregate([
{$match:{"in_reply_to_user_name":{"$exists":true, "$ne":null}}},
{$group: { _id: "$in_reply_to_user_name", replyCount:{$sum: 1}}},
{$sort:{replyCount: -1}}
]).pretty()
However, I only get the non-zero values, i.e. I do not get HewwoKitty with a replyCount of 0. Is there any way to print all 3 lines, including the lines with 0 replies?
Demo - https://mongoplayground.net/p/JA9YasEYuVV
Use $lookup to create a self join that collects all replies to each user, and use $size to count them.
Then $group on user_name, taking the $first replyCount value from each group.
db.collection.aggregate([
{
"$match": {
"in_reply_to_user_name": { "$exists": true }
}
},
{
"$lookup": {
"from": "collection",
"localField": "user_name",
"foreignField": "in_reply_to_user_name",
"as": "replies"
}
},
{
"$project": {
"user_name": 1,
"replyCount": { "$size": "$replies" }
}
},
{
"$group": {
"_id": "$user_name",
"replyCount": { "$first": "$replyCount" }
}
}
])

How to update an embedded document in a nested array?

I have this kind of structure in a Mongo collection:
{
"_id": "12345678",
"Invoices": [
{
"_id": "123456789",
"Currency": "EUR",
"DueTotalAmountInvoice": 768.3699999999999,
"InvoiceDate": "2016-01-01 00:00:00.000",
"Items": [
{
"Item": 10,
"ProductCode": "ABC567",
"Quantity": 1
},
{
"Item": 20,
"ProductCode": "CDE987",
"Quantity": 1
}
]
},
{
"_id": "87654321",
"Currency": "EUR",
"DueTotalAmountInvoice": 768.3699999999999,
"InvoiceDate": "2016-01-01 00:00:00.000",
"Items": [
{
"Item": 30,
"ProductCode": "PLO987",
"Quantity": 1,
"Units": "KM3"
},
{
"Item": 40,
"ProductCode": "PLS567",
"Quantity": 1,
"DueTotalAmountInvoice": 768.3699999999999
}
]
}
]
}
So I have a top-level object storing several Invoices, and each Invoice stores several Items. An Item is an embedded document.
In relational modelling terms:
A customer has one or several Invoices
An Invoice has one or several Items
I am facing an issue when trying to update a specific Item in a specific Invoice. For example, I want to change the quantity of item 10 in Invoice 123456789.
How can this be done in MongoDB?
I tried :
Push statement but it doesn't seem to work for nested arrays
arrayFilters but it doesn't seem to work for embedded document in nested arrays (only simple value arrays).
Can you give me some advice about it ?
Thank you !
As per your problem description:
"For example I want to change the quantity of the item 10 in Invoice 123456789." Here I changed the Quantity to 3, but you can set any value you want. Note how arrayFilters is used: element1 selects the invoice and element2 selects the item.
Try this query:
db.collection.update(
{"_id" : "12345678"},
{$set:{"Invoices.$[element1].Items.$[element2].Quantity":3}},
{multi:true, arrayFilters:[
{"element1._id": "123456789"},
{"element2.Item": { $eq: 10 }}
]}
)
The above query executed successfully from the mongo shell (Mongo 3.6.3), and I see this result:
{
"_id" : "12345678",
"Invoices" : [
{
"_id" : "123456789",
"Currency" : "EUR",
"DueTotalAmountInvoice" : 768.37,
"InvoiceDate" : "2016-01-01 00:00:00.000",
"Items" : [
{
"Item" : 10,
"ProductCode" : "ABC567",
"Quantity" : 3.0
},
{
"Item" : 20,
"ProductCode" : "CDE987",
"Quantity" : 1
}
]
},
{
"_id" : "87654321",
"Currency" : "EUR",
"DueTotalAmountInvoice" : 768.37,
"InvoiceDate" : "2016-01-01 00:00:00.000",
"Items" : [
{
"Item" : 30,
"ProductCode" : "PLO987",
"Quantity" : 1,
"Units" : "KM3"
},
{
"Item" : 40,
"ProductCode" : "PLS567",
"Quantity" : 1,
"DueTotalAmountInvoice" : 768.37
}
]
}
]
}
Is that what you wanted?
MongoDB can also address a specific array element by its index. In the path, use dot notation (.) rather than brackets [ ]. One more important point: when the path reaches an embedded value (in an object or an array), the whole path must be quoted (" "). Updating your value would then look like this:
yourModel.findOneAndUpdate(
{ _id: "12345678" },
{
$set: {
"Invoices.0.Items.0.Quantity": 10,
},
}
);
0 - the index of the element in each array
$set - the operator that sets the new value
10 - the new value
You can go further and build the path from variable indexes using a template string:
yourModel.findOneAndUpdate(
{ _id: "12345678" },
{
$set: {
[`Invoices.${invoiceIndex}.Items.${itemIndex}.Quantity`]: newValue,
},
}
);
It is the same update, but with the indexes supplied as variables.
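A minimal sketch of how the index-based path might be assembled when only the invoice _id and the item number are known. Both helpers, buildQuantityUpdate and findItemPosition, are hypothetical names introduced here for illustration, and the sketch assumes the customer document has already been read into memory.

```javascript
// Hypothetical helper (not from the answer): builds the $set document for one Quantity.
function buildQuantityUpdate(invoiceIndex, itemIndex, newValue) {
  for (const index of [invoiceIndex, itemIndex]) {
    if (!Number.isInteger(index) || index < 0) {
      throw new RangeError("array indexes must be non-negative integers");
    }
  }
  const path = `Invoices.${invoiceIndex}.Items.${itemIndex}.Quantity`;
  return { $set: { [path]: newValue } };
}

// Hypothetical helper: looks up both positions by id in a document that was already read.
function findItemPosition(customer, invoiceId, itemNumber) {
  const invoiceIndex = customer.Invoices.findIndex((inv) => inv._id === invoiceId);
  if (invoiceIndex === -1) return null;
  const itemIndex = customer.Invoices[invoiceIndex].Items.findIndex(
    (item) => item.Item === itemNumber
  );
  return itemIndex === -1 ? null : { invoiceIndex, itemIndex };
}

// Trimmed copy of the document from the question.
const customer = {
  _id: "12345678",
  Invoices: [
    { _id: "123456789", Items: [{ Item: 10, Quantity: 1 }, { Item: 20, Quantity: 1 }] },
    { _id: "87654321", Items: [{ Item: 30, Quantity: 1 }, { Item: 40, Quantity: 1 }] },
  ],
};

const position = findItemPosition(customer, "87654321", 40);
const update = buildQuantityUpdate(position.invoiceIndex, position.itemIndex, 3);
console.log(JSON.stringify(update));
// {"$set":{"Invoices.1.Items.1.Quantity":3}}
```

The resulting object can be passed as the update argument of the findOneAndUpdate call shown above. Positions found this way can go stale if the arrays change between the read and the write, which the arrayFilters approach avoids.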

MongoDB query to find document with duplicate value in array

tldr; I'm struggling to construct a query to
Make an aggregation to get a count of values on a certain key ("original_text_source"), which
Is in a sub-document that is in an array
Full description
I have embedded documents with arrays that are structured like this:
{
"_id" : ObjectId("0123456789"),
"type" : "some_object",
"relationships" : {
"x" : [ ObjectId("0123456789") ],
"y" : [ ObjectId("0123456789") ],
},
"properties" : [
{
"a" : "1"
},
{
"b" : "1"
},
{
"original_text_source" : "foo.txt"
},
]
}
The docs were created from exactly 10k text files, sorted in various folders. While inserting the documents into MongoDB (in batches) I messed up and moved a few files around, causing one file to be imported twice (my database has a count of exactly 10001 docs), but obviously I don't know which one it is. Since one of the "original_text_source" values has to have a count of 2, I was planning on just deleting one.
I read up on solutions with $elemMatch, but since my array element is a document, I'm not sure how to proceed. Maybe with mapReduce? But I can't transfer the logic to my doc structure.
I could also just create a new collection and re-upload everything, but in case I mess up again, I'd rather learn how to query for duplicates. It seems more elegant :-)
You can find duplicates with a simple aggregation like this:
db.collection.aggregate([
{ $group: { _id: "$properties.original_text_source", docIds: { $push: "$_id" }, docCount: { $sum: 1 } } },
{ $match: { "docCount": { $gt: 1 } } }
])
which gives you something like this:
{
"_id" : [
"foo.txt"
],
"docIds" : [
ObjectId("59d6323613940a78ba1d5ffa"),
ObjectId("59d6324213940a78ba1d5ffc")
],
"docCount" : 2.0
}
Run the following:
db.collection.aggregate([
{ $group: {
_id: { name: "$properties.original_text_source" },
idsForDuplicatedDocs: { $addToSet: "$_id" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} }
]);
Given a collection which contains two copies of the document you showed in your question, the above command will return:
{
"_id" : {
"name" : [
"foo.txt"
]
},
"idsForDuplicatedDocs" : [
ObjectId("59d631d2c26584cd8b7b3337"),
ObjectId("59d631cbc26584cd8b7b3333")
],
"count" : 2
}
Where ...
The attribute _id.name is the value of the duplicated properties.original_text_source
The attribute idsForDuplicatedDocs contains the _id values for each of the documents which have a duplicated properties.original_text_source
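In both outputs above the grouping key comes back as an array (["foo.txt"]) because properties is itself an array. One possible variation, sketched here as an assumption rather than as part of either answer, is to $unwind properties first so that the key is a plain string. The function findDuplicateSources is a hypothetical in-memory mirror of that pipeline, and the sample _id values are made up.

```javascript
// Assumed variation on the answers above: unwind first so _id is a scalar string.
const duplicateSourcesPipeline = [
  { $unwind: "$properties" },
  // Keep only the array elements that actually carry the key.
  { $match: { "properties.original_text_source": { $exists: true } } },
  {
    $group: {
      _id: "$properties.original_text_source",
      docIds: { $push: "$_id" },
      docCount: { $sum: 1 },
    },
  },
  { $match: { docCount: { $gt: 1 } } },
];

// In the mongo shell this would be run as: db.collection.aggregate(duplicateSourcesPipeline)

// Hypothetical plain JavaScript mirror of the pipeline, for checking the logic.
function findDuplicateSources(docs) {
  const idsBySource = new Map();
  for (const doc of docs) {
    for (const property of doc.properties) {
      if (!("original_text_source" in property)) continue;
      const source = property.original_text_source;
      if (!idsBySource.has(source)) idsBySource.set(source, []);
      idsBySource.get(source).push(doc._id);
    }
  }
  return [...idsBySource]
    .filter(([, ids]) => ids.length > 1)
    .map(([source, ids]) => ({ _id: source, docIds: ids, docCount: ids.length }));
}

// Made-up sample data with string ids instead of ObjectIds.
const docs = [
  { _id: "doc1", properties: [{ a: "1" }, { original_text_source: "foo.txt" }] },
  { _id: "doc2", properties: [{ b: "1" }, { original_text_source: "bar.txt" }] },
  { _id: "doc3", properties: [{ a: "1" }, { original_text_source: "foo.txt" }] },
];
console.log(JSON.stringify(findDuplicateSources(docs)));
// [{"_id":"foo.txt","docIds":["doc1","doc3"],"docCount":2}]
```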
"reviewAndRating": [
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd9745e2d66162a6dd1f0ef",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7518",
"_id": "5bd974612d66162a6dd1f0f0",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd974622d66162a6dd1f0f1",
"rating": "5"
}
]

Mongo aggregate $slice to object instead of array

I have upgraded Mongo to 3.2 and am delighted that aggregating using $slice works. However, my problem is that I want to assign the value to an object not an array. I cannot find how to do this.
My script applied:
db.std_sourceBusinessData.aggregate(
{ $match : {objectType: "Account Balances"}},
{ $project: {_id: 1,entity_id: 1,accountBalances: 1}},
{ $unwind: "$accountBalances" },
{ $match: {"accountBalances": "Sales"}},
{$project: {
_id :1,
"Value" : {$slice: ["$accountBalances",1,1]},
"key" : {$literal: "sales"},
"company": "$entity_id"
}}
)
Comes back with:
{
"_id" : ObjectId("566f3da3d58419e8b0fc76c7"),
"Value" : [
"5428.64"
],
"key" : "sales"
}
Notice that Value is an array. What I want is:
{
"_id" : ObjectId("566f3da3d58419e8b0fc76c7"),
"Value" : "5428.64",
"key" : "sales"
}
Thanks, Matt
You can use $arrayElemAt instead of $slice to directly get a single array element.
Modify your final $project stage to be:
{$project: {
_id: 1,
"Value": {$arrayElemAt: ["$accountBalances", 1]},
"key": {$literal: "sales"},
"company": "$entity_id"
}}
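As a rough analogy in plain JavaScript, the difference between the two operators resembles the difference between Array.prototype.slice and index access. The shape of the unwound accountBalances entry below is an assumption inferred from the question's output; the question does not show it.

```javascript
// Assumed shape of one unwound accountBalances entry (inferred, not shown in the question).
const accountBalances = ["Sales", "5428.64"];

// Like { $slice: ["$accountBalances", 1, 1] }: skip 1 element, take 1. Still an array.
const sliced = accountBalances.slice(1, 1 + 1);

// Like { $arrayElemAt: ["$accountBalances", 1] }: the element at index 1 itself.
const element = accountBalances[1];

console.log(sliced);  // [ '5428.64' ]
console.log(element); // 5428.64
```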

MongoDB get count of particular key in an array

In MongoDB, how can we get the count of a particular key in an array?
{
"_id" : ObjectId("52d9212608a224e99676d378"),
"business" : [
{
"name" : "abc",
"rating" : 4.5
},
{
"name" : "pqr"
},
{
"name" : "xyz",
"rating" : 3.6
}
]
}
In the above example, business is an array (with "name" and/or "rating" keys).
How can I get the count of business array elements in which the "rating" key exists?
Expected output: 2
Looks like you have to use the Aggregation Framework. In particular, you need to $unwind your array, then match only the elements that include the rating field, then $group the documents back into the original format.
Try something like this:
db.test.aggregate([
{ $match: { /* your query criteria document */ } },
{ $unwind: "$business" },
{ $match: {
"business.rating": { $exists: 1 }
}
},
{ $group: {
_id: "$_id",
business: { $push: "$business" },
business_count: { $sum: 1 }
}
}
])
Result will look like the following:
{
_id: ObjectId("52d9212608a224e99676d378"),
business: [
{ name: "abc", rating: 4.5 },
{ name: "xyz", rating: 3.6 }
],
business_count: 2
}
UPD: It looks like the OP doesn't want to group the results by the enclosing document's _id field. Unfortunately, a $group stage must specify an _id value, otherwise it fails with an exception. But this value can be a constant (e.g. plain null or 'foobar'), so there will be only one resulting group and the aggregation becomes collection-wide.
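The constant _id idea from the update could be sketched as follows. The pipeline name countRatedPipeline and the helper countRated are illustrative assumptions; the helper just mirrors the count in plain JavaScript so the expected number can be checked without a server.

```javascript
// Sketch of a collection-wide count using a constant group key (assumed shape).
const countRatedPipeline = [
  { $unwind: "$business" },
  { $match: { "business.rating": { $exists: true } } },
  // _id: null puts every remaining element into one single group.
  { $group: { _id: null, business_count: { $sum: 1 } } },
];

// In the mongo shell this would be run as: db.test.aggregate(countRatedPipeline)

// Hypothetical plain JavaScript mirror: counts array elements that have a rating key.
function countRated(docs) {
  let count = 0;
  for (const doc of docs) {
    for (const entry of doc.business) {
      if ("rating" in entry) count += 1;
    }
  }
  return count;
}

// Sample document from the question (the _id is omitted for brevity).
const businessDocs = [
  {
    business: [
      { name: "abc", rating: 4.5 },
      { name: "pqr" },
      { name: "xyz", rating: 3.6 },
    ],
  },
];
console.log(countRated(businessDocs)); // 2
```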
