MongoDB sorting data fails - arrays

I'm trying to sort around 40k objects in MongoDB. I have two collections, one of comics and one of characters; each character has a field holding an array of the ids of the comics it appears in. What I want is an aggregation pipeline that retrieves the comic with the strongest characters (the sum of the strength of each character). I can get the list of comics with the summed strength of their characters, but as soon as I try to sort it, the database keeps waiting and everything ends in a timeout. What am I doing wrong?
Characters model:
{
_id: number,
name: string,
info: {
alignment: string // can be "good" or "bad"
},
stats: {
strength: number
},
comics: [] //array of numbers referencing the id of the comic
}
Comics model:
{
_id: number,
name: string
}
And here my query:
db.comics.aggregate(
{
$lookup: {
from: 'characters',
let: {
comic_id: '$_id',
},
as: 'total_comic_str',
pipeline: [
{
$match: {
$expr: {
$and: [
{$in: ['$$comic_id', '$comics']}, // the character is from this comic
{$eq: ['$info.alignment', 'good']} // the character is a hero
]
}
}
},
{
$group: { // group by comic id and accumulate strength of each hero
_id: '$$comic_id',
str: {
$sum: '$stats.strength'
}
}
}
]
}
},
{
$unwind: {
path: '$total_comic_str',
preserveNullAndEmptyArrays: false
}
},
{
$sort: {
'total_comic_str.str': -1
}
},
{
$limit: 1
}
)

You are facing a cursor timeout.
With a query cursor (such as the one find() returns) you can call noCursorTimeout() to prevent the issue, although that is generally not good practice.
With an aggregation the cursor type is different, so noCursorTimeout() is not available.
As a solution, you can add an $out stage to store the aggregation result in a temporary collection, and then work with the generated collection as you wish.
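A minimal sketch of the $out idea, assuming a hypothetical temporary collection name (comic_strength_tmp) and a hypothetical helper (withOut); neither comes from the question. It only builds the pipeline array, since $out has to be the last stage of a pipeline.

```javascript
// Hypothetical name for the temporary collection that receives the results.
const OUT_COLLECTION = 'comic_strength_tmp';

// Append an $out stage to a list of stages without mutating the input.
// $out must be the final stage of a pipeline, so it is always added at the end.
function withOut(stages, target) {
  return [...stages, { $out: target }];
}

// Placeholder stage for illustration; the real stages would be the ones
// that compute the per-comic strength.
const stages = [
  { $match: { name: { $exists: true } } }
];
const pipeline = withOut(stages, OUT_COLLECTION);

// In the shell this would be run roughly as (not executed here):
//   db.comics.aggregate(pipeline)
//   db.comic_strength_tmp.find()   // then sort and limit as needed
console.log(JSON.stringify(pipeline));
```

Once the result is materialised, the temporary collection can be indexed and queried like any other collection.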

$lookup with a pipeline has been shown to have performance issues on large collections.
So I would suggest using a plain $lookup without a pipeline. This should work for your particular dataset, which has a relatively large characters collection and presumably small comics arrays.
First, it's better to index what you are going to use in $lookup, so you should add an index on the comics field of the characters collection for this to bring a meaningful improvement.
Since the looked-up characters will be a subdocument array, we are going to use $reduce instead of $group to calculate the total strength.
Your aggregation pipeline should look like this:
[
{
$lookup: {
from: "characters",
localField: "_id", // look up by _id only; alignment is filtered later
foreignField: "comics",
as: "characters"
}
},
{
$project: {
name: true,
total_strength: {
$reduce: {
input: "$characters",
initialValue: 0,
in: {
$add: [
"$$value",
{
$cond: [
{ $eq: [ "$$this.info.alignment", "good"] }, // count only "good" characters here
"$$this.stats.strength",
0
]
}
]
}
}
}
}
},
{
$sort: { total_strength: -1 }
},
{
$limit: 1
}
]
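The index mentioned above would typically be created in the shell with db.characters.createIndex({ comics: 1 }). To make the logic of the pipeline easier to follow, here is a plain-JavaScript sketch of what it computes, run in memory on made-up sample data. The function name strongestComic and the sample documents are assumptions for illustration only.

```javascript
// Plain-JavaScript model of the pipeline: join, sum "good" strength, sort, take the top one.
function strongestComic(comics, characters) {
  const totals = comics.map((comic) => {
    // $lookup with localField/foreignField: characters whose `comics` array contains this _id
    const cast = characters.filter((ch) => ch.comics.includes(comic._id));
    // $reduce with $cond: add strength only for "good" characters
    const total_strength = cast.reduce(
      (sum, ch) => sum + (ch.info.alignment === 'good' ? ch.stats.strength : 0),
      0
    );
    return { _id: comic._id, name: comic.name, total_strength };
  });
  // $sort descending, then $limit: 1
  totals.sort((a, b) => b.total_strength - a.total_strength);
  return totals[0];
}

// Made-up sample data, for illustration only.
const comics = [
  { _id: 1, name: 'Comic A' },
  { _id: 2, name: 'Comic B' }
];
const characters = [
  { _id: 10, name: 'Hero X', info: { alignment: 'good' }, stats: { strength: 50 }, comics: [1, 2] },
  { _id: 11, name: 'Hero Y', info: { alignment: 'good' }, stats: { strength: 30 }, comics: [2] },
  { _id: 12, name: 'Villain Z', info: { alignment: 'bad' }, stats: { strength: 90 }, comics: [1] }
];

console.log(strongestComic(comics, characters));
```

Comics without any "good" character end up with a total of 0 here rather than being dropped, which matches the $reduce version of the pipeline.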

Related

Finding documents in mongodb collection by order of elements index of array field

Array field in collection (three example documents):
"fruits": [
{"fruit1": "banana"},
{"fruit2": "apple"},
{"fruit3": "pear"},
{"fruit4": "orange"}
]
"fruits": [
{"fruit2": "apple"},
{"fruit4": "orange"},
{"fruit1": "banana"},
{"fruit3": "pear"}
]
"fruits": [
{"fruit3": "pear"},
{"fruit2": "apple"},
{"fruit4": "orange"},
{"fruit1": "banana"}
]
I need to find the documents in the collection where "banana" appears before "apple". Does MongoDB allow comparing the positions of elements in an array, like this:
if (fruits.indexOf('banana') < fruits.indexOf('apple')) return true;
Or is there any other way to get the result I need?
MongoDB's array query operations do not support any positional search as you want.
You can, however, write a $where query to do what you want:
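db.yourCollection.find({
$where: function() {
// each element is an object like {fruit1: "banana"}, so search by value instead of calling indexOf on the array
var fruits = this.fruits;
var pos = function(name) {
return fruits.findIndex(function(f) { return Object.keys(f).some(function(k) { return f[k] === name; }); });
};
return pos('banana') !== -1 && pos('apple') !== -1 && pos('banana') < pos('apple');
}
})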
Be advised, though, that a $where query cannot use indexes, so performance will be a problem.
Another approach is to rethink the database design. If you can specify what you're trying to build, someone can give you specific advice.
One more approach: pre-calculate the boolean value as a field before persisting the document to the DB, and query on true / false.
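A minimal sketch of the pre-calculation approach, assuming the flag is computed in application code before the insert. The field name bananaBeforeApple and the helper names are hypothetical.

```javascript
// Return the position of the first array element that has `name` among its values, or -1.
function positionOf(fruits, name) {
  return fruits.findIndex((entry) => Object.values(entry).includes(name));
}

// True only when both fruits are present and `first` comes earlier than `second`.
function comesBefore(fruits, first, second) {
  const a = positionOf(fruits, first);
  const b = positionOf(fruits, second);
  return a !== -1 && b !== -1 && a < b;
}

// Attach the hypothetical flag field before the document is persisted.
function withBananaFlag(doc) {
  return { ...doc, bananaBeforeApple: comesBefore(doc.fruits, 'banana', 'apple') };
}

const doc = withBananaFlag({
  fruits: [{ fruit1: 'banana' }, { fruit2: 'apple' }, { fruit3: 'pear' }, { fruit4: 'orange' }]
});
console.log(doc.bananaBeforeApple);

// The stored flag can then be queried (and indexed) like any other field, e.g.
//   db.yourCollection.find({ bananaBeforeApple: true })
```

The trade-off is that the flag has to be recomputed whenever the fruits array changes.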
Consider refactoring your schema if possible. The dynamic field names (i.e. fruit1, fruit2, ...) make it unnecessarily complicated to construct a query. Also, if you need frequent queries by array index, you should probably store your array entries in individual documents with a sort key, so that sorting can use an index.
Nevertheless, it is achievable by using $unwind and then $group to reassemble the documents. With the includeArrayIndex option of $unwind, you can get each element's index inside the array.
db.collection.aggregate([
{
"$unwind": {
path: "$fruits",
includeArrayIndex: "idx"
}
},
{
"$addFields": {
fruits: {
"$objectToArray": "$fruits"
}
}
},
{
"$addFields": {
"bananaIdx": {
"$cond": {
"if": {
$eq: [
"banana",
{
$first: "$fruits.v"
}
]
},
"then": "$idx",
"else": "$$REMOVE"
}
},
"appleIdx": {
"$cond": {
"if": {
$eq: [
"apple",
{
$first: "$fruits.v"
}
]
},
"then": "$idx",
"else": "$$REMOVE"
}
}
}
},
{
$group: {
_id: "$_id",
fruits: {
$push: {
"$arrayToObject": "$fruits"
}
},
bananaIdx: {
$max: "$bananaIdx"
},
appleIdx: {
$max: "$appleIdx"
}
}
},
{
$match: {
$expr: {
$lt: [
"$bananaIdx",
"$appleIdx"
]
}
}
},
{
$unset: [
"bananaIdx",
"appleIdx"
]
}
])

Referencing another field of subdocument in `$elemMatch`

I'm trying to perform an $elemMatch in a $match aggregation stage. I want to find out whether an array of subdocuments (commitments) contains a subdocument whose tracksThisWeek property is smaller than its frequency property, but I'm not sure how to reference another field of the subdocument in question. I came up with:
{
$match: {
commitments: {
$elemMatch: {
tracksThisWeek: {
$lt: '$frequency',
},
},
},
},
},
I have a document in the collection that should be returned by this aggregation but isn't. Any help is appreciated :)
This can't be done with $elemMatch: the query language cannot reference other fields of the document. What you can do is use $expr with aggregation operators, like this:
db.collection.aggregate([
{
$match: {
$expr: {
$gt: [
{
$size: {
$filter: {
input: "$commitments",
cond: {
$lt: [
"$$this.tracksThisWeek",
"$$this.frequency"
]
}
}
}
},
0
]
}
}
}
])
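To illustrate what the $filter / $size combination checks, here is an in-memory JavaScript sketch of the same condition. It assumes commitments is always an array and that both fields are numbers; the function name and sample documents are made up.

```javascript
// A document matches when at least one commitment has tracksThisWeek < frequency.
// This mirrors: $size of the $filter result being greater than 0.
function hasOpenCommitment(doc) {
  const open = doc.commitments.filter((c) => c.tracksThisWeek < c.frequency);
  return open.length > 0;
}

// Made-up sample documents, for illustration only.
const docs = [
  { _id: 1, commitments: [{ tracksThisWeek: 3, frequency: 3 }, { tracksThisWeek: 1, frequency: 2 }] },
  { _id: 2, commitments: [{ tracksThisWeek: 5, frequency: 5 }] }
];

const matchingIds = docs.filter(hasOpenCommitment).map((d) => d._id);
console.log(matchingIds);
```

The comparison happens between two fields of the same array element, which is exactly what a plain $elemMatch query cannot express.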

MongoDB: How to take multiple fields within a document and output their values into an array (as a new field)?

MongoDB: 4.4.9, Mongosh: 1.0.4
I have a MongoDB collection full of documents with monthly production data as separate fields (monthlyProd1, monthlyProd2, etc.). Each field is one month's production data, and the values are an object data type.
Document example:
_id: ObjectId("314e0e088f183fb7e699d635")
name: "documentName"
monthlyProd1: Object
monthlyProd2: Object
monthlyProd3: Object
...
I want to take all the months and put them into a single new field (monthlyProd) -- a single array of objects.
I can't seem to access the fields with the different methods I've tried. For example, this gets close to doing what I want:
db.monthlyProdData.updateMany({},
{ $push: { "monthlyProd": { $each: [ "$monthlyProd1", "$monthlyProd2", "$monthlyProd3" ] } } }
)
...but instead of taking the value / object data from each field, like I had hoped, it just outputs a string into the monthlyProd array ("$monthlyProd1", "$monthlyProd2", ...):
Actual output:
monthlyProd: Array
0: "$monthlyProd1"
1: "$monthlyProd2"
2: "$monthlyProd3"
...
Desired output:
monthlyProd: Array
0: Object
1: Object
2: Object
...
I want the data, not a string! Lol. Thank you for your help!
Note: some months/fields may be an empty string ("") because there was no production. I want to make sure to not add empty strings into the array -- only months with production / fields that have an object data type. That being said, I can try figuring that out on my own, if I can just get access to these fields' data!
Try this one:
db.collection.updateMany({}, [
// convert to k-v Array
{ $set: { monthlyProd: { $objectToArray: "$$ROOT" } } },
{
$set: {
monthlyProd: {
// remove the entries that are not needed
$filter: {
input: "$monthlyProd",
cond: { $not: { $in: [ "$$this.k", [ "name", "_id" ] ] } }
// or cond: { $in: [ "$$this.k", [ "monthlyProd1", "monthlyProd2", "monthlyProd3" ] ] }
}
}
}
},
// output array value
{ $project: { monthlyProd: "$monthlyProd.v" } }
])
Thank you to @Wernfried for the original solution to this question. I have modified the solution to incorporate my "Note" about ignoring any empty monthlyProd# values (i.e. months that didn't have any production), so that they are not added to the final monthlyProd array.
To do this, I added an $and operator to the cond: within $filter, and added the following as the second expression for the $and operator (I used "" and {} to take care of the empty field values if they are of either string or object data type):
{ $not: { $in: [ "$$this.v", [ "", {} ] ] } }
Final solution:
db.monthlyProdData2.updateMany({}, [
// convert to k-v Array
{ $set: { monthlyProd: { $objectToArray: "$$ROOT" } } },
{
$set: {
monthlyProd: {
// remove the entries that are not needed
$filter: {
input: "$monthlyProd",
cond: { $and: [
{ $not: { $in: [ "$$this.k", [ "name", "_id" ] ] } },
{ $not: { $in: [ "$$this.v", [ "", {} ] ] } }
]}
}
}
}
},
// output array value
{ $project: { monthlyProd: "$monthlyProd.v", name: 1 } }
])
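For comparison, the same transformation can be sketched in plain JavaScript on a single in-memory document. The function name collectMonthlyProd and the sample values (the oil field) are made up; the real months hold whatever objects the collection stores.

```javascript
// Build the monthlyProd array from the monthlyProd1, monthlyProd2, ... fields.
function collectMonthlyProd(doc) {
  const isEmptyObject = (v) =>
    v !== null && typeof v === 'object' && !Array.isArray(v) && Object.keys(v).length === 0;

  const monthlyProd = Object.entries(doc)                // like $objectToArray: "$$ROOT"
    .filter(([k]) => !['name', '_id'].includes(k))       // drop the non-month fields
    .filter(([, v]) => v !== '' && !isEmptyObject(v))    // drop empty months ("" or {})
    .map(([, v]) => v);                                  // keep values only, like "$monthlyProd.v"

  // Like the final $project: keep _id, name and the new array.
  return { _id: doc._id, name: doc.name, monthlyProd };
}

// Made-up sample document, for illustration only.
const result = collectMonthlyProd({
  _id: 'a1',
  name: 'documentName',
  monthlyProd1: { oil: 10 },
  monthlyProd2: '',
  monthlyProd3: { oil: 7 },
  monthlyProd4: {}
});
console.log(JSON.stringify(result.monthlyProd));
```

As in the pipeline, any field other than name and _id is treated as a month, so a stricter key filter may be safer if the documents carry other fields.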
Thanks again @Wernfried and the Stack Overflow community!

$lookup Array of Objects in parent Array and append the results to each item of said Array

Consider the following "Backpack" document: each entry in slots is a piece of said backpack, and each slot has a contents array describing various items and a count of each.
{
_id: "backpack",
slots: [
{
slot: "left-pocket",
contents: [
{
item: "pen",
count: 3
},
{
item: "pencil",
count: 2
},
]
},
{
slot: "right-pocket",
contents: [
{
item: "bottle",
count: 1
},
{
item: "eraser",
count: 1
},
]
}
]
}
The item field is the _id of an item of another collection, e.g.:
{
_id: "pen",
color: "red"
(...)
},
Same for pen, pencil, bottle, eraser, etc.
I want to do a $lookup so I can fill in the item's data, but I can't find a way to make the lookup's as point to the same place as the item. That is:
db.collection.aggregate([
{
$lookup: {
from: 'items',
localField: 'slots.contents.item',
foreignField: '_id',
as: 'convertedItems', // <=== ISSUE
},
},
])
The problem is that, with as set to convertedItems, the document gets an array of items at its root called 'convertedItems', like this:
{
_id: "backpack",
slots: [ (...) ],
convertedItems: [ (...) ]
}
How can I tell $lookup to use the localField itself as the place to append the data?
That is, make the document become:
{
_id: "backpack",
slots: [
{
slot: "left-pocket",
contents: [
{
item: "pen", // <== NOTE
count: 3, // <== NOTE
_id: "pen",
color: "red"
(...)
},
{
item: "pencil", // <== NOTE
count: 2, // <== NOTE
_id: "pencil",
color: "blue"
(...)
},
]
},
(...)
Note: at that point, since the entire item data is present, it doesn't matter whether the item property is kept, but count must remain.
I can't manually do $addFields with $arrayElemAt because the number of items in slots is not fixed.
Extra info: I'm using MongoDB Atlas Free, so assume MongoDB 4.2+ (no need to unwind arrays for $lookup).
PS: I have now thought of just leaving the result at the root (e.g. "convertedItems") and, in the code that receives the API response, doing an Array.find on "convertedItems" by _id while looping through the items. I'll keep the question open as I'm curious how to do this on the MongoDB side.
When you use $lookup, there is a single query in the related collection for each document in the source pipeline, not a query per value in the source document.
If you want each item looked up separately, you'll need to unwind the arrays so each document in the pipeline contains a single item, do the lookup, and then group to rebuild the arrays.
db.collection.aggregate([
{$unwind: "$slots"},
{$unwind: "$slots.contents"},
{$lookup: {
from: "items",
localField: "slots.contents.item",
foreignField: "_id",
as: "convertedItems"
}},
{$group: {
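_id: {doc: "$_id", slot: "$slots.slot"}, // group per document and slot, so equal slot names in different documents are not merged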
root: {$first: "$$ROOT"},
items: {
$push: {
$mergeObjects: [
"$slots.contents",
{$arrayElemAt: ["$convertedItems", 0]}
]
}},
}},
{$addFields: {"root.slots.contents": "$items"}},
{$replaceRoot: {newRoot: "$root"}},
{$group: {
_id: "$_id",
root: {$first: "$$ROOT"},
slots: {$push: "$slots"}
}},
{$addFields: {"root.slots": "$slots"}},
{$replaceRoot: {newRoot: "$root"}},
{$project: { convertedItems: 0}}
])
$unwind makes the number of documents in the pipeline explode, and $lookup cannot write its result in place of the local field through as, so you need additional stages such as $addFields and $filter to get the required output.
As I've commented, matching the main document's elements with the $lookup result takes a bit of work. It may be easier to do in code, but if it has to be done in the query, the pipeline below works on the same number of documents as the collection holds, unlike $unwind, which multiplies the documents when there are nested arrays like yours. As this is fairly complex in general, use $match as the first stage to filter documents, if possible, for better performance. Additionally, you can use explain() to inspect your query's performance.
Query :
db.Backpack.aggregate([
/** lookup on items collection & get matched docs to items array */
{
$lookup: {
from: "items",
localField: "slots.contents.item",
foreignField: "_id",
as: "items"
}
},
/** Iterate on slots & contents & internally filter on items array to get matched doc for a content object &
* merge the objects back to respective objects to form the same structure */
{
$project: {
slots: {
$map: {
input: "$slots",
in: {
$mergeObjects: [
"$$this",
{
contents: {
$map: {
input: "$$this.contents",
as: "c",
in: {
$mergeObjects: [
"$$c",
{
$let: {
vars: {
matchedItem: {
$arrayElemAt: [
{
$filter: {
input: "$items",
as: "i",
cond: {
$eq: [
"$$c.item",
"$$i._id"
]
}
}
},
0
]
}
},
in: {
color: "$$matchedItem.color"
}
}
}
]
}
}
}
}
]
}
}
}
}
}
])
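The client-side alternative mentioned in the question (merging the looked-up items in application code) can be sketched as follows. The function name fillItems is hypothetical, and the sample data reuses the pen and pencil items from the question; items without a match are left unchanged.

```javascript
// Merge each content entry with the matching item document (matched on item === _id).
function fillItems(backpack, items) {
  // Index the items by _id once instead of scanning the array for every entry.
  const byId = new Map(items.map((it) => [it._id, it]));
  return {
    ...backpack,
    slots: backpack.slots.map((slot) => ({
      ...slot,
      // Keep `item` and `count`, then add the item's own fields when one is found.
      contents: slot.contents.map((c) => ({ ...c, ...(byId.get(c.item) || {}) }))
    }))
  };
}

const backpack = {
  _id: 'backpack',
  slots: [
    { slot: 'left-pocket', contents: [{ item: 'pen', count: 3 }, { item: 'pencil', count: 2 }] },
    { slot: 'right-pocket', contents: [{ item: 'bottle', count: 1 }, { item: 'eraser', count: 1 }] }
  ]
};
// Stands in for the "convertedItems" array returned by the root-level $lookup.
const convertedItems = [
  { _id: 'pen', color: 'red' },
  { _id: 'pencil', color: 'blue' }
];

const filled = fillItems(backpack, convertedItems);
console.log(JSON.stringify(filled, null, 2));
```

Unlike the $project above, which copies only color, this version merges every field of the matched item.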

MongoDB remove duplicate subdocuments inside array based on a specific field

My documents have the following structure:
{
_id: ObjectId("59303aa1bad1081d4b98d636"),
clear_number: "83490",
items: [
{
name: "83490_1",
file_id: "e7209bbb",
hash: "2f568bb196f74263c64b7cf273f8ceaa",
},
{
name: "83490_2",
file_id: "9a56a935",
hash: "9c6230f7bf19d3f3186c6c3231ac2055",
},
{
name: "83490_2",
file_id: "ce5f6773",
hash: "9c6230f7bf19d3f3186c6c3231ac2055",
}
],
group_id: null
}
How can I remove one of the two subdocuments that share the same hash in items?
The following should do the trick if I understand your question correctly:
collection.aggregate({
$unwind: "$items" // flatten the items array
}, {
$group: {
"_id": { "_id": "$_id", "clear_number": "$clear_number", "group_id": "$group_id", "hash": "$items.hash" }, // per each document group by hash value
"items": { $first: "$items" } // keep only the first of all matching ones per group
}
}, {
$group: {
"_id": { "_id": "$_id._id", "clear_number": "$_id.clear_number", "group_id": "$_id.group_id" }, // now let's group everything again without the hashes
"items": { $push: "$items" } // push all single items into the "items" array
}
}, {
$project: { // this is just to restore the original document layout
"_id": "$_id._id",
"clear_number": "$_id.clear_number",
"group_id": "$_id.group_id",
"items": "$items"
}
})
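As an illustration of what the two $group stages achieve, here is an in-memory JavaScript sketch that keeps one item per hash. It keeps the first occurrence in array order; the function name dedupeItemsByHash is hypothetical, and the sample document is the one from the question (with the _id simplified to a string).

```javascript
// Keep only the first item seen for each hash value.
function dedupeItemsByHash(doc) {
  const seen = new Set();
  const items = doc.items.filter((item) => {
    if (seen.has(item.hash)) return false; // a later duplicate: drop it
    seen.add(item.hash);
    return true;
  });
  return { ...doc, items };
}

const original = {
  _id: '59303aa1bad1081d4b98d636',
  clear_number: '83490',
  items: [
    { name: '83490_1', file_id: 'e7209bbb', hash: '2f568bb196f74263c64b7cf273f8ceaa' },
    { name: '83490_2', file_id: '9a56a935', hash: '9c6230f7bf19d3f3186c6c3231ac2055' },
    { name: '83490_2', file_id: 'ce5f6773', hash: '9c6230f7bf19d3f3186c6c3231ac2055' }
  ],
  group_id: null
};

const deduped = dedupeItemsByHash(original);
console.log(deduped.items.map((i) => i.file_id));
```

Note that the aggregation only returns the deduplicated documents; writing them back to the collection is a separate step.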
In response to your comment I would suggest the following query to get the list of all document ids that contain duplicate hashes:
collection.aggregate({
$addFields: {
"hashes": {
$setUnion: [
[ { $size: "$items.hash" } ], // total number of hashes
[ { $size: { $setUnion: "$items.hash" } } ] // number of distinct hashes
]
}
}
}, {
$match:
{
"hashes.1": { $exists: true } // find all documents with a different value for distinct vs total number of hashes
}
}, {
$project: { _id: 1 } // only return _id field
})
There might be different approaches, but this one seems pretty straightforward.
Basically, in the $addFields part, for each document, we first compute two numbers, each wrapped in its own single-element array:
the total number of hashes
the number of distinct hashes
Then we pass these two arrays through a $setUnion. After this step there can
either be two different numbers left in the array, in which case the hash field does contain duplicates,
or only one element left, in which case the number of distinct hashes equals the total number of hashes (so there are no duplicates).
We can check whether there are two items in the array by testing if the element at position 1 (arrays are zero-based!) exists. That's what the $match stage does.
And the final $project stage just limits the output to the _id field.
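The counting trick can be sketched in plain JavaScript as follows. The function name hasDuplicateHashes and the sample items are made up for illustration.

```javascript
// Compare the total number of hashes with the number of distinct hashes.
function hasDuplicateHashes(doc) {
  const hashes = doc.items.map((item) => item.hash);
  const total = hashes.length;              // like $size: "$items.hash"
  const distinct = new Set(hashes).size;    // like $size of $setUnion: "$items.hash"
  // Like the outer $setUnion: equal numbers collapse into a single element.
  const counts = [...new Set([total, distinct])];
  // Like "hashes.1": { $exists: true }: a second element means the numbers differ.
  return counts.length > 1;
}

// Made-up sample documents, for illustration only.
const withDuplicates = { _id: 1, items: [{ hash: 'a' }, { hash: 'b' }, { hash: 'b' }] };
const withoutDuplicates = { _id: 2, items: [{ hash: 'a' }, { hash: 'b' }] };

console.log(hasDuplicateHashes(withDuplicates), hasDuplicateHashes(withoutDuplicates));
```

A direct comparison of the two sizes would work just as well; the array form is kept here only to mirror the pipeline.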

Resources