I want to get a specific user's recently updated threads together with the corresponding member documents. My query does extra work: it looks up the thread via $lookup for every member document and only then checks whether the thread matches the condition { $gt: ["$updated_at", timestamp] }. How can I optimize the query?
I could add updated_at to the member documents, but then an update query would be needed for a large number of documents (200,000 or 2,000,000) every time a thread gets updated, which happens roughly every 2 seconds. How long would updating that many documents take?
Member: _id, thread_id, user_id, deleted
Thread: _id, title, updated_at, destroyed
My query:
const user_id = 4;
const timestamp = 1611011380321;
const data = await Member.aggregate([
  {
    $match: {
      user_id,
      deleted: false
    }
  },
  {
    $lookup: {
      from: "threads",
      let: {
        thread_id: "$thread_id",
      },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                { $eq: ["$_id", "$$thread_id"] },
                { $eq: ["$destroyed", false] },
                { $gt: ["$updated_at", timestamp] }
              ]
            }
          }
        }
      ],
      as: "thread"
    }
  },
  // sort keys must not carry a "$" prefix
  { $sort: { "thread.updated_at": -1 } },
  { $limit: 10 }
]);
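For what it's worth, the denormalization described above (copying updated_at onto the member documents) would amount to something like the following whenever a thread changes. This is only a rough sketch, assuming a Mongoose Member model as in the query and an index on thread_id; propagateThreadUpdate is a hypothetical helper name.

// Hypothetical helper: after a thread's updated_at changes, copy the new
// timestamp onto every member document that references that thread.
// With an index on thread_id this is a single multi-document update,
// but it still rewrites every matching member, which is the cost in question.
async function propagateThreadUpdate(threadId, updatedAt) {
  await Member.updateMany(
    { thread_id: threadId },
    { $set: { updated_at: updatedAt } }
  );
}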
Related
So I have a document that is structured like this:
_id: ObjectId('62bbe17d8fececa06b91873d'),
clubName: 'test',
staff: [
  '62bbe47f8fececa06b9187d8',
  '624f4b56ab4f5170570cdba3' // IDs of staff members
]
A single staff member can be assigned to multiple clubs. What I'm trying to achieve is to get all staff that have been assigned to at least one club and display them in a table on the front end. I followed this solution, since distinct and skip can't be used in a single query, but it just returned this:
[
  { _id: [ '624f5054ab4f5170570cdd16', '624f5054ab4f5170570cdd16' ] }, // staff from club 1
  { _id: [ '624f5054ab4f5170570cdd16', '624f9194ab4f5170570cded1' ] }, // staff from club 2
  { _id: [ '624f4b56ab4f5170570cdba3' ] } // staff from club 3
]
My desired outcome would be like this:
[ { _id: [ '624f5054ab4f5170570cdd16', '624f9194ab4f5170570cded1', '624f4b56ab4f5170570cdba3' ] } ]
Here's my query:
const query = this.clubModel.aggregate(
[{ $group: { _id: '$staff' } }, { $skip: 0}, { $limit: 10}],
(err, results) => {
console.log(results);
},
);
The values returned are not distinct at all. Is there an operation that can evaluate the values inside an array and make them distinct?
Here's my new query after adding the 'createdAt' field in my document structure:
const query = this.clubModel.aggregate([
{ $sort: { createdAt: -1 } },
{
$unwind: '$drivers',
},
{
$project: {
isActive: true,
drivers: true, // keep the unwound field so the $addToSet below has values to collect
},
},
{
$group: {
_id: null,
ids: {
$addToSet: '$drivers',
},
},
},
{
$project: {
_id: 0,
},
},
{
$skip: skip,
},
{
$limit: limit,
},
]);
Does this work for you? First $unwind the staff array, and then group with _id set to null, collecting the staff values with $addToSet:
db.collection.aggregate([
{
"$unwind": "$staff"
},
{
"$group": {
"_id": "null",
"ids": {
"$addToSet": "$staff"
}
}
},
{
"$project": {
"_id": 0,
}
},
{
$skip: 0
},
{
$limit: 10
}
])
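Run against the clubs behind the three sample results shown earlier, this pipeline collapses everything into a single document of distinct staff IDs, matching the desired outcome; note that $addToSet builds an unordered set, so the order inside ids is not guaranteed:

[
  {
    "ids": [
      "624f5054ab4f5170570cdd16",
      "624f9194ab4f5170570cded1",
      "624f4b56ab4f5170570cdba3"
    ]
  }
]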
I have this MongoDB query, which queries a collection called songs and, for each song, returns the associated album:
db.songs.aggregate([{
$lookup: {
from: "albums",
let: { album: '$album' },
as: "album",
pipeline: [{
$match: {
$expr: {
$and: [
{ $eq: ['$albumId', '$$album._id'] },
{ $eq: ['$status', 'Draft'] },
]
}
}
}]
}
}])
In the above query, my intention was to return a song only if its album is in Draft status. Instead, it returns all songs, and for the ones whose album is not in Draft it just returns an empty array from the lookup. How can I avoid returning the song document at all if the album is not in Draft?
Additionally, is it possible to flatten the results? I.e., merge all the fields of the album into the song document?
Once you perform the $lookup you can filter out the documents with an empty array:
{ $match: { album: { $ne: [] } }}
Then there is an example in the MongoDB documentation for the $mergeObjects operator that is very similar to your case. Assuming that each song belongs to one album, putting it all together your aggregation pipeline may look like this:
db.songs.aggregate([
{
$lookup: {
from: "albums",
let: { album: '$album' },
as: "album",
pipeline: [{
$match: {
$expr: {
$and: [
{ $eq: ['$albumId', '$$album._id'] },
{ $eq: ['$status', 'Draft'] },
]
}
}
}]
}
},
{ $match: { album: { $ne: [] } }},
{
$replaceRoot: { newRoot: { $mergeObjects: [ { $arrayElemAt: [ "$album", 0 ] }, "$$ROOT" ] } }
},
{ $project: { album: 0 } }
])
You may want to experiment with going in the other direction: find the albums with status = Draft, then get the songs:
db.album.aggregate([
{$match: {"status":"Draft"}}
,{$lookup: {from: "song",
localField: "album", foreignField: "album",
as: "songs"}}
// songs is now an array of docs. Run $map to turn that into an
// array of just the song title, and overwrite it (think x = x + 1):
,{$addFields: {songs: {$map: {
input: "$songs",
in: "$$this.song"
}} }}
]);
If you have a LOT of material in the song document, you can use the fancier $lookup to cut down the size of the docs in the lookup array -- but you still need the $map to turn it into an array of strings.
db.album.aggregate([
{$match: {"status":"Draft"}}
,{$lookup: {from: "song",
let: { aid: "$album" },
pipeline: [
{$match: {$expr: {$eq:["$album","$$aid"]}}},
{$project: {song:true}}
],
as: "songs"}}
,{$addFields: {songs: {$map: {
input: "$songs",
in: "$$this.song"
}} }}
]);
Collections that I have:
Product:
[
{
"_id":"product_id_1",
"name":"Product 1",
"price":50
},
{
"_id":"product_id_2",
"name":"Product 2",
"price":100
}
]
Category:
[
{
"_id":"category_id_1",
"name":"Category 1"
},
{
"_id":"category_id_2",
"name":"Category 2"
}
]
Audit:
[
{
"_id":"audit_id_1",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_2",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-09T00:00:00.000Z"
},
{
"_id":"audit_id_3",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"price",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_4",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_5",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-09T00:00:00.000Z"
}
]
The Audit collection is used to save details about each update to Product or Category documents.
For example, we can see that the name attribute of the Product with id product_id_1 was changed twice:
on the 9th and on the 10th of January.
The price attribute of the same Product was changed only once: on the 10th of January.
The same kind of information is saved for the Category collection as well.
The goal that I want to achieve is:
extract from the Audit collection only the documents that describe the latest change for each unique attribute of each unique resource, and copy them into a new audit field of the related document in the Product/Category collections.
As a result, the Product/Category collections should look like this:
Product:
[
{
"_id":"product_id_1",
"name":"Product 1",
"price":50,
"audit":[
{
"_id":"audit_id_1",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_3",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"price",
"executionTime":"2021-01-10T00:00:00.000Z"
}
]
},
{
"_id":"product_id_2",
"name":"Product 2",
"price":100,
"audit":[
]
}
]
Category:
[
{
"_id":"category_id_1",
"name":"Category 1",
"audit":[
{
"_id":"audit_id_4",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
}
]
},
{
"_id":"category_id_2",
"name":"Category 2",
"audit":[
]
}
]
I tried to write a query by myself, and this is what I got:
db.getCollection("audit").aggregate([
{
$match: {
"resource_type": "product"}
},
{
$sort: {
executionTime: -1
}
},
{
$group: {
_id: {
property: "$attribute",
entity: "$resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
}
]).forEach(function(a) {
  // append the latest audit entry for this attribute to the product's audit array
  db.getCollection("product").update(
    { "_id": ObjectId(a.resource_id) },
    { $addToSet: { audit: a } }
  );
});
The problems that I see with my solution are:
It updates only the Product collection, which means I need to execute my code at least twice, once for each existing collection.
The forEach statement: I am not sure whether this command is executed on the server side or on the client side. Assuming the Audit collection contains approximately 100k documents, how fast will this command execute from a performance point of view?
So I definitely have a feeling that I need to rewrite my solution, but I have doubts about how to make it better.
For example, I read about the $merge stage, which can do a job quite similar to what I do in the forEach section, but I do not know how to properly apply $merge in the aggregation flow I wrote above.
First of all, forEach is executed on the client side, which means you download the result of the aggregation and make one update request per document in the result. Although it is the most flexible approach, it is also the most expensive one. An aggregation pipeline with $out or $merge, on the other hand, is executed server-side, so you don't pipe the data through the client.
Secondly, if you need to update 2 collections, you will need at least 2 queries. There is no way to $out to multiple collections.
Finally, you need to use the subquery (pipeline) syntax of $lookup. It is more flexible and lets you define the "joining" logic in pipeline terms. For products it would be:
db.products.aggregate([
{
$lookup: {
from: "audit",
let: {
id: "$_id"
},
pipeline: [
{ "$match": {
$expr: { $eq: [ "$resource_id", "$$id" ] }, // the foreign key match
resource_type: "product" // the discriminator
} },
{ $sort: { "executionTime": -1 } }, // chronological order
{ "$group": {
_id: {
attribute: "$attribute", // for each unique attribute
id: "$resource_id" // per each unique resource
},
value: {
$first: "$$ROOT" // pick the latest
}
} },
{ "$replaceRoot": { "newRoot": "$value" } }
],
as: "audit"
}
}
])
You already learned about the $out stage and its limitations from the previous answer.
The second pipeline, to update the categories, will be exactly the same but with another $out destination and another value in the discriminator.
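For illustration, here is a sketch of that second pipeline. It assumes the collection is named category (mirroring product above) and uses $merge instead of $out so that the existing category documents are updated in place rather than the collection being replaced; writing back into the collection being aggregated requires MongoDB 4.4+.

db.category.aggregate([
  {
    $lookup: {
      from: "audit",
      let: { id: "$_id" },
      pipeline: [
        { "$match": {
          $expr: { $eq: [ "$resource_id", "$$id" ] }, // the foreign key match
          resource_type: "category" // the discriminator changes
        } },
        { $sort: { "executionTime": -1 } }, // chronological order
        { "$group": {
          _id: { attribute: "$attribute", id: "$resource_id" },
          value: { $first: "$$ROOT" } // pick the latest per attribute
        } },
        { "$replaceRoot": { "newRoot": "$value" } }
      ],
      as: "audit"
    }
  },
  // write the collected audit entries back onto the existing category documents
  { $merge: { into: "category", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } }
])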
I want to post the code I wrote myself:
db.getCollection("product").aggregate([
{ $match: {} },
{
$lookup: {
from: 'audit',
localField: '_id',
foreignField: 'resource_id',
as: 'audit'
}
},
{
$unwind: '$audit'
},
{
$sort: { "audit.executionTime": -1 }
},
{
$group: {
_id: {
property: "$audit.attribute",
entity: "$audit.resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
},
{
$group: {
_id: "$_id",
audit: { $push: "$audit" }
}
},
{
$merge: {
into: 'product',
on: "_id",
whenMatched: 'merge',
whenNotMatched: 'insert'
}
}])
I have some MongoDB documents (representing orders) and their schema looks roughly like this:
{
  id: ObjectID,
  exchange_order_products: Array
}
The exchange_order_products array is empty if the customer didn't exchange any of the items they ordered; if they did, the array contains an object for each exchanged item.
I want to get the percentage of orders in which the customer didn't exchange anything, i.e. the exchange_order_products array is empty.
So basically the formula is the following: (Number of Orders With At Least One Exchange * 100) / Number of Orders With No Exchanges
I know that I can count the number of orders where the exchange_order_products array is empty like this:
[{$match: {
exchange_order_products: {$exists: true, $size: 0}
}}, {$count: 'count'}]
But how do I simultaneously get the number of all the documents in my collection?
You can use $group and $sum along with $cond to count empty and non-empty ones separately. Then you need $multiply and $divide to calculate the percentage:
db.collection.aggregate([
{
$group: {
_id: null,
empty: { $sum: { $cond: [ { $eq: [ { $size: "$exchange_order_products" }, 0 ] }, 1, 0 ] } },
nonEmpty: { $sum: { $cond: [ { $eq: [ { $size: "$exchange_order_products" }, 0 ] }, 0, 1 ] } },
}
},
{
$project: {
percent: {
$multiply: [
100, { $divide: [ "$nonEmpty", "$empty" ] }
]
}
}
}
])
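One caveat with this pipeline: if the collection contains no order with an empty exchange_order_products array, empty ends up as 0 and $divide throws a divide-by-zero error. A minimal sketch of a $cond guard on the final projection:

{
  $project: {
    percent: {
      $cond: [
        { $eq: [ "$empty", 0 ] }, // no orders without exchanges
        null, // or whatever default you prefer
        { $multiply: [ 100, { $divide: [ "$nonEmpty", "$empty" ] } ] }
      ]
    }
  }
}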
I have the following collection, for example:
// vehicles collection
[
{
"_id": 321,
manufactor: SOME-OBJECT-ID
},
{
"_id": 123,
manufactor: ANOTHER-OBJECT-ID
},
]
And I have a collection named tables:
// tables collection
[
{
"_id": SOME-OBJECT-ID,
title: "Skoda"
},
{
"_id": ANOTHER-OBJECT-ID,
title: "Mercedes"
},
]
As you can see, the vehicles collection's documents pull data from the tables collection's documents: the first document in the vehicles collection has a manufactor id that points to the tables document titled Skoda.
That is great.
When I query the DB using aggregate, I can easily pull the data from the remote collections without any problem.
I can also easily apply stages like $project, $sort, $skip, $limit and others.
But I want to display to the user only those vehicles that are manufactured by Mercedes.
Since Mercedes is not mentioned in the vehicles collection, only its ID, a $text $search would not return the right results.
This is the aggregate pipeline that I provide:
[
{
$match: {
$text: {
$search: "Mercedes"
}
}
},
{
$lookup: {
from: "tables",
let: {
manufactor: "$manufactor"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$manufactor"
]
}
}
},
{
$project: {
title: 1
}
}
],
as: "manufactor"
},
},
{
$unwind: "$manufactor"
},
{
$lookup: {
from: "tables",
let: {
model: "$model"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$model"
]
}
}
},
{
$project: {
title: 1
}
}
],
as: "model"
},
},
{
$unwind: "$model"
},
{
$lookup: {
from: "users",
let: {
joined_by: "$_joined_by"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$joined_by"
]
}
}
},
{
$project: {
personal_info: 1
}
}
],
as: "joined_by"
},
},
{
$unwind: "$joined_by"
}
]
As you can see, I am using the $text $search $match as the first stage in the pipeline; otherwise MongoDB will throw an error.
But this $text $search searches only in the origin collection, the vehicles collection.
Is there a way to tell MongoDB to search in the remote collection with $text and $search, and then keep in the aggregation only the results that match both?
UPDATE
When I do this instead:
{
$lookup: {
from: "tables",
pipeline: [
{
$match: {
$text: {
$search: "Mercedes"
}
}
},
{
$project: {
title: 1
}
}
],
as: "manufactor"
},
},
This is what I receive:
MongoError: pipeline requires text score metadata, but there is no text score available
If you are using one of the affected versions mentioned in this thread, you need to update your MongoDB server. As you can see, the issue was fixed in version 4.1.8.
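If upgrading is not immediately possible, one possible workaround (a sketch only, not taken from that thread) is to drop $text inside the $lookup and match the title with a case-insensitive regex instead, then discard the vehicles whose manufactor array came back empty:

[
  {
    $lookup: {
      from: "tables",
      let: { manufactor: "$manufactor" },
      pipeline: [
        {
          $match: {
            $expr: { $eq: [ "$_id", "$$manufactor" ] },
            title: { $regex: "^Mercedes$", $options: "i" } // plain match instead of $text
          }
        },
        { $project: { title: 1 } }
      ],
      as: "manufactor"
    }
  },
  // keep only vehicles whose manufactor lookup actually matched
  { $match: { manufactor: { $ne: [] } } },
  { $unwind: "$manufactor" }
]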