I have the following collection, for example:
// vehicles collection
[
{
"_id": 321,
manufactor: SOME-OBJECT-ID
},
{
"_id": 123,
manufactor: ANOTHER-OBJECT-ID
},
]
And I have a collection named tables:
// tables collection
[
{
"_id": SOME-OBJECT-ID,
title: "Skoda"
},
{
"_id": ANOTHER-OBJECT-ID,
title: "Mercedes"
},
]
As you can see, the documents in the vehicles collection reference documents in the
tables collection: the first document in the vehicles collection has a manufactor
id that points to the tables document titled "Skoda".
That is great.
When I query the DB using aggregate, I can easily pull the data from the remote collections,
without any problem.
I can also easily apply stages and limitations like $project, $sort, $skip, $limit and others.
But I want to display to the user only those vehicles that are manufactured by Mercedes.
Since "Mercedes" is not stored in the vehicles collection, only its ID, the $text $search would not
return the right results.
This is the aggregate pipeline that I provide:
[
{
$match: {
$text: {
$search: "Mercedes"
}
}
},
{
$lookup: {
from: "tables",
let: {
manufactor: "$manufactor"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$manufactor"
]
}
}
},
{
$project: {
title: 1
}
}
],
as: "manufactor"
},
},
{
$unwind: "$manufactor"
},
{
$lookup: {
from: "tables",
let: {
model: "$model"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$model"
]
}
}
},
{
$project: {
title: 1
}
}
],
as: "model"
},
},
{
$unwind: "$model"
},
{
$lookup: {
from: "users",
let: {
joined_by: "$_joined_by"
},
pipeline: [
{
$match: {
$expr: {
$eq: [
"$_id", "$$joined_by"
]
}
}
},
{
$project: {
personal_info: 1
}
}
],
as: "joined_by"
},
},
{
$unwind: "$joined_by"
}
]
As you can see, I am using the $text $search $match as the first stage in the pipeline; otherwise
MongoDB throws an error.
But this $text search only searches the origin collection, the vehicles collection.
Is there a way to tell MongoDB to search the remote collection with $text and $search,
and then keep in the aggregation only the results that match both?
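One workaround, as a sketch: since $text must be the first stage and only searches the origin collection, you can run the aggregation in the other direction, starting from tables and joining the vehicles back in. This assumes a text index exists on tables.title (e.g. created with db.tables.createIndex({ title: "text" })):

```javascript
db.tables.aggregate([
  // Text-search the manufacturer titles first
  { $match: { $text: { $search: "Mercedes" } } },
  // Then pull in every vehicle that references a matched manufacturer
  {
    $lookup: {
      from: "vehicles",
      localField: "_id",
      foreignField: "manufactor",
      as: "vehicles"
    }
  },
  { $unwind: "$vehicles" },
  // Reshape each result back into a vehicle with its manufactor embedded
  {
    $replaceRoot: {
      newRoot: {
        $mergeObjects: ["$vehicles", { manufactor: { _id: "$_id", title: "$title" } }]
      }
    }
  }
])
```

The remaining $lookup stages for model and joined_by could then be appended just as in the original pipeline.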
UPDATE
When I try this instead:
{
$lookup: {
from: "tables",
pipeline: [
{
$match: {
$text: {
$search: "Mercedes"
}
}
},
{
$project: {
title: 1
}
}
],
as: "manufactor"
},
},
This is what I receive:
MongoError: pipeline requires text score metadata, but there is no text score available
If you are using one of the affected versions mentioned in this thread, you need to update your MongoDB server.
As you can see, the issue was fixed in version 4.1.8.
Related
I have two collections in my MongoDB database, one called media and one called users.
Documents in the media collection:
{"_id":{"$oid":"6379204a8cf5677554c26c1b"},"_partition":"6378eb74f6613d5d4192da79","name":"silas test"}
{"_id":{"$oid":"6378eb74f6613d5d4192da79"},"_partition":"6378eb74f6613d5d4192da79","name":"test media"}
Document in the users collection:
{
"_id":{
"$oid":"6379ee6c770a8f43afc8e3e4"},
"_partition":"6378eb74f6613d5d4192da79",
"types":[
{
"mediaReference":[
{"$oid":"6379204a8cf5677554c26c1b"}
],
"mediatype":"podcast"
},
{
"mediaReference":[
{"$oid":"6378eb74f6613d5d4192da79"}
],
"mediatype":"movie"
}
],
"username":"silas"
}
Now I'm looking for a way to use the MongoDB aggregation pipeline to get the following result:
{
"_id":{
"$oid":"6379ee6c770a8f43afc8e3e4"},
"_partition":"6378eb74f6613d5d4192da79",
"types":[
{
"mediaReference":[
{
"_id": {
"$oid":"6379204a8cf5677554c26c1b"
},
"_partition":"6378eb74f6613d5d4192da79",
"name":"silas test"
}
],
"mediatype":"podcast"
},
{
"mediaReference":[
{
"_id": {
"$oid":"6378eb74f6613d5d4192da79"
},
"_partition":"6378eb74f6613d5d4192da79",
"name":"test media"
}
],
"mediatype":"movie"
}
],
"username":"silas"
}
Basically, I'm searching for a way to paste the referenced document into the mediaReference object inside the users document.
I would appreciate any help.
EDIT:
I've found the solution:
db.users.aggregate([{
$unwind: {
path: '$types'
}
}, {
$lookup: {
from: 'media',
localField: 'types.mediaReference',
foreignField: '_id',
as: 'types.mediaReference'
}
}, {
$group: {
_id: '$_id',
merged: {
$first: '$$ROOT'
},
types: {
$push: '$types'
}
}
}, {
$addFields: {
'merged.types': '$types'
}
}, {
$replaceRoot: {
newRoot: '$merged'
}
}])
Collections that I have:
Product:
[
{
"_id":"product_id_1",
"name":"Product 1",
"price":50
},
{
"_id":"product_id_2",
"name":"Product 2",
"price":100
}
]
Category:
[
{
"_id":"category_id_1",
"name":"Category 1"
},
{
"_id":"category_id_2",
"name":"Category 2"
}
]
Audit:
[
{
"_id":"audit_id_1",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_2",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-09T00:00:00.000Z"
},
{
"_id":"audit_id_3",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"price",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_4",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_5",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-09T00:00:00.000Z"
}
]
The Audit collection is used for saving details about updates to Product or Category documents.
For example, we can see that the attribute name of the Product with id product_id_1 was changed twice:
on the 9th of January and on the 10th of January.
The attribute price of the same Product was changed only once: on the 10th of January.
The same kind of information is saved for the Category collection as well.
The goal I want to achieve is:
Extract from the Audit collection only the documents describing the latest change for each unique attribute of each unique resource, and copy them into a new field on the related document in the Product/Category collections.
As result, the Product/Category collections should look like this:
Product:
[
{
"_id":"product_id_1",
"name":"Product 1",
"price":50,
"audit":[
{
"_id":"audit_id_1",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
},
{
"_id":"audit_id_3",
"resource_type":"product",
"resource_id":"product_id_1",
"attribute":"price",
"executionTime":"2021-01-10T00:00:00.000Z"
}
]
},
{
"_id":"product_id_2",
"name":"Product 2",
"price":100,
"audit":[
]
}
]
Category:
[
{
"_id":"category_id_1",
"name":"Category 1",
"audit":[
{
"_id":"audit_id_4",
"resource_type":"category",
"resource_id":"category_id_1",
"attribute":"name",
"executionTime":"2021-01-10T00:00:00.000Z"
}
]
},
{
"_id":"category_id_2",
"name":"Category 2",
"audit":[
]
}
]
I tried to write a query by myself, and this is what I got:
db.getCollection("audit").aggregate([
{
$match: {
"resource_type": "product"}
},
{
$sort: {
executionTime: -1
}
},
{
$group: {
_id: {
property: "$attribute",
entity: "$resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
}
]).forEach(function(a){
db.getCollection("product").update({ "_id": ObjectId(a.resource_id) }, { $addToSet: { audit: a } })
});
The problems that I see with my solution are:
It updates only the Product collection, which means I need to execute my code at least twice, once for each existing collection.
The forEach statement: I am not sure whether this command is executed on the server side or on the client side. Assuming the Audit collection contains approx 100k documents, how fast will this command execute from a performance point of view?
So I definitely have a feeling that I need to rewrite my solution, but I have doubts about how to make it better.
For example, I read about the $merge command, which can do a job quite similar to what I do in the forEach section, but I do not know how to properly apply $merge in the aggregation flow I wrote above.
First of all, forEach is executed on the client side, which means you download the result of the aggregation and make one update request per document in the result. Although it is the most flexible way, it is also the most expensive one. An aggregation pipeline with $out or $merge, on the other hand, is executed on the server side, so you don't pipe data through the client.
Secondly, if you need to update 2 collections, you will need at least 2 queries. There is no way to $out to multiple collections.
Finally, you need to use the subquery syntax of $lookup. It is more flexible and lets you define the "joining" logic in pipeline terms. For products it would be:
db.products.aggregate([
{
$lookup: {
from: "audit",
let: {
id: "$_id"
},
pipeline: [
{ "$match": {
$expr: { $eq: [ "$resource_id", "$$id" ] }, // the foreign key match
resource_type: "product" // the discriminator
} },
{ $sort: { "executionTime": -1 } }, // chronological order
{ "$group": {
_id: {
attribute: "$attribute", // for each unique attribute
id: "$resource_id" // per each unique resource
},
value: {
$first: "$$ROOT" // pick the latest
}
} },
{ "$replaceRoot": { "newRoot": "$value" } }
],
as: "audit"
}
}
])
You already know the $out stage and its limitations from the previous answer.
The second pipeline, to update categories, will be exactly the same but with another $out destination and another value in the discriminator.
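For illustration, the category version might be sketched like this, assuming the collection names category and audit from the question; only the discriminator value and the write destination change:

```javascript
db.getCollection("category").aggregate([
  {
    $lookup: {
      from: "audit",
      let: { id: "$_id" },
      pipeline: [
        // foreign key match plus the discriminator for this collection
        { $match: { $expr: { $eq: ["$resource_id", "$$id"] }, resource_type: "category" } },
        { $sort: { executionTime: -1 } },        // chronological order
        { $group: {
          _id: { attribute: "$attribute", id: "$resource_id" },
          value: { $first: "$$ROOT" }            // pick the latest per attribute
        } },
        { $replaceRoot: { newRoot: "$value" } }
      ],
      as: "audit"
    }
  },
  // Write the result back server-side instead of looping on the client
  { $merge: { into: "category", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } }
])
```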
I want to post the code I wrote myself:
db.getCollection("product").aggregate([
{ $match: {} },
{
$lookup: {
from: 'audit',
localField: '_id',
foreignField: 'resource_id',
as: 'audit'
}
},
{
$unwind: '$audit'
},
{
$sort: { "audit.executionTime": -1 }
},
{
$group: {
_id: {
property: "$audit.attribute",
entity: "$audit.resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
},
{
$group: {
_id: "$_id",
audit: { $push: "$audit" }
}
},
{
$merge: {
into: 'product',
on: "_id",
whenMatched: 'merge',
whenNotMatched: 'insert'
}
}])
I have a structure like this
unions { // collection
members { // array
instanceId // some id
...
}
...
}
In my documents, I have an ids prop (an array).
I need to look up all unions that have at least one id from ids (basically $in).
The problem is that it doesn't work.
First I tried this variant:
{
from: 'unions',
let: { instanceIds: '$ids' },
as: 'unions',
pipeline: [
{
$match: { 'members.instanceId': { $in: '$$instanceIds' } },
},
],
}
But we can't use aggregation variables there; for that, we need to use $expr:
{
from: 'unions',
let: { instanceIds: '$ids' },
as: 'unions',
pipeline: [
{
$match: {
$expr: {
$in: ['$members.instanceId', '$$instanceIds']
}
},
},
],
}
But then it returns 0 documents, even though the instanceIds array is not empty; I've checked it.
Also, if I paste an array with literal values into the example without $expr, it returns the right values. So most likely the problem is in how I built this $lookup.
Use { $ne: [{ $setIntersection: ['$members.instanceId', '$$instanceIds'] }, []] }:
{
from: 'unions',
let: { instanceIds: '$ids' },
as: 'unions',
pipeline: [
{
$match: {
$expr: {
$ne: [{ $setIntersection: ['$members.instanceId', '$$instanceIds'] }, []]
}
},
},
],
}
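As a side note: if I'm not mistaken, the classic localField/foreignField form of $lookup already performs element-wise equality matching when either side is an array, so the same join can be sketched without $expr or let at all (field and collection names taken from the question):

```javascript
{
  $lookup: {
    from: 'unions',
    localField: 'ids',                  // array in the local document
    foreignField: 'members.instanceId', // values inside the members array of unions
    as: 'unions'
  }
}
```

A union is returned whenever any members.instanceId equals any value in ids, which matches the "at least one id from ids" semantics asked for.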
I have a Model which is structured similar to this:
{
"_id": ObjectId("5c878c5c18a4ff001b981zh5"),
"books": [
ObjectId("5d963a7544ec1b122ab2ddc"),
ObjectId("5d963be01f663d168f8ea4dc"),
ObjectId("5d963bcb1f663d168f8ea2f4"),
ObjectId("5d963bdf1f663d16858ea7c9"),
}
Now I want to use the aggregation framework to get a list of only the populated books, like:
{ _id: ObjectId("5d963a7544ec1b122ab2ddc"), title: ...., ... },
..
.aggregate([
{
$lookup: {
from: 'books',
let: { books: '$books' },
pipeline: [{ $match: { $expr: { _id: { $in: ['_id', '$$books'] } } } }],
as: 'bookInfos'
}
},
{ $unwind: '$bookInfos' },
{ $replaceRoot: { newRoot: '$bookInfos' } }
])
I am not too sure about your question, but I think this might be what you're looking for.
So this query worked for me:
{
$match: {
_id: user._id,
},
},
{
$lookup: {
from: "books",
localField: "books",
foreignField: "_id",
as: "booksInfo",
},
},
{ $unwind: "$booksInfo" },
{
$replaceRoot: {
newRoot: "$booksInfo",
},
},
Thanks @zishone. Somehow your query returned all the books available in the DB and not only the ones from the user model, but it works as desired when looking up the documents with localField and foreignField.
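For what it's worth, the likely reason the pipeline variant returned every book is the shape of the expression: inside $expr, { _id: { $in: ['_id', '$$books'] } } is not a comparison at all (the field path is missing its $ prefix, and the expression must be the operator itself), so it is treated as a literal document and filters nothing. A corrected sketch of the subquery form would be:

```javascript
{
  $lookup: {
    from: 'books',
    let: { books: '$books' },
    // $in compares the foreign _id against the user's books array
    pipeline: [{ $match: { $expr: { $in: ['$_id', '$$books'] } } }],
    as: 'bookInfos'
  }
}
```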
My aggregation in Node.js results in nested JSON; can I get it without the nesting, taking only one _id from each collection? Is there any possibility to get the data without nested JSON?
I was trying aggregation in Node.js with the code below. I got the output given in the output section below, but I would like to get the expected output, since I can't use a loop inside a loop.
Student.aggregate([
{
$match: { name: 'abcd'}
},
{
$lookup:{
from:'teachers',
pipeline: [
{
$match: { name: 'pqrs' }
},
{
$project:{
"_id":1
}
}
],
as: "teacherLookup"
}
},
{
$lookup:
{
from:'subjects',
pipeline: [
{
$match: { name: 'computer' }
},
{
$project:{
"_id":1
}
}
],
as: "subjectLookup"
}
}
])
Output:
[
{
_id: '52301c7878965455d2a4',
teacherLookup: [ '5ea737412589688930' ],
subjectLookup: [ '5ea745821369999917' ]
}
]
I am expecting this output instead (without nested JSON):
[
{
studentId: '5ea1c7878965455d2a4',
teacherId: '5ea737412589688930' ,
subjectId: '5ea745821369999917'
}
]
You can use $arrayElemAt to get the first element from the array.
Student.aggregate([
{
$match: { name: "abcd" },
},
{
$lookup: {
from: "teachers",
pipeline: [
{
$match: { name: "pqrs" },
},
{
$project: {
_id: 1,
},
},
],
as: "teacherId",
},
},
{
$lookup: {
from: "subjects",
pipeline: [
{
$match: { name: "computer" },
},
{
$project: {
_id: 1,
},
},
],
as: "subjectId",
},
},
{
$project: {
teacherId: { $arrayElemAt: ["$teacherId", 0] },
subjectId: { $arrayElemAt: ["$subjectId", 0] },
},
}
]);