mongodb - using join on a local variable - arrays

I'm using Node.js and MongoDB. I have an array of objects that maps each id to a display name. Let's say this is my array:
let names = [
{ value: 1, text: 'One' },
{ value: 2, text: 'Two' },
{ value: 3, text: 'Three' },
{ value: 4, text: 'Gour' }
]
And this is the result of my query on a collection, which uses $group to get the distinct values:
[
{ _id: { code: '1', number: 5 } },
{ _id: { code: '2', number: 5 } },
{ _id: { code: '3', number: 2 } },
{ _id: { code: '4', number: 22 } },
]
$lookup lets us join data from a different collection, but in my case the text for each code returned by the query lives in a local array.
Is there a way to map the text from the array onto the results from MongoDB?
Any help will be much appreciated.
EDIT
The MongoDB query I was trying:
db.collection.aggregate([
{
$match: {
_Id: id
}
},
{
$lookup: {
localField: "code",
from: names,
foreignField: "value",
as: "renderedNames"
}
},
{
"$group" : {
"_id": {
code: "$code",
number: "$number"
}
}
}
]);

The local variable lives in the Node.js app, and MongoDB knows nothing about it.
It looks like it belongs to the presentation layer, where you want to show codes as meaningful names, so the mapping should be done there. I believe Array.prototype.find is the most suitable here:
names.find(name => String(name.value) === String(doc._id.code)).text
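A minimal sketch of that application-side mapping, assuming the aggregation result has already been fetched into an array (called `results` here, a name that is not in the original post). In the question, `names` stores `value` as a number while the query returns `code` as a string, so the sketch normalizes both to strings. Building a `Map` once avoids rescanning `names` for every document.

```javascript
// Lookup table from the question.
const names = [
  { value: 1, text: 'One' },
  { value: 2, text: 'Two' },
  { value: 3, text: 'Three' },
  { value: 4, text: 'Gour' },
];

// Shape of the $group output shown in the question (assumed already fetched).
const results = [
  { _id: { code: '1', number: 5 } },
  { _id: { code: '2', number: 5 } },
  { _id: { code: '3', number: 2 } },
  { _id: { code: '4', number: 22 } },
];

// Index the names by code once; keys are strings to match `_id.code`.
const textByCode = new Map(names.map(n => [String(n.value), n.text]));

// Attach the text to every result; unknown codes get null instead of throwing.
const rendered = results.map(doc => ({
  ...doc,
  text: textByCode.get(String(doc._id.code)) ?? null,
}));

console.log(rendered);
```

Unlike a bare `find(...).text`, this version does not throw when a code has no matching name.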
If the names are not truly variable but fairly constant, you can move them to their own collection, e.g. codeNames:
db.codeNames.insert([
  { _id: "1", text: 'One' },
  { _id: "2", text: 'Two' },
  { _id: "3", text: 'Three' },
  { _id: "4", text: 'Gour' }
]);
and use $lookup as follows:
db.collection.aggregate([
  {
    $match: {
      _Id: id
    }
  },
  {
    "$group": {
      "_id": {
        code: "$code",
        number: "$number"
      }
    }
  },
  {
    $lookup: {
      localField: "_id.code",
      from: "codeNames",
      foreignField: "_id",
      as: "renderedNames"
    }
  }
]);
If neither of the above suits your use case, you can pass the names to the database with each request and map them on the database side, but you should be very sure that you cannot use the two previous options:
db.collection.aggregate([
  {
    $match: {
      _Id: id
    }
  },
  {
    "$group": {
      "_id": {
        code: "$code",
        number: "$number"
      }
    }
  },
  {
    $project: {
      renderedNames: {
        $filter: {
          input: [
            { value: "1", text: 'One' },
            { value: "2", text: 'Two' },
            { value: "3", text: 'Three' },
            { value: "4", text: 'Gour' }
          ],
          as: "name",
          cond: { $eq: [ "$$name.value", "$_id.code" ] }
        }
      }
    }
  },
]);
As a side note, I find $match: {_Id: id} quite confusing, especially when followed by $group. If _Id is meant to be _id, it is unique: you can have no more than one document after this stage, so there is not much to group.

Related

Mongodb: check that all the fields of the elements of an array of objects respect a condition

I have a database of the employees of a company that looks like this:
{
_id: 7698,
name: 'Blake',
job: 'manager',
manager: 7839,
hired: ISODate("1981-05-01T00:00:00.000Z"),
salary: 2850,
department: {name: 'Sales', location: 'Chicago'},
missions: [
{company: 'Mac Donald', location: 'Chicago'},
{company: 'IBM', location: 'Chicago'}
]
}
I have an exercise in which I need to write the MongoDB command that returns all the employees who did all their missions in Chicago. I struggle with the "all" part because I cannot find a way to check that every location in the missions array is equal to 'Chicago'.
I was thinking about doing it in two steps: first find the total number of missions the employee has, and then compare it to the number of missions they did in Chicago (that is how I would do it in SQL, I guess). But I cannot find the number of missions the employee did in Chicago. Here is what I tried:
db.employees.aggregate([
{
$match: { "missions": { $exists: true } }
},
{
$project: {
name: 1,
nbMissionsChicago: {
$sum: {
$cond: [
{
$eq: [{
$getField: {
field: { $literal: "$location" },
input: "$missions"
}
}, "Chicago"]
}, 1, 0
]
}
}
}
}
])
Here is the result:
{ _id: 7698, name: 'Blake', nbMissionsChicago: 0 }
{ _id: 7782, name: 'Clark', nbMissionsChicago: 0 }
{ _id: 8000, name: 'Smith', nbMissionsChicago: 0 }
{ _id: 7902, name: 'Ford', nbMissionsChicago: 0 }
{ _id: 7499, name: 'Allen', nbMissionsChicago: 0 }
{ _id: 7654, name: 'Martin', nbMissionsChicago: 0 }
{ _id: 7900, name: 'James', nbMissionsChicago: 0 }
{ _id: 7369, name: 'Smith', nbMissionsChicago: 0 }
First of all, is there a better way to check that all the locations in the missions array satisfy the condition? And why does this command return only 0?
Thanks!
If all you need is the employees who did all their missions in "Chicago", then you don't need an aggregation pipeline for it. In particular, filtering the array as part of the aggregation can't use an index, which makes performance even worse.
A simple query should suffice here:
db.collection.find({
  $and: [
    {
      "missions": {
        $exists: true
      }
    },
    {
      "missions.location": {
        $not: {
          $gt: "Chicago"
        }
      }
    },
    {
      "missions.location": {
        $not: {
          $lt: "Chicago"
        }
      }
    }
  ]
})
This way we can build an index on the missions field and use it properly. Any document with a value other than "Chicago" will not match, because it will fail the $gt or $lt comparison.
Note that an empty array also matches the condition. To require at least one mission, replace the generic "missions" exists condition with "missions.0": {$exists: true}.
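The semantics of this filter can be mirrored in plain JavaScript, which makes the empty-array caveat easy to see. This is only an in-memory illustration, assuming every location is a string. Apart from Blake, the sample employees are hypothetical, and nothing here runs against MongoDB.

```javascript
// Sample data: Blake comes from the question, the other documents are made up.
const employees = [
  { _id: 7698, name: 'Blake', missions: [
    { company: 'Mac Donald', location: 'Chicago' },
    { company: 'IBM', location: 'Chicago' },
  ] },
  { _id: 1, name: 'Mixed (hypothetical)', missions: [
    { company: 'A', location: 'Chicago' },
    { company: 'B', location: 'Dallas' },
  ] },
  { _id: 2, name: 'Empty (hypothetical)', missions: [] },
  { _id: 3, name: 'NoField (hypothetical)' },
];

// True when the employee has a missions array and no mission is outside `city`.
// With requireAtLeastOne = true this mirrors the "missions.0" variant.
function allMissionsIn(employee, city, requireAtLeastOne = false) {
  if (!Array.isArray(employee.missions)) return false;
  if (requireAtLeastOne && employee.missions.length === 0) return false;
  // every() is vacuously true for [], just like the $not-based query.
  return employee.missions.every(m => m.location === city);
}

const lenient = employees.filter(e => allMissionsIn(e, 'Chicago')).map(e => e._id);
const strict = employees.filter(e => allMissionsIn(e, 'Chicago', true)).map(e => e._id);

console.log(lenient); // includes the employee with an empty missions array
console.log(strict);  // only employees with at least one mission
```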
Your attempt does not return the correct result because that is not the correct way to iterate over the elements of an array field.
Instead, you need the $size operator to get the size of an array and the $filter operator to filter its elements.
Updated: You can directly compare the filtered array with the original array.
db.employees.aggregate([
  {
    $match: {
      "missions": {
        $exists: true
      }
    }
  },
  {
    $project: {
      name: 1,
      nbMissionsChicago: {
        $eq: [
          {
            $filter: {
              input: "$missions",
              cond: {
                $eq: [
                  "$$this.location",
                  "Chicago"
                ]
              }
            }
          },
          "$missions"
        ]
      }
    }
  }
])
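The counting variant mentioned above ($size plus $filter), which is closer to the two-step idea in the question, might look like the sketch below. The field names `total` and `inCity` are illustrative and not from the original answer. The pipeline is only defined here, not executed; you would pass it to db.employees.aggregate(...). The small helper underneath mirrors the same comparison in memory.

```javascript
const city = 'Chicago';

// Pipeline sketch: count all missions, count the ones in `city`, keep equal counts.
const allInCityPipeline = [
  // Requiring element 0 avoids applying $size to a missing field
  // and excludes employees with no missions at all.
  { $match: { 'missions.0': { $exists: true } } },
  {
    $project: {
      name: 1,
      total: { $size: '$missions' },
      inCity: {
        $size: {
          $filter: {
            input: '$missions',
            as: 'm',
            cond: { $eq: ['$$m.location', city] },
          },
        },
      },
    },
  },
  { $match: { $expr: { $eq: ['$total', '$inCity'] } } },
];

// In-memory mirror of the two counts, for illustration only.
function missionCounts(employee) {
  const missions = employee.missions;
  return {
    total: missions.length,
    inCity: missions.filter(m => m.location === city).length,
  };
}

const blake = {
  _id: 7698,
  name: 'Blake',
  missions: [
    { company: 'Mac Donald', location: 'Chicago' },
    { company: 'IBM', location: 'Chicago' },
  ],
};

console.log(missionCounts(blake));
console.log(JSON.stringify(allInCityPipeline, null, 2));
```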

MongoDB: nested array count + original document

I have the following document structure which contains an array of votes:
{ _id: ObjectId("6350e2c1a15e0e656f4a7472"),
category: 'business',
votes:
[ { voteType: 'like',
userId: ObjectId("62314007da34df3f32f7cfc0") },
{ voteType: 'like',
userId: ObjectId("6356b5cbe2272ebf628451b") } ] }
What I would like to achieve is to add for each document the sum of votes for which voteType = like, while keeping the original document, such as:
[ [{ _id: ObjectId("6350e2c1a15e0e656f4a7472"),
category: 'business',
votes:
[ { voteType: 'like',
userId: ObjectId("62314007da34df3f32f7cfc0") },
{ voteType: 'like',
userId: ObjectId("6356b5cbe2272ebf628451b") } ] }, {sum: 2, voteType: "like"} ], ...]
At the moment, the only workaround that I found is through an aggregation although I cannot manage to keep the original documents in the results:
db.getCollection('MyDocument') .aggregate([ {
$unwind: "$votes" }, {
$match: {
"votes.voteType": "like",
} }, {
$group: {
_id: {
name: "$_id",
type: "$votes.voteType"
},
count: {
$sum: 1
}
} },
{ $sort : { "count" : -1 } }, {$limit : 5}
])
which gives me:
{ _id: { name: ObjectId("635004f1b96e494947caaa5e"), type: 'like' },
count: 3 }
{ _id: { name: ObjectId("63500456b96e494947cbd448"), type: 'like' },
count: 3 }
{ _id: { name: ObjectId("63500353b6c7eb0a01df268e"), type: 'like' },
count: 2 }
{ _id: { name: ObjectId("634e315bb7d17339f8077c39"), type: 'like' },
count: 1 }
You can do it like this:
$cond with $isArray - to check whether the votes property is an array.
$filter - to filter votes based on the voteType property.
$size - to get the size of the filtered array.
db.collection.aggregate([
  {
    "$set": {
      "count": {
        "$cond": {
          "if": {
            "$isArray": "$votes"
          },
          "then": {
            "$size": {
              "$filter": {
                "input": "$votes",
                "cond": {
                  "$eq": [
                    "$$this.voteType",
                    "like"
                  ]
                }
              }
            }
          },
          "else": 0
        }
      }
    }
  }
])
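To make the three operators concrete, here is an in-memory equivalent of that $set stage in plain JavaScript. It is only an illustration: the sample documents use placeholder string ids instead of ObjectIds, and the second and third documents are hypothetical. The final sort and slice correspond to the $sort and $limit stages from the question.

```javascript
// Mirrors $cond + $isArray + $filter + $size for a single document.
function countVotes(doc, voteType) {
  return Array.isArray(doc.votes)
    ? doc.votes.filter(v => v.voteType === voteType).length
    : 0; // the "else" branch: documents without a votes array count as 0
}

const docs = [
  { _id: 'a', category: 'business', votes: [
    { voteType: 'like', userId: 'u1' },
    { voteType: 'like', userId: 'u2' },
  ] },
  { _id: 'b', category: 'sports', votes: [{ voteType: 'dislike', userId: 'u1' }] },
  { _id: 'c', category: 'news' }, // no votes field at all
];

// Keep each original document and add the count, as $set does.
const withCounts = docs.map(d => ({ ...d, count: countVotes(d, 'like') }));

// Equivalent of { $sort: { count: -1 } }, { $limit: 5 }.
const top = [...withCounts].sort((x, y) => y.count - x.count).slice(0, 5);

console.log(top);
```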

Get number of followers for specific page in Mongodb

Let's say I have a collection called pages:
{
_id: "pageid",
name: "Mongodb"
},
{
_id: "pageid2",
name: "Nodejs"
}
and a user collection as follows:
{
_id : "userid1",
following: ["pageid"],
...
},
{
_id : "userid2",
following: ["pageid", "pageid2"],
...
}
How can I write a query that retrieves the page information along with the number of users who follow each page? The expected result is as follows:
[
{
_id: "pageid",
name: "MongoDB",
followers: 2
},
{
_id: "pageid2",
name: "Nodejs",
followers: 1
},
]
You can use $lookup and $size to count the total followers:
db.pages.aggregate([
  {
    $lookup: {
      from: "user",
      localField: "_id",
      foreignField: "following",
      as: "followers"
    }
  },
  {
    $addFields: {
      followers: { $size: "$followers" }
    }
  }
])
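This works because, when foreignField is an array, $lookup matches a user if any element of `following` equals the page's `_id`. The same join can be sketched in memory as follows, using the sample documents from the question; it is an illustration of the matching rule, not a replacement for the pipeline.

```javascript
const pages = [
  { _id: 'pageid', name: 'Mongodb' },
  { _id: 'pageid2', name: 'Nodejs' },
];

const users = [
  { _id: 'userid1', following: ['pageid'] },
  { _id: 'userid2', following: ['pageid', 'pageid2'] },
];

// For each page, count the users whose `following` array contains the page id.
const pagesWithFollowers = pages.map(page => ({
  ...page,
  followers: users.filter(
    u => Array.isArray(u.following) && u.following.includes(page._id)
  ).length,
}));

console.log(pagesWithFollowers);
```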

Lookup VS Lookup with pipeline MongoDB (Performance & How it internally works)

I'm making a blog and have a question about which would give me better performance: a simple lookup or a lookup with a pipeline. Sometimes the simple lookup gives me the faster result and sometimes the pipeline lookup does, so I am a bit confused about which one to use and where. Suppose I have two collections, users and comments.
// Users Collection
{
_id: "MONGO_OBJECT_ID",
userName: "Web Alchemist"
}
// Comments Collection
{
_id: "MONGO_OBJECT_ID",
userId: "USER_MONGO_OBJECT_ID",
isActive: "YES", // YES or NO
comment: "xyz"
}
Now I want to look up comments from the users collection. Which approach would be better for this? I wrote two queries that give me the same result.
[
{
$match: { _id: ObjectId("5d68c019c7d56410cc33b01a") }
},
{
$lookup: {
from: "comments",
as: "comments",
localField: "_id",
foreignField: "userId"
}
},
{
$unwind: "$comments"
},
{
$match: {
"comments.isActive": "YES"
}
},
{ $limit: 5},
{
$project: { _id: 1, userName: 1, comments: { _id: "$comments._id", comment: "$comments.comment" } }
},
{
$group: {
_id: "$_id",
userName: { '$first': '$userName' },
comments: { $addToSet: "$comments" }
}
}
]
OR
[
{
$match: { _id: ObjectId("5d68c019c7d56410cc33b01a") }
},
{
$lookup: {
from: "comments",
as: "comments",
let: { userId: "$_id" },
pipeline: [
{
$match: {
$expr: {
$and: [
{ $eq: ['$userId', '$$userId'] },
{ $eq: ['$isActive', 'YES'] }
]
}
}
},
{ $limit: 5 },
{
$project: { _id: 1, comment: 1 }
}
]
}
}
]

Hierarchically flatten MongoDB collection of documents with arrays into documents

Block model (which goes on block 0 -> block 1 -> block 2 -> block 3 -> […]):
Example input document [700+ of these in the modulestore.structures collection]:
{
_id: ObjectId('5932d50ff8f46c0a8098ab79'),
blocks: [
{
definition: ObjectId('5923556ef8f46c0a787e9c0f'),
block_type: 'chapter',
block_id: '5b053a7f10ba41df85a3221c3ef3956e',
fields: {
format: 'Foo exam',
children: [
[
'sequential',
'9f1e58553ad448818ec8e7915d3d94d3'
],
[
'sequential',
'f052c7aa44274769a4631e95405834e0'
]
]
}
},
{
definition: ObjectId('59235569f8f46c0a7be1debc'),
block_type: 'sequential',
block_id: '9f1e58553ad448818ec8e7915d3d94d3',
fields: {
display_name: 'FooBar'
}
},
{
definition: ObjectId('59317406f8f46c0a8098aaf5'),
block_type: 'sequential',
block_id: 'f052c7aa44274769a4631e95405834e0',
fields: {
display_name: 'CanHaz'
}
}
]
}
My goal here is to:
flatten out the blocks so all blocks are at the collection level;
cursor the children array for traversal;
walk and amend the 'tree' such that each child/grandchild/great-grandchild/*-child gets a new property top_ancestor_fields containing the fields property from their topmost ancestor.
Example output:
[
{
_id: ObjectId('5a00f611f995363c2b63c9a6'),
block_type: 'chapter',
block_id: '5b053a7f10ba41df85a3221c3ef3956e',
fields: {
format: 'Foo exam',
children: [
[
'sequential',
'9f1e58553ad448818ec8e7915d3d94d3'
],
[
'sequential',
'f052c7aa44274769a4631e95405834e0'
]
]
},
top_ancestor_fields: {
format: 'Foo exam'
}
},
{
_id: ObjectId('5a00f611f995363c2b63c9a7'),
block_id: '9f1e58553ad448818ec8e7915d3d94d3',
block_type: 'sequential',
fields: {
display_name: 'FooBar'
},
top_ancestor_fields: {
format: 'Foo exam'
}
},
{
_id: ObjectId('5a00f611f995363c2b63c9a8'),
block_id: 'f052c7aa44274769a4631e95405834e0',
block_type: 'sequential',
fields: {
display_name: 'CanHaz'
},
top_ancestor_fields: {
format: 'Foo exam'
}
},
]
Almost have it working based off @neil-lunn's suggestion:
db.modulestore.structures.aggregate([
{ $unwind: '$blocks' },
{ $project: { _id: 0,
block_id: '$blocks.block_id',
children: '$blocks.fields.children',
display_name: '$blocks.fields.display_name',
block_type: '$blocks.block_type',
exam: '$blocks.fields.format',
fields: '$blocks.fields'
}},
{ $out: 'modulestore.mapped0' }
])
db.modulestore.mapped0.aggregate([
{ $graphLookup: {
from: 'modulestore.mapped0',
startWith: '$block_id',
connectToField: 'children',
connectFromField: 'block_id',
as: 'block_ids',
maxDepth: 0
} },
{ $unwind: '$block_ids' },
{ $project: {
name: 1,
_id: 0,
ancestor: '$block_ids.block_id'
} },
{ $out: 'modulestore.mapped1' }
]);
But this just hangs. I've tried configuring the maxDepth option of $graphLookup. FYI: db.modulestore.mapped0.count() is 80772 for me.
Each document potentially contains a children array with up to 180 elements.
Not sure how to approach this larger pipeline to map the children hierarchies…
The following should get you started:
db.modulestore.structures.aggregate([{
$unwind: '$blocks' // flatten "blocks" array
}, {
$replaceRoot: { // move "blocks" field to top level
newRoot: "$blocks"
}
}, {
$unwind: { // flatten "fields.children" array
path: "$fields.children",
preserveNullAndEmptyArrays: true
}
}, {
// this step is technically not needed but it might speed up things - try running with and without that
$addFields: { // we only keep the second (last, really) entry of all your arrays since this is the only valid join key for the graphLookup
"fields.children": {
$slice: [ "$fields.children", -1 ]
}
}
}, {
$unwind: { // flatten "fields.children" array one more time because it was nested before
path: "$fields.children",
preserveNullAndEmptyArrays: true
}
}, {
$group: { // reduce the number of lookups required later by eliminating duplicate parent-child paths
"_id": "$block_id",
"block_type": { $first: "$block_type" },
"definition": { $first: "$definition" },
"fieldsFormat": { $first: "$fields.format" },
"fieldsChildren": { $addToSet: "$fields.children" }
}
}, {
$project: { // restore original structure
"block_id": "$_id",
"block_type": "$block_type",
"definition": "$definition",
"fields": {
"format": "$fieldsFormat",
"children": "$fieldsChildren"
}
}
}, { // spit out the result into "modulestore.mapped0" collection, overwriting all existing content
$out: 'modulestore.mapped0'
}])
and then
db.modulestore.mapped0.aggregate([{
$graphLookup: {
from: 'modulestore.mapped0',
startWith: '$block_id',
connectToField: 'fields.children',
connectFromField: 'block_id',
as: 'block_ids',
maxDepth: 0
}
}, {
$lookup: {
from: 'modulestore.mapped0',
localField: 'block_ids.fields.children',
foreignField: '_id',
as: 'block_ids.fields.children'
}
}])
Partial solution [gist]:
def update_descendants(modulestore, blocks, ancestor_fields):
    """
    :keyword modulestore: modulestore containing the blocks
    :type modulestore: ``Collection``
    :keyword blocks: iterator over the blocks (collections within modulestore)
    :type blocks: ``Cursor`` | `tuple`
    :keyword ancestor_fields: fields of the top most ancestor
    :type ancestor_fields: ``dict``
    """
    for block in blocks:
        # update_d is a helper that is not shown in this excerpt
        modulestore.replace_one({'block_id': block['block_id'],
                                 'block_type': block['block_type']},
                                update_d(block, add={'ancestor_fields': ancestor_fields},
                                         rm=('_id',)))
        update_descendants.counter += 1
        print('Updated:', update_descendants.counter)
        if 'children' in block and block['children']:
            for block_type, block_id in block['children']:
                # recurse only into children that have not been tagged yet
                update_descendants(modulestore,
                                   modulestore.find({'block_id': block_id,
                                                     'block_type': block_type,
                                                     'ancestor_fields': {
                                                         '$exists': False
                                                     }}),
                                   ancestor_fields)


update_descendants.counter = 0  # must be initialised before the first call
Would prefer a solution that's wholly in the database though, and without all these inefficient queries.
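For reference, the traversal itself is small when done in memory. The sketch below is not the database-only solution asked for; it only illustrates the intended walk on the three sample blocks, under two assumptions taken from the example output: a root is any block that no other block lists as a child, and `top_ancestor_fields` is the root's `fields` without `children`. The function name `addTopAncestorFields` is made up for this example.

```javascript
// The three blocks from the example input (definition ids omitted).
const sampleBlocks = [
  {
    block_type: 'chapter',
    block_id: '5b053a7f10ba41df85a3221c3ef3956e',
    fields: {
      format: 'Foo exam',
      children: [
        ['sequential', '9f1e58553ad448818ec8e7915d3d94d3'],
        ['sequential', 'f052c7aa44274769a4631e95405834e0'],
      ],
    },
  },
  {
    block_type: 'sequential',
    block_id: '9f1e58553ad448818ec8e7915d3d94d3',
    fields: { display_name: 'FooBar' },
  },
  {
    block_type: 'sequential',
    block_id: 'f052c7aa44274769a4631e95405834e0',
    fields: { display_name: 'CanHaz' },
  },
];

function addTopAncestorFields(blocks) {
  const keyOf = (type, id) => `${type}:${id}`;
  const childrenOf = block => (block.fields && block.fields.children) || [];

  // Index blocks by (type, id) and collect everything referenced as a child.
  const byKey = new Map(blocks.map(b => [keyOf(b.block_type, b.block_id), b]));
  const childKeys = new Set();
  for (const b of blocks) {
    for (const [type, id] of childrenOf(b)) childKeys.add(keyOf(type, id));
  }

  const result = new Map();
  const visit = (block, topFields) => {
    const k = keyOf(block.block_type, block.block_id);
    if (result.has(k)) return; // already tagged: guards against shared children and cycles
    result.set(k, { ...block, top_ancestor_fields: topFields });
    for (const [type, id] of childrenOf(block)) {
      const child = byKey.get(keyOf(type, id));
      if (child) visit(child, topFields);
    }
  };

  // Start a walk from every root and push the root's own fields down the tree.
  for (const b of blocks) {
    if (childKeys.has(keyOf(b.block_type, b.block_id))) continue;
    const { children, ...ownFields } = b.fields || {};
    visit(b, ownFields);
  }
  return [...result.values()];
}

const flattened = addTopAncestorFields(sampleBlocks);
console.log(JSON.stringify(flattened, null, 2));
```

Blocks that are only reachable through a cycle with no root are never visited by this sketch, so they would be missing from the output.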
