Looping through an array to count values in MongoDB/Mongoose

I have a user schema that contains a value called amputationInfo:
amputationInfo: [
  {
    type: String,
  },
],
Here is an example of what that might look like in the database:
amputationInfo: [
  "Double Symes/Boyd",
  "Single Above-Elbow"
]
I have a review schema that allows a user to leave a review; it contains a reference to the user who left it:
user: {
  type: mongoose.Schema.ObjectId,
  ref: 'User',
  required: [true, 'Each review must have an associated user!'],
},
When a user leaves a review, I want to create an aggregation that looks up the user on the review, finds their amputationInfo, loops through the array, and adds up the total number of users whose arrays contain "Double Symes/Boyd" or "Single Above-Elbow".
So if we have 3 users and their amputationInfo is as follows:
amputationInfo: [
  "Double Symes/Boyd",
  "Single Above-Elbow"
]

amputationInfo: [
  "Single Above-Elbow"
]

amputationInfo: []
The return from the aggregation should count each term, adding one to the corresponding value, and look something like this:
[
  {
    doubleSymesBoyd: 1,
    singleAboveElbow: 2
  }
]
Here is what I have tried, but I just don't know enough about MongoDB to solve the issue:
[
  {
    '$match': {
      'prosthetistID': new ObjectId('6126ca6148f34c00189f86f5')
    }
  },
  {
    '$lookup': {
      'from': 'users',
      'localField': 'user',
      'foreignField': '_id',
      'as': 'userInfo'
    }
  },
  {
    '$unwind': {
      'path': '$userInfo'
    }
  }
]
After the $unwind, the resulting object has a userInfo key that contains the nested amputationInfo array:
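(Roughly like this; the values are illustrative and other review/user fields are omitted.)
{
  user: ObjectId("..."),
  prosthetistID: ObjectId("6126ca6148f34c00189f86f5"),
  userInfo: {
    _id: ObjectId("..."),
    amputationInfo: ["Double Symes/Boyd", "Single Above-Elbow"]
  }
}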

You can use the following stages:
$unwind to deconstruct the array
first $group to get the sum for each category
second $group to push everything into one document as key/value pairs
$arrayToObject to get the desired output
$replaceRoot to promote the result to the root of the document
Here is the code:
db.collection.aggregate([
  { "$unwind": "$userInfo.amputationInfo" },
  {
    "$group": {
      "_id": "$userInfo.amputationInfo",
      "count": { "$sum": 1 }
    }
  },
  {
    "$group": {
      "_id": null,
      "data": {
        "$push": {
          "k": "$_id",
          "v": "$count"
        }
      }
    }
  },
  { "$project": { "data": { "$arrayToObject": "$data" } } },
  { "$replaceRoot": { "newRoot": "$data" } }
])
Working Mongo playground
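For reference, here is roughly how those stages could be appended to the pipeline from the question (a sketch: the reviews collection name is assumed, the $match value comes from the question, and the resulting keys will be the raw amputationInfo strings such as "Double Symes/Boyd" rather than the camelCase names shown in the question):
db.reviews.aggregate([
  // stages from the question
  { $match: { prosthetistID: new ObjectId('6126ca6148f34c00189f86f5') } },
  { $lookup: { from: 'users', localField: 'user', foreignField: '_id', as: 'userInfo' } },
  { $unwind: '$userInfo' },
  // stages from the answer above
  { $unwind: '$userInfo.amputationInfo' },
  { $group: { _id: '$userInfo.amputationInfo', count: { $sum: 1 } } },
  { $group: { _id: null, data: { $push: { k: '$_id', v: '$count' } } } },
  { $project: { data: { $arrayToObject: '$data' } } },
  { $replaceRoot: { newRoot: '$data' } }
])
Note that this counts once per matching review, so a user who has left several matching reviews would be counted more than once.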

Related

MongoDB Aggregation: How to return only the values that don't exist in all documents

Let's say I have an array ['123', '456', '789'].
I want to aggregate, look through every document with the field books, and only return the values that are NOT in any document. For example, if '123' is in a document and '456' is, but '789' is not, it should return ['789'], since it's not included in any document's books field.
.aggregate([
  {
    $match: {
      books: {
        $in: ['123', '456', '789']
      }
    }
  },
I don't want the documents returned, but just the actual values that are not in any documents.
Here's one way to scan the entire collection to look for missing book values.
db.collection.aggregate([
  { // "explode" books array to docs with individual book values
    "$unwind": "$books"
  },
  { // scan entire collection creating set of book values
    "$group": {
      "_id": null,
      "allBooksSet": {
        "$addToSet": "$books" // <-- generate set of book values
      }
    }
  },
  {
    "$project": {
      "_id": 0, // don't need this anymore
      "missing": { // use $setDifference to find missing values
        "$setDifference": [
          [ "123", "456", "789" ], // <-- your values go here
          "$allBooksSet" // <-- the entire collection's set of book values
        ]
      }
    }
  }
])
Example output:
[
  {
    "missing": [ "789" ]
  }
]
Try it on mongoplayground.net.
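Since the aggregation returns a document rather than a bare array, a small mongosh sketch (the collection name and value list are placeholders) shows how to pull out just the missing values, falling back to the full list when no document contributes any books:
const wanted = ["123", "456", "789"];
const res = db.collection.aggregate([
  { "$unwind": "$books" },
  { "$group": { "_id": null, "allBooksSet": { "$addToSet": "$books" } } },
  { "$project": { "_id": 0, "missing": { "$setDifference": [wanted, "$allBooksSet"] } } }
]).toArray();
// if the pipeline produced nothing, every wanted value is missing
const missing = res.length ? res[0].missing : wanted; // e.g. [ "789" ]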
Based on @rickhg12hs's answer, here is another variation that replaces $unwind with $reduce, which is considered less costly. Two of the three stages are the same:
db.collection.aggregate([
  {
    $group: {
      _id: null,
      allBooks: { $push: "$books" }
    }
  },
  {
    $project: {
      _id: 0,
      allBooksSet: {
        $reduce: {
          input: "$allBooks",
          initialValue: [],
          in: { $setUnion: ["$$value", "$$this"] }
        }
      }
    }
  },
  {
    $project: {
      missing: {
        $setDifference: [["123", "456", "789"], "$allBooksSet"]
      }
    }
  }
])
Try it on mongoplayground.net.

How to push a new element into an existing array, or create one if it doesn't exist yet, in MongoDB?

I have a script creating a document, updating it and cleaning up.
db.getCollection('things').insert({ _id: 1001,
  elemo: { a: "A", b: "B" },
  histo: [] })
db.getCollection('things').update({ _id: 1001 },
  [ { $set: {
      histo: { $concatArrays: [ "$histo", ["$elemo"] ] } } } ])
db.getCollection("things").find({ _id: 1001 })
db.getCollection('things').remove({ _id: 1001 })
For certain reasons, I'd like to retain the functionality but can't guarantee that the originally empty array actually exists. I need to perform my update in such a way that an existing array gets an additional element, while a non-existing (yet) one gets created (including said element).
db.getCollection('things').insert({ _id: 1001,
  elemo: { a: "A", b: "B" } })
db.getCollection('things').update({ _id: 1001 },
  [ { $set: {
      histo: { $concatArrays: [ "$histo", ["$elemo"] ] } } } ])
db.getCollection("things").find({ _id: 1001 })
db.getCollection('things').remove({ _id: 1001 })
The above only creates the field, but its value is null, so additional amendments to it also result in null. I'm rather certain that it needs something more around $concatArrays, but I can't figure out what. First I thought I could use $ifNull, but it didn't recognize that operator (no error, no insertion, no coalescing, nothing).
You can make use of $cond or $ifNull (as you guessed) to check if the key exists or not inside the $concatArrays operator.
Using $cond Method
db.collection.update({
  _id: 1001
},
[
  {
    $set: {
      histo: {
        "$concatArrays": [
          {
            "$cond": {
              "if": { "$not": [ "$histo" ] },
              "then": [],
              "else": "$histo"
            }
          },
          [ "$elemo" ]
        ]
      }
    }
  }
])
Mongo Playground Sample Execution
Using $ifNull Method
db.collection.update({
  _id: 1001
},
[
  {
    $set: {
      histo: {
        "$concatArrays": [
          { "$ifNull": [ "$histo", [] ] },
          [ "$elemo" ]
        ]
      }
    }
  }
])
Mongo Playground Sample Execution
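With either variant, starting from the second insert in the question (no histo field), the document should end up looking roughly like this after one update, and each further update appends another copy of elemo:
{
  "_id": 1001,
  "elemo": { "a": "A", "b": "B" },
  "histo": [ { "a": "A", "b": "B" } ]
}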

MongoDB: How to copy Documents to a new field of associated Documents from other collections?

Collections that I have:
Product:
[
  {
    "_id": "product_id_1",
    "name": "Product 1",
    "price": 50
  },
  {
    "_id": "product_id_2",
    "name": "Product 2",
    "price": 100
  }
]
Category:
[
  {
    "_id": "category_id_1",
    "name": "Category 1"
  },
  {
    "_id": "category_id_2",
    "name": "Category 2"
  }
]
Audit:
[
  {
    "_id": "audit_id_1",
    "resource_type": "product",
    "resource_id": "product_id_1",
    "attribute": "name",
    "executionTime": "2021-01-10T00:00:00.000Z"
  },
  {
    "_id": "audit_id_2",
    "resource_type": "product",
    "resource_id": "product_id_1",
    "attribute": "name",
    "executionTime": "2021-01-09T00:00:00.000Z"
  },
  {
    "_id": "audit_id_3",
    "resource_type": "product",
    "resource_id": "product_id_1",
    "attribute": "price",
    "executionTime": "2021-01-10T00:00:00.000Z"
  },
  {
    "_id": "audit_id_4",
    "resource_type": "category",
    "resource_id": "category_id_1",
    "attribute": "name",
    "executionTime": "2021-01-10T00:00:00.000Z"
  },
  {
    "_id": "audit_id_5",
    "resource_type": "category",
    "resource_id": "category_id_1",
    "attribute": "name",
    "executionTime": "2021-01-09T00:00:00.000Z"
  }
]
The Audit collection is used for saving details about updates to each Product or Category document.
For example, we can see that the name attribute of the Product with id product_id_1 was changed twice:
on the 9th of January and on the 10th of January.
The price attribute of the same Product was changed only once: on the 10th of January.
The same kind of information is saved for the Category collection as well.
The goal that I want to achieve is:
Extract the documents from the Audit collection that contain information about only the latest change for each unique attribute of each unique resource, and copy them into a new field of the related document in the Product/Category collections.
As a result, the Product/Category collections should look like this:
Product:
[
  {
    "_id": "product_id_1",
    "name": "Product 1",
    "price": 50,
    "audit": [
      {
        "_id": "audit_id_1",
        "resource_type": "product",
        "resource_id": "product_id_1",
        "attribute": "name",
        "executionTime": "2021-01-10T00:00:00.000Z"
      },
      {
        "_id": "audit_id_3",
        "resource_type": "product",
        "resource_id": "product_id_1",
        "attribute": "price",
        "executionTime": "2021-01-10T00:00:00.000Z"
      }
    ]
  },
  {
    "_id": "product_id_2",
    "name": "Product 2",
    "price": 100,
    "audit": []
  }
]
Category:
[
  {
    "_id": "category_id_1",
    "name": "Category 1",
    "audit": [
      {
        "_id": "audit_id_4",
        "resource_type": "category",
        "resource_id": "category_id_1",
        "attribute": "name",
        "executionTime": "2021-01-10T00:00:00.000Z"
      }
    ]
  },
  {
    "_id": "category_id_2",
    "name": "Category 2",
    "audit": []
  }
]
I tried to write a query by myself, and this is what I got:
db.getCollection("audit").aggregate([
{
$match: {
"resource_type": "product"}
},
{
$sort: {
executionTime: -1
}
},
{
$group: {
_id: {
property: "$attribute",
entity: "$resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
}
]).forEach(function(a){
db.getCollection("product").update({"_id" :ObjectId(a.resource_id)},{addToSet : {audit:[a]}})
});
The problems that I see with my solution are:
it will update only the Product collection, which means I need to execute my code at least twice, once for each existing collection.
the forEach statement: I am not sure whether this command is executed on the server side or on the client side. Assuming the Audit collection contains approximately 100k documents, how fast will this command execute from a performance point of view?
So I definitely have a feeling that I need to rewrite my solution, but I have doubts about how to make it better.
For example, I read about the $merge command, which can do a job quite similar to what I do in the forEach section, but I do not know how to properly apply $merge in the aggregation flow I wrote above.
First of all, forEach is executed on the client side, which means you download the result of the aggregation and make one update request per document in the result. Although it is the most flexible way, it is also the most expensive one. An aggregation pipeline with $out or $merge, on the other hand, is executed on the server side, so you don't pipe data through the client.
Secondly, if you need to update 2 collections, you will need at least 2 queries. There is no way to $out to multiple collections.
Finally, you need to use the subquery syntax of $lookup. It is more flexible and lets you define the "joining" logic in pipeline terms. For products it would be:
db.products.aggregate([
  {
    $lookup: {
      from: "audit",
      let: { id: "$_id" },
      pipeline: [
        { "$match": {
            $expr: { $eq: [ "$resource_id", "$$id" ] }, // the foreign key match
            resource_type: "product" // the discriminator
        } },
        { $sort: { "executionTime": -1 } }, // chronological order
        { "$group": {
            _id: {
              attribute: "$attribute", // for each unique attribute
              id: "$resource_id" // per each unique resource
            },
            value: { $first: "$$ROOT" } // pick the latest
        } },
        { "$replaceRoot": { "newRoot": "$value" } }
      ],
      as: "audit"
    }
  }
])
You already learned about the $out stage and its limitations from the previous answer.
The second pipeline to update categories will be exactly the same but with another $out destination and another value in the discriminator.
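A sketch of that second pipeline might look like the following (the "category" and "audit" collection names are assumed from the question; a $merge back into the collection is shown here for illustration, with $out being the other option mentioned above):
db.getCollection("category").aggregate([
  {
    $lookup: {
      from: "audit",
      let: { id: "$_id" },
      pipeline: [
        { $match: {
            $expr: { $eq: [ "$resource_id", "$$id" ] },
            resource_type: "category" // only the discriminator changes
        } },
        { $sort: { executionTime: -1 } },
        { $group: {
            _id: { attribute: "$attribute", id: "$resource_id" },
            value: { $first: "$$ROOT" }
        } },
        { $replaceRoot: { newRoot: "$value" } }
      ],
      as: "audit"
    }
  },
  // write the audit field back onto the existing category documents
  { $merge: { into: "category", on: "_id", whenMatched: "merge", whenNotMatched: "discard" } }
])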
I want to post the code I wrote myself:
db.getCollection("product").aggregate([
{ $match: {} },
{
$lookup: {
from: 'audit',
localField: '_id',
foreignField: 'resource_id',
as: 'audit'
}
},
{
$unwind: '$audit'
},
{
$sort: { "audit.executionTime": -1 }
},
{
$group: {
_id: {
property: "$audit.attribute",
entity: "$audit.resource_id"
},
document: {
$first: "$$ROOT"
}
}
},
{
$replaceRoot: {
newRoot: "$document"
}
},
{
$group: {
_id: "$_id",
audit: { $push: "$audit" }
}
},
{
$merge: {
into: 'product',
on: "_id",
whenMatched: 'merge',
whenNotMatched: 'insert'
}
}])

Can't reduce a deeply nested array on MongoDB

I have a Mongo database with documents like this one inside a collection:
{
  date: "2019-06-12T00:09:03.000Z",
  actions: {
    actionDate: "2019-06-12T00:15:25.000Z",
    data: {
      users: [
        [
          { gender: "Male", age: 24 },
          { gender: "Female", age: 25 }
        ],
        [
          { gender: "Male", age: 34 },
          { gender: "Male", age: 26 }
        ],
        [
          { gender: "Female", age: 19 },
          { gender: "Male", age: 21 }
        ]
      ]
    }
  }
}
I would like to summarize the users appearing inside the array users in a single document, like
{
  "date": "2019-06-12T00:09:03.000Z",
  "actionDate": "2019-06-12T00:15:25.000Z",
  "summary": {
    "countFemale": 2,
    "meanFemaleAge": 22,
    "countMale": 4,
    "meanMaleAge": 26.25
  }
}
Some considerations to take into account: there could be no cases for one gender, and the users array might contain only one or two inner arrays.
I've tried to solve it using my (now I know) scarce knowledge of the Mongo query language, but it seems unsolvable to me. I thought MongoDB: Reduce array of objects into a single object by computing the average of each field might be useful, but I can't quite catch up with the idea.
Any ideas, please?
Try the query below:
db.collection.aggregate([
  /** Merge all arrays inside 'users' & push to 'summary' field */
  {
    $project: {
      date: 1,
      actionDate: "$actions.actionDate",
      summary: {
        $reduce: {
          input: "$actions.data.users",
          initialValue: [],
          in: { $concatArrays: ["$$value", "$$this"] }
        }
      }
    }
  },
  { $unwind: "$summary" },
  /** Group on 'date' to push data related to same date */
  {
    $group: {
      _id: "$date",
      actionDate: { $first: "$actionDate" },
      countFemale: { $sum: { $cond: [{ $eq: ["$summary.gender", "Female"] }, 1, 0] } },
      countMale: { $sum: { $cond: [{ $eq: ["$summary.gender", "Male"] }, 1, 0] } },
      meanFemaleAge: { $sum: { $cond: [{ $eq: ["$summary.gender", "Female"] }, "$summary.age", 0] } },
      meanMaleAge: { $sum: { $cond: [{ $eq: ["$summary.gender", "Male"] }, "$summary.age", 0] } }
    }
  },
  /** Re-create 'meanFemaleAge' & 'meanMaleAge' fields to add mean */
  {
    $addFields: {
      meanFemaleAge: { $cond: [{ $ne: ["$meanFemaleAge", 0] }, { $divide: ["$meanFemaleAge", "$countFemale"] }, 0] },
      meanMaleAge: { $cond: [{ $ne: ["$meanMaleAge", 0] }, { $divide: ["$meanMaleAge", "$countMale"] }, 0] }
    }
  }
]);
Test : MongoDB-Playground
Note: No matter how you do this, I would suggest not running this kind of operation on an entire collection with huge datasets.
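For instance, prefixing the pipeline with a $match on date (the range here is purely illustrative) keeps the $unwind and $group stages from scanning the whole collection:
db.collection.aggregate([
  // limit the working set first (illustrative date range)
  { $match: { date: { $gte: "2019-06-01T00:00:00.000Z", $lt: "2019-07-01T00:00:00.000Z" } } },
  // ...followed by the $project / $unwind / $group / $addFields stages shown above
])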
We need to use the $reduce operator.
In the first stage, we create separate arrays (Male | Female) and push users into them according to their gender.
In the second stage, we transform and calculate the result.
Try this one:
db.collection.aggregate([
  {
    $addFields: {
      "users": {
        $reduce: {
          input: "$actions.data.users",
          initialValue: { "Male": [], "Female": [] },
          in: {
            Male: {
              $concatArrays: [
                "$$value.Male",
                {
                  $filter: {
                    input: "$$this",
                    cond: { $eq: ["$$this.gender", "Male"] }
                  }
                }
              ]
            },
            Female: {
              $concatArrays: [
                "$$value.Female",
                {
                  $filter: {
                    input: "$$this",
                    cond: { $eq: ["$$this.gender", "Female"] }
                  }
                }
              ]
            }
          }
        }
      }
    }
  },
  {
    $project: {
      _id: 0,
      date: 1,
      actionDate: "$actions.actionDate",
      summary: {
        "countFemale": { $size: "$users.Female" },
        "meanFemaleAge": { $avg: "$users.Female.age" },
        "countMale": { $size: "$users.Male" },
        "meanMaleAge": { $avg: "$users.Male.age" }
      }
    }
  }
])
MongoPlayground
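For the sample document in the question, this pipeline should produce roughly the output that was asked for:
[
  {
    "date": "2019-06-12T00:09:03.000Z",
    "actionDate": "2019-06-12T00:15:25.000Z",
    "summary": {
      "countFemale": 2,
      "meanFemaleAge": 22,
      "countMale": 4,
      "meanMaleAge": 26.25
    }
  }
]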

How to use $text search inside $lookup pipeline

I have the following collection, for example:
// vehicles collection
[
  {
    "_id": 321,
    manufactor: SOME-OBJECT-ID
  },
  {
    "_id": 123,
    manufactor: ANOTHER-OBJECT-ID
  }
]
And I have a collection named tables:
// tables collection
[
  {
    "_id": SOME-OBJECT-ID,
    title: "Skoda"
  },
  {
    "_id": ANOTHER-OBJECT-ID,
    title: "Mercedes"
  }
]
As you can see, the vehicles collection's documents pull data from the tables collection's documents: the first document in the vehicles collection has a manufactor id that refers to the tables document named Skoda.
That is great.
When I query the DB using aggregate, I can easily pull the remote data from the remote collections without any problem.
I can also easily apply rules and limitations like $project, $sort, $skip, $limit and others.
But I want to display to the user only those vehicles that are manufactured by Mercedes.
Since Mercedes is not mentioned in the vehicles collection, only its ID, the $text $search would not return the right results.
This is the aggregate pipeline that I provide:
[
  {
    $match: {
      $text: { $search: "Mercedes" }
    }
  },
  {
    $lookup: {
      from: "tables",
      let: { manufactor: "$manufactor" },
      pipeline: [
        { $match: { $expr: { $eq: ["$_id", "$$manufactor"] } } },
        { $project: { title: 1 } }
      ],
      as: "manufactor"
    }
  },
  { $unwind: "$manufactor" },
  {
    $lookup: {
      from: "tables",
      let: { model: "$model" },
      pipeline: [
        { $match: { $expr: { $eq: ["$_id", "$$model"] } } },
        { $project: { title: 1 } }
      ],
      as: "model"
    }
  },
  { $unwind: "$model" },
  {
    $lookup: {
      from: "users",
      let: { joined_by: "$_joined_by" },
      pipeline: [
        { $match: { $expr: { $eq: ["$_id", "$$joined_by"] } } },
        { $project: { personal_info: 1 } }
      ],
      as: "joined_by"
    }
  },
  { $unwind: "$joined_by" }
]
As you can see, I am using the $text $search $match as the first stage in the pipeline; otherwise MongoDB will throw an error.
But this $text $search only searches the origin collection: the vehicles collection.
Is there a way to tell MongoDB to search in the remote collection with the $text $search method, and then include in the aggregation only results that match both?
UPDATE
When I do this instead:
{
  $lookup: {
    from: "tables",
    pipeline: [
      {
        $match: {
          $text: { $search: "Mercedes" }
        }
      },
      { $project: { title: 1 } }
    ],
    as: "manufactor"
  }
},
This is what I receive:
MongoError: pipeline requires text score metadata, but there is no text score available
If you are using one of the affected versions mentioned in this thread, you need to update your MongoDB server. As you can see, the issue was fixed in version 4.1.8.
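On a fixed server version, the shape from the update can be combined with the original correlated match; a sketch (collection and field names taken from the question) might look like this, where the final $unwind drops vehicles whose lookup found no Mercedes match:
db.vehicles.aggregate([
  {
    $lookup: {
      from: "tables",
      let: { manufactor: "$manufactor" },
      pipeline: [
        { $match: { $text: { $search: "Mercedes" } } }, // $text must stay first in the sub-pipeline
        { $match: { $expr: { $eq: ["$_id", "$$manufactor"] } } },
        { $project: { title: 1 } }
      ],
      as: "manufactor"
    }
  },
  { $unwind: "$manufactor" } // removes vehicles whose manufactor lookup came back empty
])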
