Inspired by another question, I was looking for a common way to couple items in a nested array, so that the 1st item is coupled with the 2nd item, and the 3rd item is coupled with the 4th item.
Assuming my document looks like:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}
And I want to couple the items:
{
_id: ObjectId("5a934e000102030405000000"),
couples: [
[
{
mod: 0,
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
mod: 1,
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
}
],
[
{
mod: 0,
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
mod: 1,
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
]
}
Since MongoDB version 4.4*, one option is to use an aggregation pipeline with $reduce, $mod, $filter and $zip:
$reduce with $mod to add a new mod field to each item, with value 0 for the items at odd positions (1st, 3rd, 5th, ...) and value 1 for the items at even positions (2nd, 4th, 6th, ...)
$filter into two arrays according to the mod value
$zip these two arrays into one array of couples
db.collection.aggregate([
{
$project: {
events: {
$reduce: {
input: "$events",
initialValue: [],
in: {$concatArrays: [
"$$value",
[
{
timestamp: "$$this.timestamp",
status: "$$this.status",
mod: {$mod: [{$size: "$$value"}, 2]}
}
]
]
}
}
}
}
},
{
$project: {
firstEvent: {$filter: {input: "$events", cond: {$eq: ["$$this.mod", 0]}}},
secondEvent: {$filter: {input: "$events", cond: {$eq: ["$$this.mod", 1]}}}
}
},
{$project: {couples: {$zip: {inputs: ["$firstEvent", "$secondEvent"]}}}}
])
See how it works on the playground example
*With older MongoDB versions, 3.4 or higher, the $mod can be replaced with a "manual" mod calculation.
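To check the logic outside the database, the three steps above can be mirrored in plain JavaScript. This is only a sketch: coupleEvents is a hypothetical helper name, not a MongoDB API, and it assumes $zip's default behaviour of stopping at the shortest input array.

```javascript
// Plain JavaScript mirror of the pipeline's three steps (runs without MongoDB).
// coupleEvents is a hypothetical helper name used for illustration only.
function coupleEvents(events) {
  // Step 1: tag every item with mod = index % 2 (what $reduce + $mod compute).
  const tagged = events.map((event, index) => ({ ...event, mod: index % 2 }));
  // Step 2: split into two arrays by mod (the two $filter expressions).
  const firstEvent = tagged.filter((event) => event.mod === 0);
  const secondEvent = tagged.filter((event) => event.mod === 1);
  // Step 3: pair items by position, stopping at the shorter array
  // (assumed to match $zip without useLongestLength).
  const pairCount = Math.min(firstEvent.length, secondEvent.length);
  return Array.from({ length: pairCount }, (_, i) => [firstEvent[i], secondEvent[i]]);
}

// Five events: the fifth has no partner.
const couples = coupleEvents([
  { status: 0 }, { status: 8 }, { status: 4 }, { status: 3 }, { status: 5 },
]);
console.log(JSON.stringify(couples));
```

With an odd number of events, the last item has no partner and is dropped in this sketch. If it should be kept in the pipeline, $zip's useLongestLength option may be worth a look.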
Related
I have a database of the employees of a company that looks like this:
{
_id: 7698,
name: 'Blake',
job: 'manager',
manager: 7839,
hired: ISODate("1981-05-01T00:00:00.000Z"),
salary: 2850,
department: {name: 'Sales', location: 'Chicago'},
missions: [
{company: 'Mac Donald', location: 'Chicago'},
{company: 'IBM', location: 'Chicago'}
]
}
I have an exercise in which I need to write the MongoDB command that returns all the employees who did all their missions in Chicago. I struggle with the "all" part, because I cannot find a way to check that all the locations in the missions array are equal to 'Chicago'.
I was thinking about doing it in two steps: first find the total number of missions the employee has, then compare it to the number of missions he has in Chicago (that's how I would do it in SQL, I guess). But I cannot find the number of missions the employee did in Chicago. Here is what I tried:
db.employees.aggregate([
{
$match: { "missions": { $exists: true } }
},
{
$project: {
name: 1,
nbMissionsChicago: {
$sum: {
$cond: [
{
$eq: [{
$getField: {
field: { $literal: "$location" },
input: "$missions"
}
}, "Chicago"]
}, 1, 0
]
}
}
}
}
])
Here is the result :
{ _id: 7698, name: 'Blake', nbMissionsChicago: 0 }
{ _id: 7782, name: 'Clark', nbMissionsChicago: 0 }
{ _id: 8000, name: 'Smith', nbMissionsChicago: 0 }
{ _id: 7902, name: 'Ford', nbMissionsChicago: 0 }
{ _id: 7499, name: 'Allen', nbMissionsChicago: 0 }
{ _id: 7654, name: 'Martin', nbMissionsChicago: 0 }
{ _id: 7900, name: 'James', nbMissionsChicago: 0 }
{ _id: 7369, name: 'Smith', nbMissionsChicago: 0 }
First of all, is there a better method to check that all the locations in the missions array satisfy the condition? And why does this command return only 0?
Thanks!
If all you need is the employees who had all their missions in "Chicago", then you don't need an aggregation pipeline for it. Specifically, filtering the array as part of the aggregation can't utilize an index, which makes performance even worse.
A simple query should suffice here:
db.collection.find({
$and: [
{
"missions": {
$exists: true
}
},
{
"missions.location": {
$not: {
$gt: "Chicago"
}
}
},
{
"missions.location": {
$not: {
$lt: "Chicago"
}
}
}
]
})
Mongo Playground
This way we can build an index on the missions field and utilize it properly. Any document with a value other than "Chicago" will not match, as it will fail the $gt or $lt comparison.
Note that an empty array also matches the condition. To require at least one mission, change the generic "missions" exists condition key into "missions.0": {$exists: true}.
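As a rough mental model of which documents the query accepts, here is a plain JavaScript sketch. It is not how MongoDB evaluates the query: allMissionsInChicago and requireAtLeastOne are hypothetical names, missions is assumed to be an array when present, and locations are assumed to be strings.

```javascript
// Plain JavaScript model of the query's logic (a sketch, not MongoDB's evaluation).
// allMissionsInChicago and requireAtLeastOne are hypothetical names.
function allMissionsInChicago(employee, requireAtLeastOne = false) {
  const missions = employee.missions;
  // Models { missions: { $exists: true } }; missions is assumed to be an array.
  if (!Array.isArray(missions)) return false;
  // Models the stricter { "missions.0": { $exists: true } } variant.
  if (requireAtLeastOne && missions.length === 0) return false;
  // No location may sort after "Chicago" ($not $gt) or before it ($not $lt).
  return missions.every(
    (mission) => !(mission.location > "Chicago") && !(mission.location < "Chicago")
  );
}

const blake = { name: "Blake", missions: [{ location: "Chicago" }, { location: "Chicago" }] };
const clark = { name: "Clark", missions: [{ location: "Chicago" }, { location: "Paris" }] };
console.log(allMissionsInChicago(blake), allMissionsInChicago(clark));
```

The second parameter models the "missions.0" variant: with it set, an employee with an empty missions array is rejected instead of accepted.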
You don't get the correct result because this is not the correct way to iterate over the elements of an array field.
Instead, you need the $filter operator to filter the array's elements and the $size operator to get the size of an array.
Updated: You can directly compare the filtered array with the original array.
db.employees.aggregate([
{
$match: {
"missions": {
$exists: true
}
}
},
{
$project: {
name: 1,
nbMissionsChicago: {
$eq: [
{
$filter: {
input: "$missions",
cond: {
$eq: [
"$$this.location",
"Chicago"
]
}
}
},
"$missions"
]
}
}
}
])
Demo # Mongo Playground
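If the count itself is wanted, as in the question, taking the $size of the $filter result gives it. The same logic mirrored in plain JavaScript, as a sketch only (missionStats is a hypothetical name, and a missing missions field is treated as an empty array here):

```javascript
// Plain JavaScript mirror of the $filter + $size idea (a sketch, not MongoDB code).
// missionStats is a hypothetical helper name.
function missionStats(employee) {
  // Assumption: a missing missions field counts as an empty array.
  const missions = employee.missions ?? [];
  // Equivalent of $filter with cond: { $eq: ["$$this.location", "Chicago"] }.
  const inChicago = missions.filter((mission) => mission.location === "Chicago");
  return {
    name: employee.name,
    nbMissions: missions.length,          // like { $size: "$missions" }
    nbMissionsChicago: inChicago.length,  // like $size of the filtered array
    allInChicago: inChicago.length === missions.length,
  };
}

const stats = missionStats({
  name: "Blake",
  missions: [
    { company: "Mac Donald", location: "Chicago" },
    { company: "IBM", location: "Chicago" },
  ],
});
console.log(stats);
```

Comparing the two counts is equivalent to comparing the filtered array with the original one, since $filter only ever removes elements.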
I'm starting to learn Aggregate in MongoDB. I have simple documents as below, which have 2 fields, name and examScores; examScores is an array containing multiple documents:
{ _id: ObjectId("633199db009be219a43ae426"),
name: 'Max',
examScores:
[ { difficulty: 4, score: 57.9 },
{ difficulty: 6, score: 62.1 },
{ difficulty: 3, score: 88.5 } ] }
{ _id: ObjectId("633199db009be219a43ae427"),
name: 'Manu',
examScores:
[ { difficulty: 7, score: 52.1 },
{ difficulty: 2, score: 74.3 },
{ difficulty: 5, score: 53.1 } ] }
Now I query the maximum score of each person using $unwind and $group/$max as below:
db.test.aggregate([
{$unwind: "$examScores"},
{$group: {_id: {name: "$name"}, maxScore: {$max: "$examScores.score"}}}
])
{ _id: { name: 'Max' }, maxScore: 88.5 }
{ _id: { name: 'Manu' }, maxScore: 74.3 }
But I want the result to also contain the examScores.difficulty field corresponding to name and examScores.score, like below:
{ _id: { name: 'Max' }, difficulty: 3, maxScore: 88.5 }
{ _id: { name: 'Manu' }, difficulty: 2, maxScore: 74.3 }
I know that I can use $sort + $group and $first to achieve this goal. But I want to use $getField or any other methods to get data from ROOT Doc.
My idea is to use $project and $getField to get the difficulty field from the ROOT doc (or the $unwind version of the ROOT doc) with a condition like ROOT.name = Aggregate.name and ROOT.examScores.score = Aggregate.maxScore.
It will look something like this:
{$project:
{name: 1,
maxScore: 1,
difficulty:
{$getField: {
field: "$examScores.difficulty"
input: "$$ROOT.$unwind() with condition/filter"}
}
}
}
I wonder if this is possible in MongoDB?
Solution 1
$unwind
$group - Group by name. You need $push to add the $$ROOT document into data array.
$project - Set the difficulty field by getting the value of examScores.difficulty from the first item of the filtered data array by matching the examScores.score with maxScore.
db.collection.aggregate([
{
$unwind: "$examScores"
},
{
$group: {
_id: {
name: "$name"
},
maxScore: {
$max: "$examScores.score"
},
data: {
$push: "$$ROOT"
}
}
},
{
$project: {
_id: 0,
name: "$_id.name",
maxScore: 1,
difficulty: {
$getField: {
field: "difficulty",
input: {
$getField: {
field: "examScores",
input: {
$first: {
$filter: {
input: "$data",
cond: {
$eq: [
"$$this.examScores.score",
"$maxScore"
]
}
}
}
}
}
}
}
}
}
}
])
Demo Solution 1 # Mongo Playground
Solution 2: $rank
$unwind
$setWindowFields with $rank - Rank the documents within each name partition, sorted by examScores.score descending.
$match - Filter the document with { rank: 1 }.
$unset - Remove rank field.
db.collection.aggregate([
{
$unwind: "$examScores"
},
{
$setWindowFields: {
partitionBy: "$name",
sortBy: {
"examScores.score": -1
},
output: {
rank: {
$rank: {}
}
}
}
},
{
$match: {
rank: 1
}
},
{
$unset: "rank"
}
])
Demo Solution 2 # Mongo Playground
Opinion: I would say this approach:
$sort by examScores.score descending
$group by name, take the first document
would be much easier.
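A minimal sketch of that $sort + $group variant, assuming the documents are unwound first as in the question. The collection name test comes from the question, and sortGroupPipeline is just an illustrative variable name.

```javascript
// Sketch of the $sort + $group approach (sortGroupPipeline is an illustrative name).
const sortGroupPipeline = [
  // One document per exam score.
  { $unwind: "$examScores" },
  // Highest score first, so $first below picks the best entry of each group.
  { $sort: { "examScores.score": -1 } },
  {
    $group: {
      _id: { name: "$name" },
      difficulty: { $first: "$examScores.difficulty" },
      maxScore: { $first: "$examScores.score" },
    },
  },
];

// In mongosh: db.test.aggregate(sortGroupPipeline)
```

Since both fields are taken with $first from the same sorted document, difficulty and maxScore always belong to the same exam entry.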
There's no need to $unwind and then rebuild the documents again via $group to achieve your desired results. I'd recommend avoiding that altogether.
Instead, consider processing the arrays inline using array expression operators. Depending on the version and exact results you are looking for, here are two starting points that may be worth considering. In particular the $maxN operator and the $sortArray operator may be of interest for this particular question.
You can get a sense for what these two operators do by running an $addFields aggregation to see their output, playground here.
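A sketch of such an exploratory stage is shown here. The output field names topScore and sortedScores are made up for illustration, and both operators need a reasonably recent server (they were added in MongoDB 5.2, as far as I know).

```javascript
// Exploratory $addFields stage to look at what the two operators return.
// topScore and sortedScores are illustrative field names, not part of the question.
const explorePipeline = [
  {
    $addFields: {
      // $maxN returns an array holding the n largest values of the input array.
      topScore: { $maxN: { input: "$examScores.score", n: 1 } },
      // $sortArray returns the whole array, here ordered by score descending.
      sortedScores: { $sortArray: { input: "$examScores", sortBy: { score: -1 } } },
    },
  },
];

// In mongosh: db.collection.aggregate(explorePipeline)
```

Note that $maxN only sees the score values, while $sortArray keeps the full sub-documents, which is why the latter is the handier one when difficulty is needed as well.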
With those as a starting point, it's really up to you to make the pipeline output the desired result. Here is one such example that matches the output you described in the question pretty well (playground):
db.collection.aggregate([
{
"$addFields": {
"relevantEntry": {
$first: {
$sortArray: {
input: "$examScores",
sortBy: {
"score": -1
}
}
}
}
},
},
{
"$project": {
_id: 0,
name: 1,
difficulty: "$relevantEntry.difficulty",
maxScore: "$relevantEntry.score"
}
}
])
Which yields:
[
{
"difficulty": 3,
"maxScore": 88.5,
"name": "Max"
},
{
"difficulty": 2,
"maxScore": 74.3,
"name": "Manu"
}
]
Also worth noting that this particular approach doesn't do anything special if there are duplicates. You could look into using $filter if something more was needed in that regard.
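One possible way to handle ties, sketched under the assumption that every entry sharing the top score should be returned. The names tiesPipeline and topEntries are illustrative only.

```javascript
// Sketch: keep every exam entry that ties for the maximum score.
// tiesPipeline and topEntries are illustrative names.
const tiesPipeline = [
  // Compute the maximum score of the array once per document.
  { $addFields: { maxScore: { $max: "$examScores.score" } } },
  {
    $project: {
      _id: 0,
      name: 1,
      maxScore: 1,
      // $filter keeps all entries whose score equals the maximum.
      topEntries: {
        $filter: {
          input: "$examScores",
          cond: { $eq: ["$$this.score", "$maxScore"] },
        },
      },
    },
  },
];

// In mongosh: db.collection.aggregate(tiesPipeline)
```

Here topEntries is an array, so a document with two equally high scores reports both difficulties instead of an arbitrary one.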
Inspired by another question I was looking for a common way to add a field with the index to each item in a nested array.
Assuming my document looks like:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}
And I want each item to contain a new field which is the index of the item in the array:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
arrayIndex: 0,
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
arrayIndex: 1,
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
arrayIndex: 2,
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
arrayIndex: 3,
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}
Since MongoDB version 3.4, this can be done using an aggregation pipeline with a $reduce step, which uses the size of the accumulated array as the index:
db.collection.aggregate([
{$project: {
events: {
$reduce: {
input: "$events",
initialValue: [],
in: {
$concatArrays: [
"$$value",
[
{$mergeObjects: [
"$$this",
{arrayIndex: {$size: "$$value"}}
]}
]
]
}
}
}
}}
])
See how it works on the playground example
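The trick is that the accumulated array's current length is exactly the index of the item being added. The same idea in plain JavaScript, as a sketch (addArrayIndex is a hypothetical helper name, not a MongoDB API):

```javascript
// Plain JavaScript mirror of the $reduce trick (runs without MongoDB).
// addArrayIndex is a hypothetical helper name.
function addArrayIndex(events) {
  return events.reduce(
    // value plays the role of $$value, item the role of $$this.
    // value.length is the index the new item will get, like { $size: "$$value" }.
    (value, item) => value.concat([{ ...item, arrayIndex: value.length }]),
    [] // initialValue
  );
}

const indexed = addArrayIndex([{ status: 0 }, { status: 8 }, { status: 4 }, { status: 3 }]);
console.log(indexed.map((event) => event.arrayIndex));
```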
I am trying to write a performance report based on shapefile data stored within documents in my collections.
Here is a sample of data:
The following function works quite well, as it returns the number of bytes for each document. However, I would also like to know how many points/pairs are stored within each polygon's linear string for each document.
db.getCollection("_collectionName").aggregate([{"$project": {"rootSize": { $bsonSize: "$$ROOT" }}}])
This returns the following set of data (sample):
{ _id: ObjectId("5ef7da26ae8659149c97657e"), rootSize: 42215 },
{ _id: ObjectId("5ef7da45ae8659149c97657f"), rootSize: 118574 },
{ _id: ObjectId("5ef7daf1ae8659149c976585"), rootSize: 11886 },
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), rootSize: 43136 },
{ _id: ObjectId("5ef7daa6ae8659149c976582"), rootSize: 40823 },
{ _id: ObjectId("5f3495129861ce45eb4e9728"), rootSize: 394884 },
{ _id: ObjectId("5ef7d7f6ae8659149c97657c"), rootSize: 125309 },
{ _id: ObjectId("5ef7dad6ae8659149c976584"), rootSize: 127447 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389e"), rootSize: 17670 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389f"), rootSize: 11398 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389c"), rootSize: 2415 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ae"), rootSize: 1757 },
{ _id: ObjectId("5fa56ef36538cd3bddd838b0"), rootSize: 4866 },
{ _id: ObjectId("5fa56ef36538cd3bddd838a8"), rootSize: 1510 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a7"), rootSize: 39631 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ab"), rootSize: 3662 },
{ _id: ObjectId("5fa56ef36538cd3bddd838aa"), rootSize: 15844 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389d"), rootSize: 17196 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a3"), rootSize: 34940 },
{ _id: ObjectId("5fa56ef36538cd3bddd838af"), rootSize: 468367 }
Which is great but it does not tell me how many elements are in the array/linear string within geometry.coordinates.
I have tried the following, but no cigar:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry", [] ] } } } }])
MongoServerError: The argument to $size must be an array, but was of type: object
It comes back with an error, which I understand, so I referenced the coordinates array:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates", [] ] } } } }])
This returned the following data, which is again correct. If you understand GeoJSON files, this is normal, as this is the top level of the linear ring. Sample data:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 1 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 1 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 1 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 1 }
So I then added the top level array of 0 to my aggregate function:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates.0", [] ] } } } }])
And this is what was returned:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 0 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 0 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 0 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 0 }
And that is not possible, here is a screenshot from Studio3T software:
Anybody who might be able to help or point me in the right direction please do so....
(I would be very grateful!)
Dot notation with a numeric index won't select an array element within an aggregation expression. You'll want to use the $arrayElemAt operator, as follows:
db.getCollection("_aGStbl").aggregate([{
$project: {
count: { $size: { $arrayElemAt: [ "$geometry.coordinates", 0 ]}}
}
}])
To cater for null or missing values, you can use $ifNull or $cond, depending on your objective for the output:
INSERT SOME DATA INTO A TESTDB:
db.arrayTest.insertMany([
{ _id: 1, arrayOfArrays: [ [ 1, 2, 3 ], [ 1, 2, 3, 4 ], [ 1, 2, 3, 4, 5, 6, 7 ] ] },
{ _id: 2, arrayOfArrays: [ [ 4, 5 ], [ 5, 6, 7 ] ] },
{ _id: 3, arrayOfArrays: [ [], [] ] },
{ _id: 4, arrayOfArrays: [ [], [], [] ] },
{ _id: 5 }
] )
{ acknowledged: true,
insertedIds: { '0': 1, '1': 2, '2': 3, '3': 4, '4': 5 } }
TRY THESE AGGREGATE CALLS:
db.arrayTest.aggregate([{$project: { count: { $size: { "$ifNull": [ { $arrayElemAt: [ "$arrayOfArrays", 0 ] }, [ ] ] } } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: 0 }
db.arrayTest.aggregate([{$project: { count: { $cond: { if: {$arrayElemAt: [ "$arrayOfArrays", 0 ]}, then: { $size: { $arrayElemAt: [ "$arrayOfArrays", 0 ] } }, else: null} } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: null }
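If a polygon can have more than one ring (an outer boundary plus holes), counting only element 0 misses the rest. A possible extension, sketched under the assumption that geometry.coordinates is a GeoJSON Polygon (an array of rings, each an array of positions). The names ringSizePipeline, ringSizes and totalPoints are illustrative.

```javascript
// Sketch: point count per ring plus a total, assuming Polygon geometries only.
// ringSizePipeline, ringSizes and totalPoints are illustrative names.
const ringSizePipeline = [
  {
    $project: {
      // One $size per ring; a missing geometry yields an empty list.
      ringSizes: {
        $map: {
          input: { $ifNull: ["$geometry.coordinates", []] },
          as: "ring",
          in: { $size: "$$ring" },
        },
      },
    },
  },
  // Sum the per-ring counts into a single number.
  { $addFields: { totalPoints: { $sum: "$ringSizes" } } },
];

// In mongosh: db.getCollection("_collectionName").aggregate(ringSizePipeline)
```

Other geometry types nest their coordinates differently (a MultiPolygon has one more array level, a Point has none), so this sketch would need adjusting, or a $match on geometry.type, before running on mixed data.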
For example, with this data:
{id: 1, fname: "Barry", lname: "Sullivan"}
{id: 2, fname: "Sarah", lname: "Bailey"}
{id: 3, fname: "Drake", lname: "Barry"}
Is there a way, with a single query, that I could check to see if anyone had the same lname as id: 1 fname?
You can use $facet to run two separate queries and get the result as one document. This gives you two separate arrays: a one-element array with the id: 1 document, and an array with the other documents. Then you can run $filter to find the documents whose lname matches:
db.collection.aggregate([
{
$facet: {
first: [ { $match: { id: 1 } } ],
others: [ { $match: { $expr: { $ne: [ "$id", 1 ] } } } ]
}
},
{
$unwind: "$first"
},
{
$project: {
matches: {
$filter: {
input: "$others",
cond: { $eq: [ "$$this.lname", "$first.fname" ] }
}
}
}
}
])
Mongo Playground
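The same comparison can be traced in plain JavaScript, following the question's wording (other documents' lname against the fname of id: 1). This is a sketch only, and lnameMatchesForFname is a hypothetical helper name.

```javascript
// Plain JavaScript trace of the $facet + $filter idea (runs without MongoDB).
// lnameMatchesForFname is a hypothetical helper name.
function lnameMatchesForFname(docs, id) {
  // The "first" facet: the document with the given id.
  const first = docs.find((doc) => doc.id === id);
  // Without that document the pipeline's $unwind would emit nothing; here we return [].
  if (first === undefined) return [];
  // The "others" facet: every other document.
  const others = docs.filter((doc) => doc.id !== id);
  // The $filter step: keep documents whose lname equals the first document's fname.
  return others.filter((doc) => doc.lname === first.fname);
}

const people = [
  { id: 1, fname: "Barry", lname: "Sullivan" },
  { id: 2, fname: "Sarah", lname: "Bailey" },
  { id: 3, fname: "Drake", lname: "Barry" },
];
const matches = lnameMatchesForFname(people, 1);
console.log(matches.map((doc) => doc.id));
```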