I am using the Bucket Pattern to split a time series of market updates and avoid exceeding the document size limit (transfer overhead is not a concern). Each document holds 30,000 updates, and I need to return the updates from all documents matching 'marketId' as a single array (ideally sorted).
Data:
{
_id: ObjectId("60617172eca858909eace71f"),
marketId: '1.278363651',
eventId: 5697224,
marketType: 'OVER_UNDER_15',
size: 30000,
updates: [
{
t: 1616998770482.0,
p: 36.49
},
{
t: 1616998770482,
p: 87.77
},
// ... 29998 more
]
},
Desired outcome:
[
{ t: 1616998770482, p: 36.49 },
{ t: 1616998770482, p: 16.59 },
{ t: 1616998770482, p: 40.38 },
... // sorted by t
]
This is my closest attempt with an aggregation:
const result = await db.collection('markets').aggregate([
  { $match: { marketId: marketId } },
  { $project: { _id: 0, updates: 1 } },
  { $unwind: "$updates" }
]).toArray();
Output:
[
{ updates: { t: 1616998770482, p: 36.49 } },
{ updates: { t: 1616998770482, p: 16.59 } },
{ updates: { t: 1616998770482, p: 40.38 } },
... // actually gives me all of them
]
How can I remove the "updates" wrapper and get the actual objects?
Demo - https://mongoplayground.net/p/Dw7NAn9EoEd
Use $replaceRoot
db.collection.aggregate([
{ $match: { marketId: marketId } },
{ $project: { _id: 0, updates: 1 }},
{ $unwind: "$updates" },
{ $replaceRoot: { newRoot: "$updates" } }
])
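Since the question also asks for the updates sorted by t, a minimal variant (keeping the Node driver call from the question and assuming ascending order is wanted) adds a $sort stage before $replaceRoot; for very large result sets you may also need to pass { allowDiskUse: true } to aggregate:
const result = await db.collection('markets').aggregate([
  { $match: { marketId: marketId } },
  { $project: { _id: 0, updates: 1 } },
  { $unwind: "$updates" },
  { $sort: { "updates.t": 1 } },             // sort the unwound updates by timestamp
  { $replaceRoot: { newRoot: "$updates" } }  // promote each update to the document root
]).toArray();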
I have a collection "product_reviews" with this document structure
{
_id: 'B000000OE4',
'product/title': 'Working Class Hero',
'product/price': '16.99',
reviews: [
{
'review/userId': 'unknown',
'review/profileName': 'unknown',
'review/helpfulness': '2/3',
'review/score': '4.0',
'review/time': '27/05/1999/00:00:00',
'review/summary': 'Worth it for one song',
'review/text': "I really like Joan Baez'..."
},
{
'review/userId': 'A1W0RKM6J6J73L',
'review/profileName': 'Aaron Woodin (purchagent#aol.com)',
'review/helpfulness': '1/1',
'review/score': '3.0',
'review/time': '09/02/1999/00:00:00',
'review/summary': 'The critical lambasting on the Amazon Page Missed one thing.',
'review/text': "They forgot to mention Mary Chapin..."
},
...
]
}
My goal is to add an object for each product (each product has a unique _id) with the following structure:
{
  avgReviewScore: 4.5,
  reviewsCount: 105,
  reviewScoreDistrib: {
    1: 15,
    2: 0,
    3: 30,
    4: 40,
    5: 20
  }
}
I tried numerous aggregation pipelines but couldn't find a solution.
You can try this code:
db.product_reviews.aggregate([{
$unwind: "$reviews"
},
{
$group: {
_id: "$_id",
avgReviewScore: {
  // review/score is stored as a string (e.g. '4.0'), so convert before averaging
  $avg: { $toDouble: "$reviews.review/score" }
},
reviewsCount: {
  $sum: 1
},
scores: {
  $push: { $toDouble: "$reviews.review/score" }
}
}
},
{
$project: {
avgReviewScore: 1,
reviewsCount: 1,
reviewScoreDistrib: {
$arrayToObject: {
$map: {
input: [1, 2, 3, 4, 5],
as: "num",
in: {
k: {$toString: "$$num"},
v: {
$size: {
$filter: {
input: "$scores",
as: "s",
cond: {
$eq: ["$$s", "$$num"]
}
}
}
}
}
}
}
}
}
},
{
$merge: {
into: "product_reviews",
on: "_id"
}
}
])
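To spot-check the merged fields afterwards, you could read one product back (a quick sanity check using the _id from the sample document above):
db.product_reviews.findOne(
  { _id: 'B000000OE4' },
  { avgReviewScore: 1, reviewsCount: 1, reviewScoreDistrib: 1 }
)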
If you have any issues, feel free to ask.
No need to $unwind and $group again (which can be very inefficient). You can use a simple updateMany:
db.collection.updateMany({},
[
{$set: {
reviewsData: {$map: {
input: "$reviews.review/score",
in: {$toDouble: "$$this"}
}}
}},
{$set: {
reviewScoreDistrib: {
$arrayToObject: {$map: {
input: {$range: [1, 6]},
as: "num",
in: {
k: {$toString: "$$num"},
v: {$size: {$filter: {
input: "$reviewsData",
cond: {$eq: ["$$this", "$$num"]}
}}}
}
}}
},
avgReviewScore: {$avg: "$reviewsData"},
reviewsCount: {$size: "$reviewsData"}
}}
])
See how it works on the playground example
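If you don't want to keep the helper reviewsData array on the documents once the summary fields are computed, a small follow-up (still a pipeline update, so it assumes MongoDB 4.2+ like the query above) can drop it:
// Optional cleanup: remove the intermediate field added by the first $set stage
db.collection.updateMany({}, [ { $unset: "reviewsData" } ])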
I am trying to write a performance report based on shapefile data stored in documents within my collections.
The following function works quite well: it returns the number of bytes for each document. Great, but I would also like to know how many points/pairs are stored within each polygon's linear ring for each document.
db.getCollection("_collectionName").aggregate([{ "$project": { "rootSize": { $bsonSize: "$$ROOT" } } }])
This returns the following set of data (sample):
{ _id: ObjectId("5ef7da26ae8659149c97657e"), rootSize: 42215 },
{ _id: ObjectId("5ef7da45ae8659149c97657f"), rootSize: 118574 },
{ _id: ObjectId("5ef7daf1ae8659149c976585"), rootSize: 11886 },
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), rootSize: 43136 },
{ _id: ObjectId("5ef7daa6ae8659149c976582"), rootSize: 40823 },
{ _id: ObjectId("5f3495129861ce45eb4e9728"), rootSize: 394884 },
{ _id: ObjectId("5ef7d7f6ae8659149c97657c"), rootSize: 125309 },
{ _id: ObjectId("5ef7dad6ae8659149c976584"), rootSize: 127447 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389e"), rootSize: 17670 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389f"), rootSize: 11398 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389c"), rootSize: 2415 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ae"), rootSize: 1757 },
{ _id: ObjectId("5fa56ef36538cd3bddd838b0"), rootSize: 4866 },
{ _id: ObjectId("5fa56ef36538cd3bddd838a8"), rootSize: 1510 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a7"), rootSize: 39631 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ab"), rootSize: 3662 },
{ _id: ObjectId("5fa56ef36538cd3bddd838aa"), rootSize: 15844 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389d"), rootSize: 17196 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a3"), rootSize: 34940 },
{ _id: ObjectId("5fa56ef36538cd3bddd838af"), rootSize: 468367 }
Which is great, but it does not tell me how many elements are in the array/linear ring within geometry.coordinates.
I have tried the following, but no cigar:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry", [] ] } } } }])
MongoServerError: The argument to $size must be an array, but was of type: object
It comes back with an error, which I understand, so I referenced the coordinates array:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates", [] ] } } } }])
This returned the following data, which is again correct; if you understand GeoJSON files this is normal, as this is the top level of the linear ring. Sample data:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 1 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 1 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 1 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 1 }
So I then added the top-level array index 0 to my aggregate function:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates.0", [] ] } } } }])
And this is what was returned:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 0 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 0 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 0 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 0 }
And that is not possible, as the same documents viewed in Studio 3T show populated coordinate arrays.
If anybody can help or point me in the right direction, please do so (I would be very grateful!).
The dot notation won't work on array elements within an aggregation. You'll want to use the $arrayElemAt operator, as follows:
db.getCollection("_aGStbl").aggregate([{
$project: {
count: { $size: { $arrayElemAt: [ "$geometry.coordinates", 0 ]}}
}
}])
To cater for null or missing values, you can use $ifNull or $cond, depending on your objective for the output:
INSERT SOME DATA INTO A TESTDB:
db.arrayTest.insertMany([
{ _id: 1, arrayOfArrays: [ [ 1, 2, 3 ], [ 1, 2, 3, 4 ], [ 1, 2, 3, 4, 5, 6, 7 ] ] },
{ _id: 2, arrayOfArrays: [ [ 4, 5 ], [ 5, 6, 7 ] ] },
{ _id: 3, arrayOfArrays: [ [], [] ] },
{ _id: 4, arrayOfArrays: [ [], [], [] ] },
{ _id: 5 }
] )
{ acknowledged: true,
insertedIds: { '0': 1, '1': 2, '2': 3, '3': 4, '4': 5 } }
TRY THESE AGGREGATE CALLS:
db.arrayTest.aggregate([{$project: { count: { $size: { "$ifNull": [ { $arrayElemAt: [ "$arrayOfArrays", 0 ] }, [ ] ] } } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: 0 }
db.arrayTest.aggregate([{$project: { count: { $cond: { if: {$arrayElemAt: [ "$arrayOfArrays", 0 ]}, then: { $size: { $arrayElemAt: [ "$arrayOfArrays", 0 ] } }, else: null} } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: null }
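If the goal is the total number of coordinate pairs across every ring rather than just the outer ring, a sketch along the same lines (assuming geometry.coordinates is an array of rings, each ring being an array of [x, y] pairs, and reusing the question's _collectionName placeholder) could be:
db.getCollection("_collectionName").aggregate([{
  $project: {
    totalPoints: {
      $sum: {
        $map: {
          input: { $ifNull: [ "$geometry.coordinates", [] ] },  // each element is one linear ring
          as: "ring",
          in: { $size: "$$ring" }                               // number of points in that ring
        }
      }
    }
  }
}])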
I have 2 collections. Collection A has documents like {'id':1,'field':'name'}, {'id':1,'field':'age'}, and collection B has documents like
{'_id':1,'name':'alice','age':18,'phone':123},{'_id':2,'name':'bob','age':30,'phone':321}
I want to find all the documents in collection B whose '_id' appears as 'id' in collection A, and project only the corresponding fields.
for example:
collection A
{'id':1,'field':'name'},
{'id':1,'field':'age'}
collection B
{'_id':1,'name':'alice','age':18,'phone':123},
{'_id':2,'name':'bob','age':30,'phone':321}
the result is:
{'name':'alice','age':18},
Is there an easy way to do that?
You can use $lookup to join the two collections:
db.col1.aggregate([
{
$match: {
id: 1
}
},
{
"$lookup": {
"from": "col2",
"localField": "id",
"foreignField": "_id",
"as": "listNames"
}
},
{
$project: {
listNames: {
$first: "$listNames"
}
}
},
{
$project: {
_id: 0,
name: "$listNames.name",
age: "$listNames.age"
}
}
])
Mongo Playground: https://mongoplayground.net/p/E-0WvK_SUS_
So the idea is:
Convert the documents into key/value pairs for both collections using $objectToArray.
Then perform a join based on the key k and (id <-> _id) using $lookup.
Replace the result as the root element using $replaceRoot.
Convert the array back to an object using $arrayToObject and another $replaceRoot.
Query:
db.colB.aggregate([
{
$project: {
temp: { $objectToArray: "$$ROOT" }
}
},
{
$lookup: {
from: "colA",
let: { temp: "$temp", colB_id: "$_id" },
pipeline: [
{
$addFields: {
temp: { k: "$field", v: "$id" }
}
},
{
$match: {
$expr: {
$and: [
{ $in: ["$temp.k", "$$temp.k"] },
{ $eq: ["$temp.v", "$$colB_id"] }
]
}
}
},
{
$replaceRoot: {
newRoot: {
$first: {
$filter: {
input: "$$temp",
as: "item",
cond: { $eq: ["$field", "$$item.k"] }
}
}
}
}
}
],
as: "array"
}
},
{
$replaceRoot: {
newRoot: { $arrayToObject: "$array" }
}
}
]);
Output:
{
"name" : "alice",
"age" : 18
}
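One version note (an assumption about your deployment, not something stated in the question): the $first array operator used in both answers above was added in MongoDB 4.4. On older servers you could substitute $arrayElemAt with index 0, e.g. for the $replaceRoot stage inside the $lookup pipeline:
{ $replaceRoot: { newRoot: { $arrayElemAt: [
  { $filter: { input: "$$temp", as: "item", cond: { $eq: ["$field", "$$item.k"] } } },
  0
] } } }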
I am trying to get the earliest date from an inner array in a MongoDB document and add it to its parent with aggregation. Example:
car: {
  "model": "Astra",
  "productions": [
    {
      "modelOne": {
        "dateOfCreation": "2019-09-30T10:15:25.026+00:00",
        "dateOfEstimation": "2017-09-30T10:15:25.026+00:00",
        "someOnterInfo": "whatever"
      }
    },
    {
      "modelTwo": {
        "dateOfCreation": "2017-09-30T10:15:25.026+00:00",
        "dateOfEstimation": "2019-09-30T10:15:25.026+00:00",
        "someOnterInfo": "whatever"
      }
    }
  ]
}
to be turned into:
car: {
"model": "Astra",
"earliestDateOfEstimation": "2017-09-30T10:15:25.026+00:00",
"earliestDateOfCreation": "2017-09-30T10:15:25.026+00:00"
}
How can I achieve that?
I'm assuming that modelOne and modelTwo are unknown when you start your aggregation. The key step is to run $map along with $objectToArray in order to get rid of those wrapper keys. Then you can just use $min to get the "earliest" values:
db.collection.aggregate([
{
$addFields: {
dates: {
$map: {
input: "$car.productions",
in: {
$let: {
vars: { model: { $arrayElemAt: [ { $objectToArray: "$$this" }, 0 ] } },
in: "$$model.v"
}
}
}
}
}
},
{
$project: {
_id: 1,
"car.model": 1,
"car.earliestDateOfEstimation": { $min: "$dates.dateOfEstimation" },
"car.earliestDateOfCreation": { $min: "$dates.dateOfCreation" },
}
}
])
Mongo Playground
EDIT:
The first step can be simplified if there's always a fixed set of keys (modelOne, modelTwo, ...):
db.collection.aggregate([
{
$addFields: {
dates: { $concatArrays: [ "$car.productions.modelOne", "$car.productions.modelTwo" ] }
}
},
{
$project: {
_id: 1,
"car.model": 1,
"car.earliestDateOfEstimation": { $min: "$dates.dateOfEstimation" },
"car.earliestDateOfCreation": { $min: "$dates.dateOfCreation" },
}
}
])
Mongo Playground (2)
I have changed one of the fields of my collection in MongoDB from an array of strings to an array of objects containing strings. New documents get inserted without any problem, but when a GET method that queries all the documents is called, I get this error:
Failed to decode 'Students'. Decoding 'photoAddresses' errored
with: readStartDocument can only be called when CurrentBSONType is
DOCUMENT, not when CurrentBSONType is STRING.
photoAddresses is the field that was changed in Students.
I was wondering if there is any way to update all the records so they all have the same data type, without losing any data.
The old version of photoAddresses:
"photoAddresses" : ["something","something else"]
This should be updated to the new version like this:
"photoAddresses" : [{photoAddresses:"something"},{photoAddresses:"something else"}]
The following aggregation queries update the string array to an object array, but only if the array has string elements. The aggregation operator $map is used to map the string array elements to objects. You can use either of the two queries.
db.test.aggregate( [
  {
    $match: {
      $expr: { $and: [ { $isArray: "$photoAddresses" },
                       { $gt: [ { $size: "$photoAddresses" }, 0 ] }
                     ]
      },
      "photoAddresses.0": { $type: "string" }
    }
  },
  {
    $project: {
      photoAddresses: {
        $map: {
          input: "$photoAddresses",
          as: "ph",
          in: { photoAddresses: "$$ph" }
        }
      }
    }
  },
] ).forEach( doc => db.test.updateOne( { _id: doc._id }, { $set: { photoAddresses: doc.photoAddresses } } ) )
The following query works with MongoDB 4.2+ only. Note that the update operation uses an aggregation pipeline instead of an update document. See updateMany.
db.test.updateMany(
  {
    $expr: { $and: [ { $isArray: "$photoAddresses" },
                     { $gt: [ { $size: "$photoAddresses" }, 0 ] }
                   ]
    },
    "photoAddresses.0": { $type: "string" }
  },
  [
    {
      $set: {
        photoAddresses: {
          $map: {
            input: "$photoAddresses",
            as: "ph",
            in: { photoAddresses: "$$ph" }
          }
        }
      }
    }
  ]
)
[EDIT ADD]: The following query works with MongoDB 3.4:
db.test.aggregate( [
{
$addFields: {
matches: {
$cond: {
if: { $and: [
{ $isArray: "$photoAddresses" },
{ $gt: [ { $size: "$photoAddresses" }, 0 ] },
{ $eq: [ { $type: { $arrayElemAt: [ "$photoAddresses", 0 ] } }, "string" ] }
] },
then: true,
else: false
}
}
}
},
{
$match: { matches: true }
},
{
$project: {
photoAddresses: {
$map: {
input: "$photoAddresses",
as: "ph",
in: { photoAddresses: "$$ph" }
}
}
}
},
] ).forEach( doc => db.test.updateOne( { _id: doc._id }, { $set: { photoAddresses: doc.photoAddresses } } ) )
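After running any of these migrations, a quick sanity check (a sketch, using the field name from the question) is to count how many documents still have a plain string as their first array element; it should come back as 0:
db.test.find( { "photoAddresses.0": { $type: "string" } } ).count()   // expect 0 once the migration is complete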