How to add array-index field to items in mongodb nested array

Inspired by another question I was looking for a common way to add a field with the index to each item in a nested array.
Assuming my document looks like:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}
And I want each item to contain a new field which is the index of the item in the array:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
arrayIndex: 0,
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
arrayIndex: 1,
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
arrayIndex: 2,
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
arrayIndex: 3,
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}

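Since mongoDB version 3.6, this can be done using an aggregation pipeline with a $reduce expression, which uses the size of the accumulated array as the index of the next item ($reduce itself is available since 3.4, but $mergeObjects requires 3.6):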
db.collection.aggregate([
{$project: {
events: {
$reduce: {
input: "$events",
initialValue: [],
in: {
$concatArrays: [
"$$value",
[
{$mergeObjects: [
"$$this",
{arrayIndex: {$size: "$$value"}}
]}
]
]
}
}
}
}}
])
See how it works on the playground example
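To make the trick easier to follow, here is a rough plain-JavaScript analogue of the same reduce (not MongoDB code, just an illustration). The accumulator plays the role of $$value, the current item plays the role of $$this, and the timestamps are plain strings here for simplicity:

```javascript
// Plain-JavaScript analogue of the $reduce expression above (illustration only).
const events = [
  { status: 0, timestamp: "2022-05-29T13:26:00Z" },
  { status: 8, timestamp: "2022-05-29T14:41:00Z" },
  { status: 4, timestamp: "2022-05-31T10:13:00Z" },
  { status: 3, timestamp: "2022-05-31T10:18:00Z" },
];

const indexedEvents = events.reduce(
  // acc ~ $$value, item ~ $$this; acc.length ~ {$size: "$$value"}
  (acc, item) => acc.concat([{ ...item, arrayIndex: acc.length }]),
  [] // ~ initialValue
);
```

Each step appends one item, so the length of the accumulated array is always the index of the item being added.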

Related

Mongoose | Find objects inside of an array, that each object has another array of objects to satisfy condition

I have a collection Shops. Each object in Shops collection has an array of Item objects called items.
{
_id: ObjectId(...),
shopName: 'Ice cream Shop',
items: [
<Item>{
itemName: 'Chocolate IC',
availabilities: [
{
city: 'NY',
arrivals: [
{
price: 3.99,
quantityLeft: 0,
date: 'yesterday'
},
{
price: 3.99,
quantityLeft: 40,
date: 'today'
}
]
},
{
city: 'LA',
arrivals: []
}
]
},
<Item>{
itemName: 'Strawberry IC',
availabilities: [
{
city: 'NY',
arrivals: [
{
price: 3.99,
quantityLeft: 0,
date: 'yesterday'
},
]
}
]
},
],
},
... anotherShops
I want to get the list of Item objects from a specific shop whose overall quantityLeft is greater than 0.
I tried this code to get all items whose name contains "Straw" from the shop with shopName equal to 'Ice cream Shop':
const items = await Shop.aggregate()
.match({
shopName: 'Ice cream Shop',
})
.project({
items: {
$filter: {
input: "$items",
as: "item",
cond: {
$regexMatch: {
input: "$$item.itemName",
regex: `.*Straw.*`,
},
},
},
},
});
And it works. But I don't know how to sum up all the quantityLeft values inside the availabilities array of each item and return only those items whose sum is greater than 0.
The availabilities array can be empty ([]).
The city also needs to be part of the condition, for example only items that are in stock in NY.
I need this to get the list of items from a certain shop, and only the items that are still in stock.
Pretty hard.
I came up with this solution. If you have a better solution, please post it.
const shop = await GCShop.aggregate([
{
$match: {
shopName: 'Ice cream Shop',
},
},
{
$unwind: "$items",
},
{
$unwind: "$items.availabilities",
},
{
$unwind: "$items.availabilities.arrivals",
},
{
$group: {
// Field names follow the document shown in the question
_id: "$items.itemName",
arrivals: {
$push: {
city: "$items.availabilities.city",
price: "$items.availabilities.arrivals.price",
quantityLeft: "$items.availabilities.arrivals.quantityLeft",
},
},
totalQtty: { $sum: "$items.availabilities.arrivals.quantityLeft" },
},
},
{
$project: {
itemName: "$_id",
_id: 0,
totalQtty: 1,
arrivals: 1,
},
},
{
$match: {
totalQtty: {
$gt: 0,
},
},
},
]).limit(20);
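For reference, here is a sketch of the intended filtering logic in plain JavaScript (not an aggregation pipeline), assuming the document shape from the question. The helper name inStockItems is hypothetical. It also covers the city condition and empty availabilities or arrivals arrays:

```javascript
// Hypothetical helper: items of one shop that still have stock in a given city.
function inStockItems(shop, city) {
  return shop.items.filter((item) => {
    const total = (item.availabilities || [])
      .filter((a) => a.city === city)          // only the requested city
      .flatMap((a) => a.arrivals || [])        // all arrivals for that city
      .reduce((sum, arrival) => sum + arrival.quantityLeft, 0);
    return total > 0;                          // keep items with stock left
  });
}

// Sample data shaped like the document in the question.
const sampleShop = {
  shopName: "Ice cream Shop",
  items: [
    {
      itemName: "Chocolate IC",
      availabilities: [
        { city: "NY", arrivals: [
          { price: 3.99, quantityLeft: 0, date: "yesterday" },
          { price: 3.99, quantityLeft: 40, date: "today" },
        ] },
        { city: "LA", arrivals: [] },
      ],
    },
    {
      itemName: "Strawberry IC",
      availabilities: [
        { city: "NY", arrivals: [{ price: 3.99, quantityLeft: 0, date: "yesterday" }] },
      ],
    },
  ],
};

const inStockNY = inStockItems(sampleShop, "NY").map((i) => i.itemName);
const inStockLA = inStockItems(sampleShop, "LA").map((i) => i.itemName);
```

With the sample data, only "Chocolate IC" is in stock in NY and nothing is in stock in LA.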

mongodb update nested array from a array

I have an array in a MongoDB document.
{
_id: 1,
jobs:[
{
_id:1,
time: "08:00",
status: "pending",
user: 'user1'
},
{
_id:2,
time: "09:00",
status: "pending",
user: 'user1'
},
{
_id:3,
time: "07:30",
status: "done",
user: 'user2'
}
]
}
Now I have an updated jobs array like this:
jobs:[
{
_id:1,
time: "10:00",
status: "done"
},
{
_id:2,
time: "11:00",
status: "done"
}
]
The updated document should look like this:
{
_id: 1,
jobs:[
{
_id:1,
time: "10:00", // updated
status: "done", // updated
user: 'user1'
},
{
_id:2,
time: "11:00", // updated
status: "done", // updated
user: "user1"
},
{
_id:3,
time: "07:30",
status: "done",
user: 'user2'
}
]
}
I tried using update with $set, with no luck so far.
How do I apply only the values from the updated array to the MongoDB document? Thanks in advance.
One option is using an update with a pipeline:
1. Add the new data into the document as newData.
2. Use $map to loop over the jobs items and merge each item with the matching item in newData, if there is one.
EDIT (consider partial match):
db.collection.update(
{_id: 1},
[{$addFields: {
newData: [
{_id: 1, time: "10:00", status: "done"},
{_id: 2, time: "11:00", status: "done"}
]
}
},
{$project: {
jobs: {$map: {
input: "$jobs",
in: {$mergeObjects: [
"$$this",
{$cond: [
{$gte: [{$indexOfArray: ["$newData._id", "$$this._id"]}, 0]},
{$arrayElemAt: ["$newData", {$indexOfArray: ["$newData._id", "$$this._id"]}]},
{}
]}
]}
}}
}}
])
See how it works on the playground example
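The merge logic can be sketched in plain JavaScript as follows (an illustration only, not MongoDB code). The helper name mergeJobs is hypothetical; findIndex stands in for $indexOfArray and object spread stands in for $mergeObjects:

```javascript
// Hypothetical helper mirroring the $map + $mergeObjects + $cond logic above.
function mergeJobs(jobs, newData) {
  return jobs.map((job) => {
    const idx = newData.findIndex((n) => n._id === job._id); // ~ $indexOfArray
    // Merge when a match exists, otherwise keep the job as it is (~ merging with {}).
    return idx >= 0 ? { ...job, ...newData[idx] } : { ...job };
  });
}

const jobs = [
  { _id: 1, time: "08:00", status: "pending", user: "user1" },
  { _id: 2, time: "09:00", status: "pending", user: "user1" },
  { _id: 3, time: "07:30", status: "done", user: "user2" },
];
const newData = [
  { _id: 1, time: "10:00", status: "done" },
  { _id: 2, time: "11:00", status: "done" },
];

const mergedJobs = mergeJobs(jobs, newData);
```

Jobs 1 and 2 get the new time and status while keeping their user field, and job 3 is left untouched because it has no match in newData.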

How to couple items on a nested-array in mongoDB?

Inspired by another question I was looking for a common way to couple items in a nested array, so the 1st item will be coupled with the 2nd item, and the 3rd item will be coupled with the 4th item.
Assuming my document looks like:
{
_id: ObjectId("5a934e000102030405000000"),
events: [
{
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
},
{
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
}
And I want to couple the items:
{
_id: ObjectId("5a934e000102030405000000"),
couples: [
[
{
mod: 0,
status: 0,
timestamp: ISODate("2022-05-29T13:26:00Z")
},
{
mod: 1,
status: 8,
timestamp: ISODate("2022-05-29T14:41:00Z")
}
],
[
{
mod: 0,
status: 4,
timestamp: ISODate("2022-05-31T10:13:00Z")
},
{
mod: 1,
status: 3,
timestamp: ISODate("2022-05-31T10:18:00Z")
}
]
]
}
Since mongoDB version 4.4*, one option is to use an aggregation pipeline with $reduce, $mod, $filter and $zip:
1. $reduce with $mod to add a new mod field to each item, with value 0 for items at odd positions (1st, 3rd, 5th, ...) and value 1 for items at even positions (2nd, 4th, 6th, ...)
2. $filter into two arrays according to the mod value
3. $zip these two arrays into one array of couples
db.collection.aggregate([
{
$project: {
events: {
$reduce: {
input: "$events",
initialValue: [],
in: {$concatArrays: [
"$$value",
[
{
timestamp: "$$this.timestamp",
status: "$$this.status",
mod: {$mod: [{$size: "$$value"}, 2]}
}
]
]
}
}
}
}
},
{
$project: {
firstEvent: {$filter: {input: "$events", cond: {$eq: ["$$this.mod", 0]}}},
secondEvent: {$filter: {input: "$events", cond: {$eq: ["$$this.mod", 1]}}}
}
},
{$project: {couples: {$zip: {inputs: ["$firstEvent", "$secondEvent"]}}}}
])
See how it works on the playground example
*With older mongoDB versions, 3.4 or higher, the $mod can be replaced with a "manual" mod calculation.
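The three steps can be mirrored in plain JavaScript like this (an illustration only; the helper name coupleItems is hypothetical). Note that $zip stops at the shortest input unless useLongestLength is set, so with an odd number of items the last one is dropped, and the sketch does the same:

```javascript
// Hypothetical helper mirroring the $reduce/$mod, $filter and $zip stages above.
function coupleItems(events) {
  // Step 1: tag each item with its position modulo 2.
  const tagged = events.map((e, i) => ({ ...e, mod: i % 2 }));
  // Step 2: split into two arrays by the mod value.
  const first = tagged.filter((e) => e.mod === 0);
  const second = tagged.filter((e) => e.mod === 1);
  // Step 3: zip them; like $zip by default, stop at the shorter array.
  const n = Math.min(first.length, second.length);
  return Array.from({ length: n }, (_, i) => [first[i], second[i]]);
}

const couples = coupleItems([
  { status: 0, timestamp: "2022-05-29T13:26:00Z" },
  { status: 8, timestamp: "2022-05-29T14:41:00Z" },
  { status: 4, timestamp: "2022-05-31T10:13:00Z" },
  { status: 3, timestamp: "2022-05-31T10:18:00Z" },
]);
```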

MongoDB - Geometry object to return the total pairs/points in linear ring

I am trying to write a performance report based on shapefile data I have stored within docs stored in collections.
Here is a sample of data:
The following aggregation works quite well as it returns the number of bytes for each document. However, I would also like to know how many points/pairs are stored within each polygon's linear ring for each document.
db.getCollection("_collectionName").aggregate([{"$project": {"rootSize": { $bsonSize: "$$ROOT" }}}])
This returns the following set of data (sample):
{ _id: ObjectId("5ef7da26ae8659149c97657e"), rootSize: 42215 },
{ _id: ObjectId("5ef7da45ae8659149c97657f"), rootSize: 118574 },
{ _id: ObjectId("5ef7daf1ae8659149c976585"), rootSize: 11886 },
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), rootSize: 43136 },
{ _id: ObjectId("5ef7daa6ae8659149c976582"), rootSize: 40823 },
{ _id: ObjectId("5f3495129861ce45eb4e9728"), rootSize: 394884 },
{ _id: ObjectId("5ef7d7f6ae8659149c97657c"), rootSize: 125309 },
{ _id: ObjectId("5ef7dad6ae8659149c976584"), rootSize: 127447 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389e"), rootSize: 17670 },
{ _id: ObjectId("5fa56ef26538cd3bddd8389f"), rootSize: 11398 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389c"), rootSize: 2415 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ae"), rootSize: 1757 },
{ _id: ObjectId("5fa56ef36538cd3bddd838b0"), rootSize: 4866 },
{ _id: ObjectId("5fa56ef36538cd3bddd838a8"), rootSize: 1510 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a7"), rootSize: 39631 },
{ _id: ObjectId("5fa56ef36538cd3bddd838ab"), rootSize: 3662 },
{ _id: ObjectId("5fa56ef36538cd3bddd838aa"), rootSize: 15844 },
{ _id: ObjectId("5fa56ef16538cd3bddd8389d"), rootSize: 17196 },
{ _id: ObjectId("5fa56ef26538cd3bddd838a3"), rootSize: 34940 },
{ _id: ObjectId("5fa56ef36538cd3bddd838af"), rootSize: 468367 }
Which is great, but it does not tell me how many elements are in the array (the linear ring) within geometry.coordinates.
I have tried the following, but no cigar:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry", [] ] } } } }])
MongoServerError: The argument to $size must be an array, but was of type: object
It comes back with an error, which I understand, so I referenced the coordinates array:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates", [] ] } } } }])
This returned the following data, which is again correct: in GeoJSON the top level of a polygon's coordinates holds the linear rings, so a count of 1 is normal. Sample data:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 1 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 1 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 1 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 1 }
So I then added the top level array of 0 to my aggregate function:
db.getCollection("_collectionName").aggregate([{$project: { count: { $size: { "$ifNull": [ "$geometry.coordinates.0", [] ] } } } }])
And this is what was returned:
{ _id: ObjectId("5ef7da26ae8659149c97657e"), count: 0 }
{ _id: ObjectId("5ef7da45ae8659149c97657f"), count: 0 }
{ _id: ObjectId("5ef7daf1ae8659149c976585"), count: 0 }
{ _id: ObjectId("5f216685dbef0f7c3339ec03"), count: 0 }
And that is not possible, here is a screenshot from Studio3T software:
Anybody who might be able to help or point me in the right direction please do so....
(I would be very grateful!)
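Dot notation with a numeric index (such as "$geometry.coordinates.0") does not address array elements inside an aggregation expression; the 0 is treated as a field name. You'll want to use the $arrayElemAt operator, as follows: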
db.getCollection("_aGStbl").aggregate([{
$project: {
count: { $size: { $arrayElemAt: [ "$geometry.coordinates", 0 ]}}
}
}])
To cater for Null values, you can use a $cond, depending on your objective for the output:
INSERT SOME DATA INTO A TESTDB:
db.arrayTest.insertMany([
{ _id: 1, arrayOfArrays: [ [ 1, 2, 3 ], [ 1, 2, 3, 4 ], [ 1, 2, 3, 4, 5, 6, 7 ] ] },
{ _id: 2, arrayOfArrays: [ [ 4, 5 ], [ 5, 6, 7 ] ] },
{ _id: 3, arrayOfArrays: [ [], [] ] },
{ _id: 4, arrayOfArrays: [ [], [], [] ] },
{ _id: 5 }
] )
{ acknowledged: true,
insertedIds: { '0': 1, '1': 2, '2': 3, '3': 4, '4': 5 } }
TRY THESE AGGREGATE CALLS:
db.arrayTest.aggregate([{$project: { count: { $size: { "$ifNull": [ { $arrayElemAt: [ "$arrayOfArrays", 0 ] }, [ ] ] } } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: 0 }
db.arrayTest.aggregate([{$project: { count: { $cond: { if: {$arrayElemAt: [ "$arrayOfArrays", 0 ]}, then: { $size: { $arrayElemAt: [ "$arrayOfArrays", 0 ] } }, else: null} } } } ] )
{ _id: 1, count: 3 }
{ _id: 2, count: 2 }
{ _id: 3, count: 0 }
{ _id: 4, count: 0 }
{ _id: 5, count: null }
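As a plain-JavaScript illustration of what the $cond variant computes (not MongoDB code), here is a small sketch. The helper name firstRingPointCount and both sample documents are hypothetical:

```javascript
// Hypothetical helper: number of points in the first linear ring, or null
// when the document has no usable geometry.coordinates[0].
function firstRingPointCount(doc) {
  const coords = doc.geometry && doc.geometry.coordinates;
  const ring = Array.isArray(coords) ? coords[0] : undefined; // ~ $arrayElemAt [..., 0]
  return Array.isArray(ring) ? ring.length : null;            // ~ $size, else null
}

// Made-up sample documents for illustration.
const polygonDoc = {
  _id: 1,
  geometry: {
    type: "Polygon",
    coordinates: [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]],
  },
};
const noGeometryDoc = { _id: 2 };

const polygonCount = firstRingPointCount(polygonDoc);
const missingCount = firstRingPointCount(noGeometryDoc);
```

The polygon's first ring has 5 coordinate pairs, and the document without geometry yields null rather than 0.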

Find duplicate urls in mongodb [duplicate]

This question already has answers here:
Find duplicate records in MongoDB
(10 answers)
Closed 2 years ago.
I have a DB with news articles, and I am trying to do a little DB cleaning. I want to find all duplicate documents, and I think the best way to accomplish this is by using the url field. My documents are structured as follows:
{
_id:
author:
title:
description:
url:
urlToImage:
publishedAt:
content:
summarization:
source_id:
}
Any help is greatly appreciated
Assume a collection whose documents have a name field (used here instead of url) containing duplicate values. The two aggregations below return output that can be used for further processing. I hope you will find this useful.
{ _id: 1, name: "jack" },
{ _id: 2, name: "john" },
{ _id: 3, name: "jim" },
{ _id: 4, name: "john" },
{ _id: 5, name: "john" },
{ _id: 6, name: "jim" }
Note that "john" has 3 occurrences and "jim" has 2.
(1) This aggregation returns the names which have duplicates (more than one occurrence):
db.collection.aggregate( [
{
$group: {
_id: "$name",
count: { $sum: 1 }
}
},
{
$group: {
_id: "duplicate_names",
names: { $push: { $cond: [ { $gt: [ "$count", 1 ] }, "$_id", "$DUMMY" ] } }
}
}
] )
The output:
{ "_id" : "duplicate_names", "names" : [ "john", "jim" ] }
(2) The following aggregation returns only the _id values of the duplicate documents, i.e. every document after the first one with a given name. For example, the name "jim" has _id values 3 and 6, so the output contains only 6 for it.
db.collection.aggregate( [
{
$group: {
_id: "$name",
count: { $sum: 1 },
ids: { $push: "$_id" }
}
},
{
$group: {
_id: "duplicate_ids",
ids: { $push: { $slice: [ "$ids", 1, 9999 ] } }
}
},
{
$project: {
ids: {
$reduce: {
input: "$ids",
initialValue: [ ],
in: { $concatArrays: [ "$$this", "$$value" ] }
}
}
}
}
] )
The output:
{ "_id" : "duplicate_ids", "ids" : [ 6, 4, 5 ] }
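The idea behind aggregation (2) can be sketched in plain JavaScript as follows (an illustration only; the helper name findDuplicateIds is hypothetical). It groups the _id values by a key and keeps everything after the first _id in each group, which is what the $slice starting at position 1 does:

```javascript
// Hypothetical helper: _id values of all documents after the first one per key.
function findDuplicateIds(docs, key) {
  const groups = new Map();
  for (const doc of docs) {
    const ids = groups.get(doc[key]) || [];
    ids.push(doc._id);                 // ~ ids: { $push: "$_id" }
    groups.set(doc[key], ids);
  }
  // ~ $slice: ["$ids", 1, 9999], then flattening all groups into one array
  return [...groups.values()].flatMap((ids) => ids.slice(1));
}

const docs = [
  { _id: 1, name: "jack" },
  { _id: 2, name: "john" },
  { _id: 3, name: "jim" },
  { _id: 4, name: "john" },
  { _id: 5, name: "john" },
  { _id: 6, name: "jim" },
];

const duplicateIds = findDuplicateIds(docs, "name"); // [4, 5, 6]
```

The order of the ids differs from the aggregation output above, since $group does not guarantee any particular order. For the original question, the key would be url instead of name.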
