Remove oldest N elements from document array - arrays

I have a document in my mongodb that contains a very large array (about 10k items). I'm trying to only keep the latest 1k in the array (and so remove the first 9k elements). The document looks something like this:
{
"_id" : 'fakeid64',
"Dropper" : [
{
"md5" : "fakemd5-1"
},
{
"md5" : "fakemd5-2"
},
...,
{
"md5": "fakemd5-10000"
}
]
}
How do I accomplish that?

The correct operation here is the $push operator with the $each and $slice modifiers. Using $push to "remove" items from an array may seem counter-intuitive at first, but the intended use case makes it clear.
db.collection.update(
{ "_id": "fakeid64" },
{ "$push": { "Dropper": { "$each": [], "$slice": -1000 } } }
)
You can in fact run this against your whole collection:
db.collection.update(
{ },
{ "$push": { "Dropper": { "$each": [], "$slice": -1000 } } },
{ "multi": true }
)
What happens here is that the modifier for $each takes an array of items to "add" in the $push operation, which in this case we leave empty since we do not actually want to add anything. The $slice modifier given a "negative" value is actually saying to keep the "last n" elements present in the array as the update is performed, which is exactly what you are asking.
The general "intended" case is to use $slice when adding new elements, so that the array is kept at a given "maximum" length, which in this case would be 1000. So you would generally use it in tandem with actually "adding" new items, like this:
db.collection.update(
{ "_id": "fakeid64" },
{ "$push": { "Dropper": { "$each": [{ "md5": "fakemd5-newEntry"}], "$slice": -1000 } } }
)
This would append the new item(s) provided in $each whilst also removing any items from the "start" of the array where the total length given the addition was greater than 1000.
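To make the semantics concrete, here is a minimal plain-JavaScript model of what $push with $each and $slice does to the array. It runs without a server; pushWithSlice is a hypothetical helper written for this sketch, not a MongoDB API.

```javascript
// Plain-JavaScript model of { $push: { field: { $each: [...], $slice: n } } }.
// pushWithSlice is a hypothetical helper, not part of MongoDB.
function pushWithSlice(array, each, slice) {
  const appended = array.concat(each);   // $each: append the new items (may be none)
  if (slice === 0) return [];            // $slice: 0 empties the array
  return slice < 0
    ? appended.slice(slice)              // negative: keep the last |n| elements
    : appended.slice(0, slice);          // positive: keep the first n elements
}

const dropper = Array.from({ length: 10000 }, (_, i) => ({ md5: `fakemd5-${i + 1}` }));

// Empty $each: nothing is added, the oldest 9000 entries are dropped.
const trimmed = pushWithSlice(dropper, [], -1000);

// Non-empty $each: the new entry is appended and the oldest one falls off.
const capped = pushWithSlice(trimmed, [{ md5: "fakemd5-newEntry" }], -1000);

console.log(trimmed.length, trimmed[0].md5, capped[0].md5);
```

The first call corresponds to the "trim only" update, the second to the usual "append and cap" update.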
It is stated incorrectly elsewhere that you would use $pullAll with a list taken from the array content already in the document, but that operation is actually two requests to the database.
The misconception is that the request is sent as "one". It is not; it is basically interpreted as the longer form below (here with .slice(0, -1000), which selects everything except the latest 1000 entries):
var oldest = db.collection.findOne({ "_id": "fakeid64" }).Dropper.slice(0, -1000);
db.collection.update(
{ "_id": "fakeid64" },
{ "$pullAll": { "Dropper": oldest } }
)
So you can see that this is not very efficient, and it is in fact quite dangerous: the array within the document could change between the "read" of its content and the actual "write" on update, since they occur separately.
This is why MongoDB has atomic operators such as $push with $slice, as demonstrated. They are not only more efficient, but also act on the actual "state" of the document at the time the modification occurs.
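The race can be illustrated with a small in-memory simulation. This is only a sketch under assumptions: pullAll is a hypothetical stand-in for $pullAll, and the "concurrent writer" is just a loop that runs between the read and the write.

```javascript
// Hypothetical model of $pullAll: remove every element equal to a supplied value.
// Equality is checked on the JSON text, which is enough for these flat objects.
function pullAll(array, values) {
  const doomed = new Set(values.map((v) => JSON.stringify(v)));
  return array.filter((item) => !doomed.has(JSON.stringify(item)));
}

const stored = Array.from({ length: 10000 }, (_, i) => ({ md5: `fakemd5-${i + 1}` }));

// Request 1: the client reads the array and selects the oldest 9000 entries.
const toRemove = stored.slice(0, -1000);

// Meanwhile another writer appends 50 new entries.
for (let i = 1; i <= 50; i++) stored.push({ md5: `concurrent-${i}` });

// Request 2: the $pullAll built from the stale read is applied.
const afterPullAll = pullAll(stored, toRemove);

// The atomic $slice trims whatever the array looks like at write time.
const afterSlice = stored.slice(-1000);

console.log(afterPullAll.length, afterSlice.length); // 1050 1000
```

The read-then-write version leaves 1050 entries because it knows nothing about the entries added after the read, while the trim applied at write time always ends at 1000.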

You can use the $pullAll operator.
Suppose you use the Python/pymongo driver:
yourcollection.update_one(
{'_id': 'fakeid64'},
{'$pullAll': {'Dropper': yourcollection.find_one({'_id': 'fakeid64'})['Dropper'][:9000]}}
)
Or in the mongo shell:
db.yourcollection.update(
{ _id: 'fakeid64'},
{$pullAll: {'Dropper': db.yourcollection.findOne({'_id' : 'fakeid64'})['Dropper'].slice(0,9000)}}
)
(*) Having said that, it would be much better not to let your document(s) grow this large in the first place.

This is just a sketch of the query. Basically you can $unwind the array, $limit the result to the oldest elements, then use the cursor's forEach to remove those items, like below:
db.your_collection.aggregate([
{ $match : { _id : 'fakeid64' } },
{ $unwind : "$Dropper" },
// $unwind emits elements in array order, so the first 9000 are the oldest
{ $limit : 9000 }
]).forEach(function(doc){
db.your_collection.update({ _id : doc._id }, { $pull : { Dropper : doc.Dropper } });
});

From the MongoDB docs:
db.students.update(
{ _id: 1 },
{
$push: {
scores: {
$each: [ { attempt: 3, score: 7 }, { attempt: 4, score: 4 } ],
$sort: { score: 1 },
$slice: -3
}
}
}
)
This update uses the $push operator with:
the $each modifier to append to the array 2 new elements,
the $sort modifier to order the elements by ascending (1) score, and
the $slice modifier to keep the last 3 elements of the ordered array.
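The order of those three steps can be modelled in plain JavaScript. This is a sketch only: pushSortSlice is a hypothetical helper, and the starting scores are made up for illustration rather than taken from the docs.

```javascript
// Hypothetical model of $push with $each, $sort and $slice; not a MongoDB API.
function pushSortSlice(array, each, sortKey, direction, slice) {
  const combined = array.concat(each);                                  // $each
  combined.sort((a, b) => direction * (a[sortKey] - b[sortKey]));       // $sort
  return slice < 0 ? combined.slice(slice) : combined.slice(0, slice);  // $slice
}

// Illustrative starting scores (made up for this sketch).
const scores = [{ attempt: 1, score: 10 }, { attempt: 2, score: 8 }];

const result = pushSortSlice(
  scores,
  [{ attempt: 3, score: 7 }, { attempt: 4, score: 4 }],
  "score", 1, -3
);

console.log(result.map((s) => s.score)); // [ 7, 8, 10 ]
```

Sorting ascending and keeping the last three means the three highest scores survive.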

Related

Is there a way to sort MongoDB records by the highest difference between two values in an object?

I have a database in which records look like this:
{
id: someId
initialValue: 100
currentValue: 150
creationDate: someDate
}
And I have to get the records with the biggest difference between currentValue and initialValue. Is it possible in MongoDB to write a sorting function that will subtract one value from another and then compare (sort) them?
One slightly overkill but extensible solution is to use $setWindowFields to $rank the result of the $subtract. If requirements change in future, you can swap in other window functions for the rank field. The out-of-the-box behaviour also provides tie-breaking.
db.collection.aggregate([
{
"$addFields": {
"diff": {
"$subtract": [
"$currentValue",
"$initialValue"
]
}
}
},
{
"$setWindowFields": {
"partitionBy": null,
"sortBy": {
"diff": -1
},
"output": {
"rank": {
$rank: {}
}
}
}
},
{
$sort: {
rank: 1
}
},
// cosmetics
{
"$unset": [
"diff",
"rank"
]
}
])
Sure, simply generate the desired value to sort on in a preceding $addFields stage first:
{
$addFields: {
diff: {
"$subtract": [
"$currentValue",
"$initialValue"
]
}
}
},
{
$sort: {
diff: -1
}
}
Note that this operation cannot use an index so will have to manually sort the data. This may be a heavy and/or slow operation. You may wish to consider persisting the value when you write to the documents and index it. This would slightly increase the write overhead, but would significantly reduce the amount of work and time required to perform the reads.
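The idea of persisting the value at write time can be sketched in plain JavaScript. withDiff is a hypothetical helper and the records are made up; in a real deployment the stored diff field would be written with the document and an index created on it.

```javascript
// Hypothetical write-path helper: compute diff once, when the document is written.
function withDiff(doc) {
  return { ...doc, diff: doc.currentValue - doc.initialValue };
}

// Illustrative records (ids and values are made up).
const stored = [
  { id: "a", initialValue: 100, currentValue: 150 },
  { id: "b", initialValue: 10, currentValue: 200 },
  { id: "c", initialValue: 80, currentValue: 60 },
].map(withDiff);

// Reads then only sort on the stored field, which an index on diff could serve.
const byDiffDesc = [...stored].sort((a, b) => b.diff - a.diff);

console.log(byDiffDesc.map((d) => `${d.id}:${d.diff}`)); // [ 'b:190', 'a:50', 'c:-20' ]
```

The trade-off is the one described above: every write that touches currentValue or initialValue must also refresh diff.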

MongoDB remove an item from an array inside an array of objects

I have a document that looks like this:
{
"_id" : ObjectId("56fea43a571332cc97e06d9c"),
"sections" : [
{
"_id" : ObjectId("56fea43a571332cc97e06d9e"),
"registered" : [
"123",
"e3d65a4e-2552-4995-ac5a-3c5180258d87"
]
}
]
}
I'd like to remove the 'e3d65a4e-2552-4995-ac5a-3c5180258d87' in the registered array of only the specific section with the _id of '56fea43a571332cc97e06d9e'.
My current attempt is something like this, but it just returns the original document unmodified.
db.test.findOneAndUpdate(
{
$and: [
{'sections._id': ObjectId('56fea43a571332cc97e06d9e')},
{'sections.registered': 'e3d65a4e-2552-4995-ac5a-3c5180258d87'}
]
},
{
$pull: {
$and: [
{'sections._id': ObjectId('56fea43a571332cc97e06d9e')},
{'sections.registered': 'e3d65a4e-2552-4995-ac5a-3c5180258d87'}
]
}
})
I've looked in to $pull, but I can't seem to figure out how to make it work on an array of nested objects containing another array. The $pull examples all seem to deal with only one level of nesting. How do I remove the matching entry from the registered array of the item in the sections array with the _id that I supply?
You need to use the positional $ update operator to remove the element from your array. This is needed because "sections" is an array of sub-documents.
db.test.findOneAndUpdate(
{ "sections._id" : ObjectId("56fea43a571332cc97e06d9e") },
{ "$pull": { "sections.$.registered": "e3d65a4e-2552-4995-ac5a-3c5180258d87" } }
)
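As a rough mental model, the positional $ stands for the index of the first "sections" element matched by the filter, and $pull then only touches that element's "registered" array. The sketch below mimics this in plain JavaScript; pullFromMatchedSection is a hypothetical helper and plain strings stand in for the ObjectId values.

```javascript
// Hypothetical model of { $pull: { "sections.$.registered": value } }.
function pullFromMatchedSection(doc, sectionId, value) {
  // The index that "$" would resolve to: the first section matching the filter.
  const index = doc.sections.findIndex((s) => s._id === sectionId);
  if (index === -1) return doc; // the filter matched nothing, so nothing changes

  const sections = doc.sections.map((s, i) =>
    i === index
      ? { ...s, registered: s.registered.filter((r) => r !== value) }
      : s
  );
  return { ...doc, sections };
}

// Illustrative document; "s1" and "s2" stand in for ObjectId values.
const doc = {
  sections: [
    { _id: "s1", registered: ["123", "abc"] },
    { _id: "s2", registered: ["123", "abc"] },
  ],
};

const updated = pullFromMatchedSection(doc, "s2", "abc");
console.log(JSON.stringify(updated.sections));
```

Only the section with the matching _id loses the value; the other section keeps its copy.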

MongoDB - Query on the last element of an array?

I know that MongoDB supports the syntax find({"array.0.field": "value"}), but I specifically want to do this for the last element in the array, which means I don't know the index. Is there some kind of operator for this, or am I out of luck?
EDIT: To clarify, I want find() to only return documents where a field in the last element of an array matches a specific value.
In 3.2 this is possible. First project so that myField contains only the last element, and then match on myField.
db.collection.aggregate([
{ $project: { id: 1, myField: { $slice: [ "$myField", -1 ] } } },
{ $match: { myField: "myValue" } }
]);
You can use $expr (an operator added in MongoDB 3.6) to use aggregation functions in a regular query.
Compare query operators vs aggregation comparison operators.
For scalar arrays
db.col.find({$expr: {$gt: [{$arrayElemAt: ["$array", -1]}, value]}})
For embedded arrays - Use $arrayElemAt expression with dot notation to project last element.
db.col.find({$expr: {$gt: [{"$arrayElemAt": ["$array.field", -1]}, value]}})
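A plain-JavaScript sketch of how the dotted form can be read: "$array.field" first collects the field from each element, and $arrayElemAt with -1 then picks the last collected value. The helper names are hypothetical, and skipping elements that lack the field is my understanding of how field paths traverse arrays, so treat that detail as an assumption.

```javascript
// Hypothetical model of { $arrayElemAt: ["$array.field", -1] }.
function lastFieldValue(doc, arrayName, fieldName) {
  const values = doc[arrayName]
    .map((element) => element[fieldName])
    .filter((v) => v !== undefined); // assumption: elements without the field are skipped
  return values[values.length - 1];  // undefined when nothing was collected
}

// Hypothetical model of find({ $expr: { $gt: [<last value>, threshold] } }).
function lastFieldGreaterThan(docs, arrayName, fieldName, threshold) {
  return docs.filter((doc) => lastFieldValue(doc, arrayName, fieldName) > threshold);
}

// Illustrative documents (made up for this sketch).
const docs = [
  { _id: 1, array: [{ field: 1 }, { field: 9 }] },
  { _id: 2, array: [{ field: 9 }, { field: 1 }] },
];

const matched = lastFieldGreaterThan(docs, "array", "field", 5).map((d) => d._id);
console.log(matched); // [ 1 ]
```

Document 2 contains a 9 but not in last position, so it is not returned.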
Spring @Query code
@Query("{$expr:{$gt:[{$arrayElemAt:[\"$array\", -1]}, ?0]}}")
ReturnType MethodName(ArgType arg);
Starting Mongo 4.4, the aggregation operator $last can be used to access the last element of an array:
For instance, within a find query:
// { "myArray": ["A", "B", "C"] }
// { "myArray": ["D"] }
db.collection.find({ $expr: { $eq: [{ $last: "$myArray" }, "C"] } })
// { "myArray": ["A", "B", "C"] }
Or within an aggregation query:
db.collection.aggregate([
{ $addFields: { last: { $last: "$myArray" } } },
{ $match: { last: "C" } }
])
use $slice.
db.collection.find( {}, { array_field: { $slice: -1 } } )
Edit:
You can make use of
{ <field>: { $elemMatch: { <query1>, <query2>, ... } } } to find a match.
But it won't give exactly what you are looking for. I don't think that is possible in mongoDB yet.
I posted on the official Mongo Google group here, and got an answer from their staff. It appears that what I'm looking for isn't possible. I'm going to just use a different schema approach.
In version 3.6 you can use aggregation to achieve the same.
db.getCollection('deviceTrackerHistory').aggregate([
{
$match:{clientId:"12"}
},
{
$project:
{
deviceId:1,
recent: { $arrayElemAt: [ "$history", -1 ] }
}
}
])
You could use $position: 0 whenever you $push, and then always query array.0 to get the most recently added element. Of course, you then won't be able to get the new "last" element.
Not sure about performance, but this works well for me:
db.getCollection('test').find(
{
$where: "this.someArray[this.someArray.length - 1] === 'pattern'"
}
)
You can solve this using aggregation.
model.aggregate([
{
$addFields: {
lastArrayElement: {
$slice: ["$array", -1],
},
},
},
{
$match: {
"lastArrayElement.field": value,
},
},
]);
Quick explanations. aggregate creates a pipeline of actions, executed sequentially, which is why it takes an array as parameter. First we use the $addFields pipeline stage. This is new in version 3.4, and basically means: Keep all the existing fields of the document, but also add the following. In our case we're adding lastArrayElement and defining it as the last element in the array called array. Next we perform a $match pipeline stage. The input to this is the output from the previous stage, which includes our new lastArrayElement field. Here we're saying that we only include documents where its field field has the value value.
Note that the resulting matching documents will include lastArrayElement. If for some reason you really don't want this, you could add a $project pipeline stage after $match to remove it.
Regarding the answer that uses $arrayElemAt: what if I want orderNumber: "12345" and the last element's value to be $gt than "value"? How should the $expr be written? Thanks!
For embedded arrays, use the $arrayElemAt expression with dot notation to project the last element.
db.col.find({$expr: {$gt: [{"$arrayElemAt": ["$array.field", -1]}, value]}})
To combine this with other conditions, put the $expr inside an $and:
db.collection.aggregate([
{
$match: {
$and: [
{ $expr: { $eq: [{ "$arrayElemAt": ["$fieldArray.name", -1] }, "value"] } },
{ $or: [ /* additional conditions go here; MongoDB rejects an empty $or */ ] }
]
}
}
]);

Getting subdocument element's count per index inside an array and updating the subdocument key - subdocument in array(IN MONGODB)

How to get subdocument element's count inside an array and how to update the subdocument's key in MongoDB
For eg, following is the whole doc stored in mongodb:
{
"CompanyCode" : "SNBN",
"EventCode" : "ET00008352",
"EventName" : "Sunburn Presents Avicii India Tour",
"TktDetail" : [
{
"Type" : "Category I",
"Qty" : {
"10-Dec" : {
"value" : 58
},
"11-Dec" : {
"value" : 83
},
"12-Dec" : {
"value" : 100
}
}
},
{
"Type" : "Category II",
"Qty" : {
"10-Dec" : {
"value" : 4
},
"11-Dec" : {
"value" : 7
},
"12-Dec" : {
"value" : 8
}
}
},
{
"Type" : "PRICE LEVEL 1",
"Qty" : {
"11-Dec" : {
"value" : 2
}
}
},
{
"Type" : "CatIV",
"Qty" : {
"18-Dec" : {
"value" : 20
}
}
}
],
"TransDate" : [
"10-Dec-2013",
"11-Dec-2013",
"12-Dec-2013",
],
"VenueCode" : "SNBN",
"VenueName" : "Sunburn",
"_id" : ObjectId("52452db273b92012c41ad612")
}
Here TktDetail is an array, inside which there is a Qty subdoc which contains multiple elements, I want to know how to get the elements count inside Qty per index?
For example, the 0th index of the TktDetail array contains 1 Qty subdoc, which in turn has an element count of 3, whereas the 3rd index has an element count of 1 in its Qty subdoc.
If I want to update the subdoc key, like, I want to update the date in Qty from "10-Dec" to "10-Dec-2013", how is it possible?
Thanks in advance, looking for a reply ASAP..
So the first thing here is that you actually asked two questions, being "how do I get a count of the items under Qty?" and "how can I change the names?". Now while normally unrelated I'm going to treat them as the same thing.
What you need to do is change your schema and in doing so I'm going to allow you to get the count of items and I'm going to encourage you to change those field names as well. Specifically you need a schema like this:
"TktDetail" : [
{
"Type" : "Category I",
"Qty" : [
{ "date": ISODate("2013-12-10T00:00:00.000Z") , "value" : 58 },
{ "date": ISODate("2013-12-11T00:00:00.000Z"), "value" : 83 },
{ "date": ISODate("2013-12-12T00:00:00.000Z"), "value" : 100 },
]
},
All the gory details are in my answer here to a similar question. But the problem basically is that when you use sub-documents in the way you have you are ruining your chances of doing any meaningful query operations on this, as to get at each element you must specify the full path to get there.
That answer has more detail, but the case is you really want an array. The trade-off, it's a little harder to update, especially considering you have nested arrays, but it's a lot easier to add and much easier to query.
Also, and relatedly, change your dates to be dates and not strings. Strings are no good for comparisons inside MongoDB. With them set as proper BSON dates (noting I clipped them to the start of day) you can compare, query ranges and do other useful things. Your application code will be happy too, as the driver will return a real date object rather than something you have to manipulate "both ways".
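A quick plain-JavaScript illustration of why string dates compare badly. The "9-Nov-2013" value is made up for this sketch; the other two follow the format used in the question.

```javascript
// The same three days, once as strings and once as real dates.
// "9-Nov-2013" is an illustrative value added for this sketch.
const asStrings = ["10-Dec-2013", "9-Nov-2013", "11-Dec-2013"];
const asDates = [
  new Date(Date.UTC(2013, 11, 10)), // months are zero-based: 11 is December
  new Date(Date.UTC(2013, 10, 9)),
  new Date(Date.UTC(2013, 11, 11)),
];

// Strings sort character by character, so "9-Nov" lands after "11-Dec".
const sortedStrings = [...asStrings].sort();

// Dates sort chronologically.
const sortedDates = [...asDates]
  .sort((a, b) => a - b)
  .map((d) => d.toISOString().slice(0, 10));

console.log(sortedStrings); // [ '10-Dec-2013', '11-Dec-2013', '9-Nov-2013' ]
console.log(sortedDates);   // [ '2013-11-09', '2013-12-10', '2013-12-11' ]
```

The same ordering problem applies to range queries: a "greater than" comparison on the string form does not mean "later than".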
So once you have read through, understood and implemented this, on to counting:
db.collection.aggregate([
// Unwind the TktDetail array to de-normalize
{"$unwind": "$TktDetail"},
// Also unwind the nested Qty array
{"$unwind": "$TktDetail.Qty" },
// Group per document, event and ticket type, counting the entries
{"$group": {
"_id": {
"_id": "$_id",
"EventCode": "$EventCode",
"Type": "$TktDetail.Type"
},
"Qty": {"$sum": 1 }
}},
// Project nicely
{"$project": {
"_id": 0,
"EventCode": "$_id.EventCode",
"Type": "$_id.Type",
"Qty": 1
}},
// Let's even sort it
{"$sort": { "EventCode": 1, "Qty": -1 }}
])
So that gives us a count of the items in Qty for each EventCode by Type, with the counts ordered highest to lowest.
And that is not possible on your current schema without loading and traversing each document in code.
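For reference, the "traverse in code" route on the current schema might look like the sketch below. The document is a trimmed copy of the one in the question and is assumed to have already been loaded by the application.

```javascript
// Trimmed copy of the question's document, as the application would load it.
const eventDoc = {
  EventCode: "ET00008352",
  TktDetail: [
    { Type: "Category I", Qty: { "10-Dec": { value: 58 }, "11-Dec": { value: 83 }, "12-Dec": { value: 100 } } },
    { Type: "Category II", Qty: { "10-Dec": { value: 4 }, "11-Dec": { value: 7 }, "12-Dec": { value: 8 } } },
    { Type: "PRICE LEVEL 1", Qty: { "11-Dec": { value: 2 } } },
    { Type: "CatIV", Qty: { "18-Dec": { value: 20 } } },
  ],
};

// Count the keys of each Qty sub-document, per index of TktDetail.
const counts = eventDoc.TktDetail.map((detail, index) => ({
  index,
  Type: detail.Type,
  count: Object.keys(detail.Qty).length,
}));

console.log(counts.map((c) => c.count)); // [ 3, 3, 1, 1 ]
```

This works, but every document has to be pulled to the client first, which is the cost the schema change avoids.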
So there is the case. Now if you want to ignore this and just go about changing the sub-document key names, then you'll need to remove each key with its underlying document and add it back under the new key name, using update:
db.collection.update(
{ EventCode: "ET00008352"},
{ $unset:{ "TktDetail.0.Qty.10-Dec": "" }}
)
db.collection.update(
{ EventCode: "ET00008352"},
{ $set:{ "TktDetail.0.Qty.10-Dec-2013": { value: 58 } }}
)
And you'll need to do that for every item that you have.
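Rather than typing every path by hand, the update document could be generated in code. The sketch below is one way to do that under assumptions: buildRenameUpdate is a hypothetical helper, it puts $unset and $set into a single update document instead of the two separate updates shown above, and the input is a trimmed copy of the question's document.

```javascript
// Hypothetical helper: build the $unset/$set paths for every key under Qty.
function buildRenameUpdate(doc, suffix) {
  const update = { $set: {}, $unset: {} };
  doc.TktDetail.forEach((detail, i) => {
    for (const [key, value] of Object.entries(detail.Qty)) {
      update.$unset[`TktDetail.${i}.Qty.${key}`] = "";               // drop the old key
      update.$set[`TktDetail.${i}.Qty.${key}${suffix}`] = value;     // re-add under the new key
    }
  });
  return update;
}

// Trimmed copy of the question's document.
const ticketDoc = {
  TktDetail: [
    { Type: "Category I", Qty: { "10-Dec": { value: 58 }, "11-Dec": { value: 83 } } },
    { Type: "PRICE LEVEL 1", Qty: { "11-Dec": { value: 2 } } },
  ],
};

const renameUpdate = buildRenameUpdate(ticketDoc, "-2013");
console.log(JSON.stringify(renameUpdate, null, 2));
```

The resulting object is what would be passed as the second argument of db.collection.update for that document; the helper itself never talks to the database.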
So you either work out that schema conversion or otherwise have a lot of work anyway in order to change the keys. For myself, I'd do it properly, and only do it once so I didn't run into the next problem later.

MongoDB rename database field within array

I need to rename indentifier in this:
{ "general" :
{ "files" :
{ "file" :
[
{ "version" :
{ "software_program" : "MonkeyPlus",
"indentifier" : "6.0.0"
}
}
]
}
}
}
I've tried
db.nrel.component.update(
{},
{ $rename: {
"general.files.file.$.version.indentifier" : "general.files.file.$.version.identifier"
} },
false, true
)
but it returns: $rename source may not be dynamic array.
For what it's worth, while it sounds awful to have to do, the solution is actually pretty easy. This of course depends on how many records you have. But here's my example:
db.Setting.find({ 'Value.Tiers.0.AssetsUnderManagement': { $exists: 1 } }).snapshot().forEach(function(item)
{
for(i = 0; i != item.Value.Tiers.length; ++i)
{
item.Value.Tiers[i].Aum = item.Value.Tiers[i].AssetsUnderManagement;
delete item.Value.Tiers[i].AssetsUnderManagement;
}
db.Setting.update({_id: item._id}, item);
});
I iterate over my collection where the array is found and the "wrong" name is found. I then iterate over the sub collection, set the new value, delete the old, and update the whole document. It was relatively painless. Granted I only have a few tens of thousands of rows to search through, of which only a few dozen meet the criteria.
Still, I hope this answer helps someone!
Edit: Added snapshot() to the query. See why in the comments.
You must apply snapshot() to the cursor before retrieving any documents from the database.
You can only use snapshot() with unsharded collections.
snapshot() was deprecated in MongoDB 3.6 and removed in 4.0, so on newer versions the example above should drop the snapshot() call.
As mentioned in the documentation there is no way to directly rename fields within arrays with a single command. Your only option is to iterate over your collection documents, read them and update each with $unset old/$set new operations.
I had a similar problem. In my situation I found the following was much easier:
I exported the collection to json:
mongoexport --db mydb --collection modules --out modules.json
I did a find and replace on the json using my favoured text editing utility.
I reimported the edited file, dropping the old collection along the way:
mongoimport --db mydb --collection modules --drop --file modules.json
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update of a field based on its own value:
// { general: { files: { file: [
// { version: { software_program: "MonkeyPlus", indentifier: "6.0.0" } }
// ] } } }
db.collection.updateMany(
{},
[{ $set: { "general.files.file": {
$map: {
input: "$general.files.file",
as: "file",
in: {
version: {
software_program: "$$file.version.software_program",
identifier: "$$file.version.indentifier" // fixing the typo here
}
}
}
}}}]
)
// { general: { files: { file: [
// { version: { software_program: "MonkeyPlus", identifier: "6.0.0" } }
// ] } } }
Literally, this updates documents by (re)$setting the "general.files.file" array by $mapping its "file" elements in a "version" object containing the same "software_program" field and the renamed "identifier" field which contains what used to be the value of "indentifier".
A couple additional details:
The first part {} is the match query, filtering which documents to update (in this case all documents).
The second part [{ $set: { "general.files.file": { ... }}}] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator which in this case replaces the value of the "general.files.file" array.
Using a $map operation, we replace all elements from the "general.files.file" array by basically the same elements, but with an "identifier" field rather than "indentifier":
input is the array to map.
as is the variable name given to looped elements
in is the actual transformation applied to elements. In this case, it replaces each element with a "version" object composed of "software_program" and "identifier" fields. These fields are populated by extracting their previous values using the $$file.xxxx notation (where file is the name given to elements in the as part).
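For readers more used to application code, the $map stage behaves much like Array.prototype.map. The sketch below mirrors the transformation in plain JavaScript; renameIdentifier is a hypothetical helper and nothing here touches a database.

```javascript
// Hypothetical plain-JavaScript equivalent of the $set + $map update above.
function renameIdentifier(doc) {
  return {
    ...doc,
    general: {
      ...doc.general,
      files: {
        ...doc.general.files,
        // input: the "file" array; as: each element is called "file"
        file: doc.general.files.file.map((file) => ({
          // in: rebuild the element with the corrected field name
          version: {
            software_program: file.version.software_program,
            identifier: file.version.indentifier, // fixing the typo here
          },
        })),
      },
    },
  };
}

const before = {
  general: { files: { file: [
    { version: { software_program: "MonkeyPlus", indentifier: "6.0.0" } },
  ] } },
};

const after = renameIdentifier(before);
console.log(JSON.stringify(after));
```

As in the pipeline version, any other fields on the array elements would be dropped, because each element is rebuilt from scratch.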
I faced this issue with the same schema, so this query should be helpful for anyone who wants to rename a field in an embedded array.
db.getCollection("sampledocument").updateMany({}, [
{
$set: {
"general.files.file": {
$map: {
input: "$general.files.file",
in: {
version: {
$mergeObjects: [
"$$this.version",
{ identifier: "$$this.version.indentifier" },
],
},
},
},
},
},
},
{ $unset: "general.files.file.version.indentifier" },
]);
Another solution
I also wanted to rename a property in an array, and I used this:
db.getCollection('YourCollectionName').find({}).snapshot().forEach(function(a){
a.Array1.forEach(function(b){
b.Array2.forEach(function(c){
c.NewPropertyName = c.OldPropertyName;
delete c["OldPropertyName"];
});
});
db.getCollection('YourCollectionName').save(a)
});
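A short solution using aggregate. Note that the { db, coll } form of $out requires MongoDB 4.4+, and that $arrayElemAt with index 0 copies the first element's oldField into every element, so this is only correct for single-element arrays.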
db.myCollection.aggregate([
{
$addFields: {
"myArray.newField": {$arrayElemAt: ["$myArray.oldField", 0] }
}
},
{$project: { "myArray.oldField": false}},
{$out: {db: "myDb", coll: "myCollection"}}
])
The problem with the forEach loops mentioned above is very bad performance when the collection is huge.
My proposal would be this one:
db.nrel.component.aggregate([
{ $unwind: "$general.files.file" },
{
$set: {
"general.files.file.version.identifier": {
$ifNull: ["$general.files.file.version.indentifier", "$general.files.file.version.identifier"]
}
}
},
{ $unset: "general.files.file.version.indentifier" },
{ $set: { "general.files.file": ["$general.files.file"] } },
{ $out: "nrel.component" } // carefully - it replaces entire collection.
])
However, this works only when the array general.files.file contains a single document. Most likely that will not always be the case, in which case you can use this one:
db.nrel.component.aggregate([
{ $unwind: "$general.files.file" },
{
$set: {
"general.files.file.version.identifier": {
$ifNull: ["$general.files.file.version.indentifier", "$general.files.file.version.identifier"]
}
}
},
{ $unset: "general.files.file.version.indentifier" },
{ $group: { _id: "$_id", general_new: { $addToSet: "$general.files.file" } } },
{ $set: { "general.files.file": "$general_new" } },
{ $unset: "general_new" },
{ $out: "nrel.component" } // carefully - it replaces entire collection.
])

Resources