MongoDB: $pull / $unset with multiple conditions - arrays

Example Document:
{
  _id: 42,
  foo: {
    bar: [1, 2, 3, 3, 4, 5, 5]
  }
}
The query:
I'd like to "remove all entries from foo.bar that are $lt: 4, plus the first entry that matches $eq: 5". Important: The $eq part must only remove a single entry!
I have a working solution that uses 3 update queries, but that's too much for such a simple task. Nevertheless, here's what I did so far:
1. Find the first entry matching $eq: 5 and $unset it. (As you know: $unset doesn't remove it. It just sets it to null):
update(
    { 'foo.bar': 5 },
    { $unset: { 'foo.bar.$': 1 } }
)
2. $pull all entries $eq: null, so that former 5 is really gone:
update(
    {},
    { $pull: { 'foo.bar': null } }
)
3. $pull all entries $lt: 4:
update(
    {},
    { $pull: { 'foo.bar': { $lt: 4 } } }
)
Resulting Document:
{
  _id: 42,
  foo: {
    bar: [4, 5]
  }
}
Ideas and Thoughts:
Extend query 1 so that it will $unset the entries $lt: 4 and one entry $eq: 5. Afterwards we can execute query 2 and there's no need for query 3.
Extend query 2 to $pull everything that matches $or: [{$lt: 4}, {$eq: 5}]. Then there's no need for query 3.
Extend query 2 to $pull everything that is $not: { $gte: 4 }. This expression should match $lt: 4 and $eq: null.
I already tried to implement those queries, but sometimes it complained about the query syntax and sometimes the query did execute and just removed nothing.
Would be nice, if someone has a working solution for this.

Not sure if I get your full meaning of this, but to "bulk" update documents you can always take this approach in addition to the original $pull, adding some "detection" of which documents you need to remove the "duplicate" 5 from:
// Remove less than four first
db.collection.update({}, { "$pull": { "foo.bar": { "$lt": 4 } } }, { "multi": true });
// Initialize Bulk
var bulk = db.collection.initializeOrderedBulkOp(),
    count = 0;
// Detect and cycle documents with a duplicate five to be removed
db.collection.aggregate([
    // Project a "reduced" array and calculate if the same size as orig
    { "$project": {
        "foo.bar": { "$setUnion": [ "$foo.bar", [] ] },
        "same": { "$eq": [
            { "$size": { "$setUnion": [ "$foo.bar", [] ] } },
            { "$size": "$foo.bar" }
        ] }
    }},
    // Keep only the documents that actually changed, i.e. had duplicates
    { "$match": { "same": false } }
]).forEach(function(doc) {
    bulk.find({ "_id": doc._id })
        // $setUnion does not preserve order, so sort numerically
        .updateOne({ "$set": { "foo.bar": doc.foo.bar.sort(function(a, b) { return a - b; }) } });
    count++;
    // Execute per 1000 processed and re-init
    if ( count % 1000 == 0 ) {
        bulk.execute();
        bulk = db.collection.initializeOrderedBulkOp();
    }
});
// Clean up any batched
if ( count % 1000 != 0 )
    bulk.execute();
That trims out anything less than "4" and all duplicates where a "duplicate" is detected from the difference in "set" length.
If you just want values of 5 removed as duplicates, you can take a similar logic approach to the detection and modification, just without the "set operators", since those remove anything that is a "duplicate" in order to produce a valid "set".
At any rate, some detection strategy is going to be better than iterating updates until "all but one" value is gone.
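For instance, a sketch of that narrower detection might look like this (plain shell JavaScript, assuming only the value 5 ever needs de-duplicating):
// Detect documents where 5 occurs more than once,
// then rewrite the array keeping only the first 5
db.collection.find({ "foo.bar": 5 }).forEach(function(doc) {
    var seen = false;
    var trimmed = doc.foo.bar.filter(function(el) {
        if (el !== 5) return true;
        if (seen) return false;   // drop every 5 after the first
        seen = true;
        return true;
    });
    if (trimmed.length !== doc.foo.bar.length) {
        db.collection.update(
            { "_id": doc._id },
            { "$set": { "foo.bar": trimmed } }
        );
    }
});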
Of course you can simplify your statements a little and remove one update operation. It's not pretty, because $pull does not allow an $or condition in its query expression, but I hope you get the idea if this applies:
db.collection.update(
    { "foo.bar": 5 },
    { "$unset": { "foo.bar.$": 1 } },
    { "multi": true }
); // same approach
// So include all the values "less than four"
db.collection.update(
    { "foo.bar": { "$in": [1,2,3,null] } },
    { "$pull": { "foo.bar": { "$in": [1,2,3,null] } } },
    { "multi": true }
);
It's a bit less processing but of course those need to be exact integer values. Otherwise stick with the three updates you are doing. Better than cycling in code.
For reference, the "nicer" syntax that will unfortunately not work would be something like this:
db.collection.update(
    {
        "$or": [
            { "foo.bar": { "$lt": 4 } },
            { "foo.bar": null }
        ]
    },
    {
        "$pull": {
            "$or": [
                { "foo.bar": { "$lt": 4 } },
                { "foo.bar": null }
            ]
        }
    },
    { "multi": true }
);
Probably worth a JIRA issue, but I suspect it's mostly because the array element is not the "first" argument directly following $pull.
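As an aside, on servers with pipelined updates (MongoDB 4.2+, so well beyond what this answer assumes) the whole task can be written as one statement. A rough sketch, not the approach above: the first $set stage drops everything less than 4, the second drops the element at the first index holding a 5 (a no-op when $indexOfArray returns -1):
db.collection.updateMany({ "foo.bar": { "$exists": true } }, [
    { "$set": { "foo.bar": {
        "$filter": { "input": "$foo.bar", "cond": { "$gte": ["$$this", 4] } }
    } } },
    { "$set": { "foo.bar": {
        "$let": {
            "vars": { "i": { "$indexOfArray": ["$foo.bar", 5] } },
            "in": {
                "$map": {
                    "input": { "$filter": {
                        // pair each element with its index, drop the pair at $$i
                        "input": { "$zip": { "inputs": [
                            { "$range": [0, { "$size": "$foo.bar" }] },
                            "$foo.bar"
                        ] } },
                        "cond": { "$ne": [{ "$arrayElemAt": ["$$this", 0] }, "$$i"] }
                    } },
                    "in": { "$arrayElemAt": ["$$this", 1] }
                }
            }
        }
    } } }
]);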

You can use the Array.prototype.filter() and Array.prototype.splice() methods.
The filter() method creates a new array with the foo.bar values $lt: 4; then you use the splice() method to remove those values, and the first value equal to 5, from foo.bar:
var idx = [];
db.collection.find().forEach(function(doc) {
    // Collect the values less than 4
    idx = doc.foo.bar.filter(function(el) {
        return el < 4;
    });
    // Remove each collected value from the array
    for (var i in idx) {
        doc.foo.bar.splice(doc.foo.bar.indexOf(idx[i]), 1);
    }
    // Remove the first value equal to 5, guarding against indexOf
    // returning -1 (splice(-1, 1) would drop the last element)
    var pos = doc.foo.bar.indexOf(5);
    if (pos !== -1) {
        doc.foo.bar.splice(pos, 1);
    }
    db.collection.save(doc);
})
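If your shell supports bulkWrite, a variant of the same idea that avoids one round trip per document is to batch the rewrites; a sketch under that assumption:
var ops = [];
db.collection.find().forEach(function(doc) {
    // rebuild the array: keep values >= 4, then drop the first 5
    var arr = doc.foo.bar.filter(function(el) { return el >= 4; });
    var pos = arr.indexOf(5);
    if (pos !== -1) arr.splice(pos, 1);
    ops.push({
        updateOne: {
            filter: { _id: doc._id },
            update: { $set: { "foo.bar": arr } }
        }
    });
});
if (ops.length) db.collection.bulkWrite(ops);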

Related

Removing an element in a mongoDB array based on the position of element with dynamically defined array

My question is a combination of:
This question: Removing the array element in mongoDB based on the position of element
And this question: mongodb set object where key value dynamically changes
I know you can define the field (array) dynamically like this:
{ [arrayName]: { <condition> } }
But I then want to remove a certain element in a dynamically defined array by specifying its position (which is also defined dynamically). In other words, the function that processes this query receives two parameters: the array's name and the index of the element to remove.
The options given by the selected answer were the following:
Option 1 does not work (in general); adapted to my case it looks like:
{ $pull : { [arrayName] : { $gt: index-1, $lt: index+1 } } }
Option 2, I cannot use dynamically defined values in field selectors with quotation marks (as far as I am aware):
{ $pull : "[arrayName].[index]" }
or
{ $pull : "[arrayName].$": index }
Option 3 is a different method, but I can't use it for the same reason:
{ $unset: { "[arrayName].[index]": 1 } } // Won't work
{ $pull: { [arrayName]: null } } // Would probably work
The only workarounds I can think of right now involve significantly changing the design which would be a shame. Any help is appreciated!
PS: I'm using mongoose as a driver on the latest version as of today (v6.3.5) and MongoDB version 5.0.8
On Mongo version 4.2+ you can use pipelined updates to achieve this. You can get it done in multiple ways; here are what I consider the two easiest:
Using $slice and $concatArrays to remove a certain element:
db.collection.update({},
[
  {
    $set: {
      [arrayName]: {
        $concatArrays: [
          // everything before `index`
          { $slice: [`$${arrayName}`, index] },
          // everything after `index`
          { $slice: [`$${arrayName}`, index + 1, { $size: `$${arrayName}` }] }
        ]
      }
    }
  }
])
Mongo Playground
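For illustration, a hypothetical invocation (the MyModel, someId, and "tags" names are made up, not from the question): with arrayName = "tags", each `$${arrayName}` template literal expands to the field path string "$tags":
const arrayName = "tags";  // hypothetical array field
const index = 2;           // remove the third element
await MyModel.updateOne({ _id: someId }, [
  {
    $set: {
      [arrayName]: {
        $concatArrays: [
          { $slice: [`$${arrayName}`, index] },
          { $slice: [`$${arrayName}`, index + 1, { $size: `$${arrayName}` }] }
        ]
      }
    }
  }
]);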
Using $filter and $zip to filter out based on index:
db.collection.updateOne(
  {},
  [
    {
      "$set": {
        [arrayName]: {
          $map: {
            input: {
              $filter: {
                // pair each element with its index: [[0, v0], [1, v1], ...]
                input: {
                  $zip: {
                    inputs: [
                      { $range: [0, { $size: `$${arrayName}` }] },
                      `$${arrayName}`
                    ]
                  }
                },
                // keep every pair whose index differs from `index`
                cond: { $ne: [{ $arrayElemAt: ["$$this", 0] }, index] }
              }
            },
            // unwrap the surviving pairs back to plain values
            in: { $arrayElemAt: ["$$this", 1] }
          }
        }
      }
    }
  ])
Alternatively you can just prepare the new array in application code and $set it directly.
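A minimal sketch of that client-side route, assuming arrayName is a top-level field and that MyModel/someId are placeholder names:
// Hypothetical model and id; read the document, splice locally, write back.
const doc = await MyModel.findById(someId).lean();
doc[arrayName].splice(index, 1);   // drop the element at `index`
await MyModel.updateOne(
  { _id: someId },
  { $set: { [arrayName]: doc[arrayName] } }
);
Note this read-modify-write is not atomic, unlike the pipelined updates above.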

Mongo updateMany statement with an inner array of objects to manipulate

I'm struggling to write a Mongo UpdateMany statement that can reference and update an object within an array.
Here I create 3 documents. Each document has an array called innerArray always containing a single object, with a single date field.
use test;
db.innerArrayExample.insertOne({ _id: 1, "innerArray": [ { "originalDateTime" : ISODate("2022-01-01T01:01:01Z") } ]});
db.innerArrayExample.insertOne({ _id: 2, "innerArray": [ { "originalDateTime" : ISODate("2022-01-02T01:01:01Z") } ]});
db.innerArrayExample.insertOne({ _id: 3, "innerArray": [ { "originalDateTime" : ISODate("2022-01-03T01:01:01Z") } ]});
I want to add a new date field, based on the original date field, to end up with this:
{ _id: 1, "innerArray": [ { "originalDateTime" : ISODate("2022-01-01T01:01:01Z"), "copiedDateTime" : ISODate("2022-01-01T12:01:01Z") } ]}
{ _id: 2, "innerArray": [ { "originalDateTime" : ISODate("2022-01-02T01:01:01Z"), "copiedDateTime" : ISODate("2022-01-02T12:01:01Z") } ]}
{ _id: 3, "innerArray": [ { "originalDateTime" : ISODate("2022-01-03T01:01:01Z"), "copiedDateTime" : ISODate("2022-01-03T12:01:01Z") } ]}
In pseudo code I am saying take the originalDateTime, run it through a function and add a related copiedDateTime value.
For my specific use-case, the function I want to run strips the timezone from originalDateTime, then overwrites it with a new one, equivalent to the Java ZonedDateTime function withZoneSameLocal. Aka 9pm UTC becomes 9pm Brussels (therefore effectively 7pm UTC). The technical justification and methodology were answered in another Stack Overflow question here.
The part of the query I'm struggling with is the part that updates/selects data from an element inside an array. For my simplistic example I have crafted this query, but unfortunately it doesn't work:
This puts copiedDateTime in the correct place, but doesn't evaluate the commands that manipulate the date:
db.innerArrayExample.updateMany(
    { "innerArray.0.originalDateTime" : { $exists : true } },
    { $set: { "innerArray.0.copiedDateTime" : {
        $dateFromString: {
            dateString: { $dateToString: { "date" : "$innerArray.0.originalDateTime", format: "%Y-%m-%dT%H:%M:%S.%L" } },
            format: "%Y-%m-%dT%H:%M:%S.%L",
            timezone: "Europe/Paris"
        }
    } } }
);
// output
{
  _id: 1,
  innerArray: [
    {
      originalDateTime: ISODate("2022-01-01T01:01:01.000Z"),
      copiedDateTime: {
        '$dateFromString': {
          dateString: { '$dateToString': [Object] },
          format: '%Y-%m-%dT%H:%M:%S.%L',
          timezone: 'Europe/Paris'
        }
      }
    }
  ]
}
This simplified query, also has the same issue:
db.innerArrayExample.updateMany({ "innerArray.0.originalDateTime" : { $exists : true }}, { $set: { "innerArray.0.copiedDateTime" : "$innerArray.0.originalDateTime" }});
//output
{
  _id: 1,
  innerArray: [
    {
      originalDateTime: ISODate("2022-01-01T01:01:01.000Z"),
      copiedDateTime: '$innerArray.0.originalDateTime'
    }
  ]
}
As you can see, this issue looks to be separate from the other Stack Overflow question. Instead of being about changing timezones, it's about getting things inside arrays to update.
I plan to take this query, create 70,000 variations of it with different location/timezone combinations and run it against a database with millions of records, so I would prefer something that uses updateMany instead of using Javascript to iterate over each row in the database... unless that's the only viable solution.
I have tried putting the $set in square brackets. This changes the way it interprets everything, evaluating the right side, but causes other problems:
test> db.innerArrayExample.updateMany({ "_id" : 1 }, [{ $set: { "innerArray.0.copiedDateTime" : "$innerArray.0.originalDateTime" }}]);
//output
{
  _id: 1,
  innerArray: [
    {
      '0': { copiedDateTime: [] },
      originalDateTime: ISODate("2022-01-01T01:01:01.000Z")
    }
  ]
}
Above it seems to interpret .0. as a literal rather than an array element. (For my needs I know the array only has 1 item at all times). I'm at a loss finding an example that meets my needs.
I have also tried experimenting with arrayFilters, documented in the MongoDB updateMany documentation, but I cannot fathom how it works with objects:
test> db.innerArrayExample.updateMany(
... { },
... { $set: { "innerArray.$[element].copiedDateTime" : "$innerArray.$[element].originalDateTime" } },
... { arrayFilters: [ { "originalDateTime": { $exists: true } } ] }
... );
MongoServerError: No array filter found for identifier 'element' in path 'innerArray.$[element].copiedDateTime'
test> db.innerArrayExample.updateMany(
... { },
... { $set: { "innerArray.$[0].copiedDateTime" : "$innerArray.$[element].originalDateTime" } },
... { arrayFilters: [ { "0.originalDateTime": { $exists: true } } ] }
... );
MongoServerError: Error parsing array filter :: caused by :: The top-level field name must be an alphanumeric string beginning with a lowercase letter, found '0'
If someone can help me understand the subtleties of the Mongo syntax and help me back on to the right path I'd be very grateful.
You want to be using pipelined updates; the issue with the syntax you're using is that it does not allow the use of aggregation operators or references to document field values.
Here is a quick example on how to do it:
db.collection.updateMany({},
[
  {
    "$set": {
      "innerArray": {
        $map: {
          input: "$innerArray",
          in: {
            $mergeObjects: [
              "$$this",
              { copiedDateTime: "$$this.originalDateTime" }
            ]
          }
        }
      }
    }
  }
])
Mongo Playground
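For the actual timezone requirement, the same $map/$mergeObjects shape can wrap the $dateToString/$dateFromString conversion the question already worked out; a sketch ("Europe/Paris" is just the question's example zone):
db.innerArrayExample.updateMany({}, [
  {
    $set: {
      innerArray: {
        $map: {
          input: "$innerArray",
          in: {
            $mergeObjects: [
              "$$this",
              {
                copiedDateTime: {
                  $dateFromString: {
                    // format the original instant as a wall-clock string (UTC)...
                    dateString: {
                      $dateToString: {
                        date: "$$this.originalDateTime",
                        format: "%Y-%m-%dT%H:%M:%S.%L"
                      }
                    },
                    // ...then reinterpret that wall-clock time in the target zone
                    format: "%Y-%m-%dT%H:%M:%S.%L",
                    timezone: "Europe/Paris"
                  }
                }
              }
            ]
          }
        }
      }
    }
  }
])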

Query for documents where array size inside array is greater than 1

I am trying to find whether any documents are present where a list, nested inside two other lists, has a size of more than one.
This is what my collection looks like:
{
  "value": {
    "items": [
      {
        "docs": [
          { "numbers": [1, 2] },
          { "numbers": [1] }
        ]
      }
    ]
  }
}
I tried to use this query and it did not work:
db.getCollection('MyCollection').find({"value.items.docs.numbers":{ $exists: true, $gt: {$size: 1} }})
What should the query be to search for more than one item present inside a list of lists?
You are checking a condition in a nested array, so a nested $elemMatch will help to check the conditions.
$size only allows a number as input, so $not is needed for the negative condition ("not size 1").
$ne: [] checks that the array is not empty.
db.getCollection('MyCollection').find({
  "value.items": {
    $elemMatch: {
      docs: {
        $elemMatch: {
          numbers: {
            $exists: true,
            $ne: [],
            $not: { $size: 1 }
          }
        }
      }
    }
  }
})
Playground
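An alternative worth knowing: an array has more than one element exactly when index 1 exists, so a plain $exists check on "numbers.1" can stand in for the whole $size/$not/$ne combination (a sketch against the same collection):
db.getCollection('MyCollection').find({
  "value.items": {
    $elemMatch: {
      docs: {
        $elemMatch: {
          // index 1 exists  <=>  the array has at least two elements
          "numbers.1": { $exists: true }
        }
      }
    }
  }
})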

Reference value from positional element in array in update

Suppose I have a document that looks like this:
{
  "id": 1,
  "entries": [
    {
      "id": 100,
      "urls": {
        "a": "url-a",
        "b": "url-b",
        "c": "url-c"
      },
      "revisions": []
    }
  ]
}
I am trying to add a new object to the revisions array that contains its own urls field. Two of the fields should be copied from the entry's urls, while the last one will be new. The result should look like this:
{
  "id": 1,
  "entries": [
    {
      "id": 100,
      "urls": {
        "a": "url-a",
        "b": "url-b",
        "c": "url-c"
      },
      "revisions": [
        {
          "id": 1000,
          "urls": {
            "a": "url-a",          <-- copied
            "b": "url-b",          <-- copied
            "c": "some-new-url"    <-- new
          }
        }
      ]
    }
  ]
}
I am on MongoDB 4.2+, so I know I can use $property on the update query to reference values. However, this does not seem to be working as I expect:
collection.updateOne(
  {
    id: 1,
    "entries.id": 100
  },
  {
    $push: {
      "entries.$.revisions": {
        id: 1000,
        urls: {
          "a": "$entries.$.urls.a",
          "b": "$entries.$.urls.b",
          "c": "some-new-url"
        }
      }
    }
  }
);
The element gets added to the array, but all I see for the url values is the literal $entries.$.urls.a value. I suspect the issue is with combining the reference with selecting a specific positional array element. I have also tried using $($entries.$.urls.a), with the same result.
How can I make this work?
Starting from MongoDB version 4.2 you can use an aggregation pipeline in updates, which means the update part of your query is wrapped in [], where you can take advantage of aggregation operators and use existing field values in the update.
Issue:
Since you've not wrapped the update part in [] to say it's an aggregation pipeline, .updateOne() is considering "$entries.$.urls.a" a plain string. Also, I believe you'll not be able to use the $ positional operator in updates that use an aggregation pipeline.
Try the below query, which uses an aggregation pipeline:
collection.updateOne(
  {
    id: 1,
    // "entries.id" is optional, but useful to avoid running the pipeline
    // below on docs that have `id: 1` but no `"entries.id": 100`
    "entries.id": 100
  },
  [
    {
      $set: {
        entries: {
          // the aggregation operator `$map` iterates over the array
          // and creates a new array from the returned values
          $map: {
            input: "$entries",
            in: {
              $cond: [
                // `$$this` is the current object in the iteration: if the
                // condition is true, modify that object, else return it as is
                { $eq: ["$$this.id", 100] },
                {
                  $mergeObjects: [
                    "$$this",
                    {
                      revisions: {
                        $concatArrays: [
                          "$$this.revisions",
                          [{ id: 1000, urls: { a: "$$this.urls.a", b: "$$this.urls.b", c: "some-new-url" } }]
                        ]
                      }
                    }
                  ]
                },
                "$$this" // condition not met: return the same object
              ]
            }
          }
        }
      }
    }
  ]
);
$mergeObjects will replace the existing revisions field in $$this (the current object) with the value of { $concatArrays: [ "$$this.revisions", [{ id: 1000, urls: { a: "$$this.urls.a", b: "$$this.urls.b", c: "some-new-url" } }] ] }.
From the field being named revisions, and it being an array, I've assumed there will be multiple objects in that field, so we're using the $concatArrays operator to push new objects into the revisions array of the particular entries object.
In any case, if your revisions array field only ever contains one object, make it an object instead of an array, or keep it as an array and use the below query. We've removed $concatArrays because we don't need to merge the new object into an existing revisions array, as we'll only have one object every time.
collection.update(
  {
    id: 1,
    "entries.id": 100
  },
  [
    {
      $set: {
        entries: {
          $map: {
            input: "$entries",
            in: {
              $cond: [
                { $eq: ["$$this.id", 100] },
                {
                  $mergeObjects: [
                    "$$this",
                    {
                      revisions: [{ id: 1000, urls: { a: "$$this.urls.a", b: "$$this.urls.b", c: "some-new-url" } }]
                    }
                  ]
                },
                "$$this"
              ]
            }
          }
        }
      }
    }
  ]
);
Test: test your aggregation pipeline here: mongoplayground
Ref: .updateOne()
Note: if .updateOne() throws an error due to an incompatible client or shell, try this query with .update(). Executing an aggregation pipeline in updates helps save multiple DB calls & can be very useful on arrays with a small number of elements.

Slow Mongo aggregate when using $sort and $limit in $facet

I am noticing huge performance differences in what appears to be the same aggregate, at least conceptually. The tests were made on a simple collection structure that has an _id, a name, and a createdAt, but there are 20 million of those documents. There is an index on createdAt. It's hosted on an mlab cluster, version 3.6.9 WiredTiger.
I am trying to get simple paging going using aggregate. I know I could use find and limit, but I'd like to add more elements to the pipeline; the example I give is very distilled.
db.getCollection("runnablecalls").aggregate([
{
$facet: {
docs: [
{ $sort: {createdAt: -1} },
{ $limit: 25 },
{ $skip: 0 },
],
page_info: [
{ $group: { _id: null, total: { $sum: 1 } }
}
],
}
}
])
That takes almost 40s. Now if I move the $sort and $limit outside of the $facet, it takes 0.042s.
db.getCollection("runnablecalls").aggregate([
{ $sort: {createdAt: -1} },
{ $limit: 25 },
{
$facet: {
docs: [
{ $skip: 0 },
],
page_info: [
{
$group: { _id: null, total: { $sum: 1 } }
}
]}
},
])
The page_info facet makes no difference in the end; I can take it out without any difference. I am just leaving it in because I'd like to use it. I know how to solve the problem using two queries: a count and an aggregate without a $facet. I'd just like to understand why this happens.
The first aggregation doesn't use an index. The second aggregation uses an index and keeps only the first 25 docs before it enters $facet. You can add explain('executionStats') to see the query plans and index usage. For example,
db.getCollection("runnablecalls").explain('executionStats').aggregate([
{
$facet: {
docs: [
{ $sort: {createdAt: -1} },
{ $limit: 25 },
{ $skip: 0 },
],
page_info: [
{ $group: { _id: null, total: { $sum: 1 } }
}
],
}
}
])
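One caveat the answer leaves implicit: once $sort and $limit run before $facet, the page_info group counts only the 25 surviving documents. If you need the full total, one option is a separate count alongside the trimmed aggregate; a sketch:
// total computed separately, since after the $limit the $facet only sees 25 docs
var total = db.getCollection("runnablecalls").find({}).count();
var docs = db.getCollection("runnablecalls").aggregate([
  { $sort: { createdAt: -1 } },
  { $skip: 0 },
  { $limit: 25 }
]).toArray();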
