MongoDB Aggregate Two Collections

I have two collections. The main one is represented by a larger Mongo document, and the second one is smaller. Both collections have at least one semantically similar field: name and title_en. Is it possible to aggregate these two collections into one using MongoDB aggregation?
I guess in a pseudo query it would be something like:
APPEND to Collection1_DOC (field_name: field_value) from Collection2_DOC
WHERE Collection1_DOC.title_en = Collection2_DOC.Name
Does MongoDB aggregation provide this kind of functionality?
{
  "age_rating": "R",
  "age_rating_guide": "17+ (violence & profanity)",
  "average_rating": "82.47",
  "episode_count": 26,
  "episode_length": 25,
  "poster_image": "https://media.kitsu.io/anime/poster_images/1/original.jpg?1597604210",
  "show_type": "TV",
  "title_en": "Cowboy Bebop",
  "title_ja_jp": "カウボーイビバップ",
  "total_length": 650
}
{
  "End_year": 1999,
  "Name": "Cowboy Bebop",
  "Release_season": "Spring",
  "Release_year": 1998,
  "Tags": "Action, Adventure, Drama, Sci Fi, Bounty Hunters, Episodic, Noir, Outer Space, Western, Original Work, Drug Use,, Mature Themes,, Nudity,, Violence"
}

I found out more about aggregation and want to share a pipeline that kind of works for me.
db.main.aggregate([
  {
    "$lookup": {
      "from": "info",
      "localField": "Name",
      "foreignField": "title_en",
      "as": "linked_collections"
    }
  },
  {
    "$unwind": "$linked_collections"
  },
  {
    "$project": {
      "age_rating": 1,
      "age_rating_guide": 1,
      "average_rating": 1,
      "description": 1,
      "episode_count": 1,
      "episode_length": 1,
      "poster_image": 1,
      "show_type": 1,
      "title_en": 1,
      "title_ja_jp": 1,
      "total_length": 1,
      "release_season": "$linked_collections.Release_season",
      "Release_year": "$linked_collections.Release_year",
      "Tags": "$linked_collections.Tags"
    }
  },
  {
    $out: "main_new"
  }
])
Unfortunately, the $out stage exits with an error. But it's still the right way, as I see it:
caused by :: E11000 duplicate key error
collection: anime_app.tmp.agg_out.c0db80b8-2a40-40cb-84ec-5b8c3c281ca2
index: _id_
dup key: { _id: ObjectId('62da48e463f67a0586394026') }
Update
This error can be removed by excluding _id in the $project stage, which also removes all the duplicates. One line of code does it (per an answer from a MongoDB engineer):
{
  "_id": 0,
  "age_rating": 1,
  // ...rest of the $project stage unchanged
}
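For reference, excluding _id works because documents written by $out without an _id get freshly generated ObjectIds, so the duplicated _id values produced by $unwind no longer collide. A sketch of the corrected stage in context (field list abbreviated):
{
  "$project": {
    "_id": 0,
    "age_rating": 1,
    // ...the remaining fields from the $project stage above...
    "Tags": "$linked_collections.Tags"
  }
},
{
  $out: "main_new"
}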

Related

Is there a way to sort MongoDB records by the highest difference between two values in an object?

I have a database in which records look like this:
{
  id: someId,
  initialValue: 100,
  currentValue: 150,
  creationDate: someDate
}
And I have to get the values with the biggest difference between currentValue and initialValue. Is it possible in MongoDB to write a sorting function that will subtract one value from another and then compare (sort) by the result?
A slightly overkill but extensible solution is to use $setWindowFields to $rank the $subtract result. You can manipulate the rank field by swapping in other window functions if changes need to be made in the future. The out-of-the-box behaviour also provides tie-breaker functionality.
db.collection.aggregate([
  {
    "$addFields": {
      "diff": {
        "$subtract": ["$currentValue", "$initialValue"]
      }
    }
  },
  {
    "$setWindowFields": {
      "partitionBy": null,
      "sortBy": { "diff": -1 },
      "output": {
        "rank": { $rank: {} }
      }
    }
  },
  {
    $sort: { rank: 1 }
  },
  // cosmetics
  {
    "$unset": ["diff", "rank"]
  }
])
Sure, simply generate the desired value to sort on in a preceding $addFields stage first:
{
  $addFields: {
    diff: {
      "$subtract": ["$currentValue", "$initialValue"]
    }
  }
},
{
  $sort: { diff: -1 }
}
Note that this operation cannot use an index, so the server will have to manually sort the data. This may be a heavy and/or slow operation. You may wish to consider persisting the value when you write to the documents, and indexing it. This would slightly increase the write overhead, but would significantly reduce the work and time required for reads.
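A minimal sketch of that persist-and-index approach, assuming you can compute the difference at write time (uses update-with-pipeline, MongoDB 4.2+; names follow the question):
// Persist the difference whenever the values are written
db.collection.updateMany(
  {},
  [{ $set: { diff: { $subtract: ["$currentValue", "$initialValue"] } } }]
)

// Index the persisted field once...
db.collection.createIndex({ diff: -1 })

// ...and reads become a simple indexed sort
db.collection.find().sort({ diff: -1 })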

Remove oldest N elements from document array

I have a document in my mongodb that contains a very large array (about 10k items). I'm trying to only keep the latest 1k in the array (and so remove the first 9k elements). The document looks something like this:
{
  "_id": "fakeid64",
  "Dropper": [
    { "md5": "fakemd5-1" },
    { "md5": "fakemd5-2" },
    ...,
    { "md5": "fakemd5-10000" }
  ]
}
How do I accomplish that?
The correct operation here actually involves the $push operator with the $each and $slice modifiers. It may initially seem counter-intuitive to use $push to "remove" items from an array, but the use case becomes clear when you see the intended operation.
db.collection.update(
  { "_id": "fakeid64" },
  { "$push": { "Dropper": { "$each": [], "$slice": -1000 } } }
)
You can in fact just run this for your whole collection:
db.collection.update(
  { },
  { "$push": { "Dropper": { "$each": [], "$slice": -1000 } } },
  { "multi": true }
)
What happens here is that the modifier for $each takes an array of items to "add" in the $push operation, which in this case we leave empty since we do not actually want to add anything. The $slice modifier given a "negative" value is actually saying to keep the "last n" elements present in the array as the update is performed, which is exactly what you are asking.
The general "intended" case is to use $slice when adding new elements to "maintain" the array at a "maximum" given length, which in this case would be 1000. So you would generally use in tandem with actually "adding" new items like this:
db.collection.update(
  { "_id": "fakeid64" },
  { "$push": { "Dropper": { "$each": [{ "md5": "fakemd5-newEntry" }], "$slice": -1000 } } }
)
This would append the new item(s) provided in $each whilst also removing any items from the "start" of the array where the total length given the addition was greater than 1000.
It is stated incorrectly elsewhere that you would use $pullAll with a supplied list of the array content already existing in the document, but the operation is actually two requests to the database.
The misconception is that the request is sent as "one", but it actually is not; it is basically interpreted as the longer form (with correct usage of .slice()):
var md5s = db.collection.findOne({ "_id": "fakeid64" }).Dropper.slice(0, 9000);
db.collection.update(
  { "_id": "fakeid64" },
  { "$pullAll": { "Dropper": md5s } }
)
So you can see that this is not very efficient and is in fact quite dangerous when you consider that the state of the array within the document "could" possibly change in between the "read" of the array content and the actual "write" operation on update since they occur separately.
This is why MongoDB has atomic operators for $push with $slice as is demonstrated. Since it is not only more efficient, but also takes into consideration the actual "state" of the document being modified at the time the actual modification occurs.
You can use the $pullAll operator. Suppose you use the Python/PyMongo driver:
yourcollection.update_one(
    {'_id': 'fakeid64'},
    {'$pullAll': {'Dropper': yourcollection.find_one({'_id': 'fakeid64'})['Dropper'][:9000]}}
)
or in mongo shell:
db.yourcollection.update(
  { _id: 'fakeid64' },
  { $pullAll: { 'Dropper': db.yourcollection.findOne({ '_id': 'fakeid64' })['Dropper'].slice(0, 9000) } }
)
(*) Having said that, it would be much better not to let your document(s) grow this much in the first place.
This is just a representation of the query. Basically, you can $unwind the array and $limit to the oldest 9000 elements, then use the cursor's forEach to $pull them, like below:
db.your_collection.aggregate([
  { $match: { _id: 'fakeid64' } },
  { $unwind: "$Dropper" },
  { $limit: 9000 } // the first (oldest) 9000 elements
]).forEach(function(doc) {
  db.your_collection.update({ _id: doc._id }, { $pull: { Dropper: doc.Dropper } });
});
From the Mongo docs:
db.students.update(
  { _id: 1 },
  {
    $push: {
      scores: {
        $each: [{ attempt: 3, score: 7 }, { attempt: 4, score: 4 }],
        $sort: { score: 1 },
        $slice: -3
      }
    }
  }
)
The following update uses the $push operator with:
the $each modifier to append two new elements to the array,
the $sort modifier to order the elements by ascending (1) score, and
the $slice modifier to keep only the last 3 elements of the ordered array.

MongoDB - Query on the last element of an array?

I know that MongoDB supports the syntax find({"array.0.field": "value"}), but I specifically want to do this for the last element in the array, which means I don't know the index. Is there some kind of operator for this, or am I out of luck?
EDIT: To clarify, I want find() to only return documents where a field in the last element of an array matches a specific value.
In 3.2 this is possible. First project so that myField contains only the last element, and then match on myField.
db.collection.aggregate([
  { $project: { id: 1, myField: { $slice: ["$myField", -1] } } },
  { $match: { myField: "myValue" } }
]);
You can use $expr (a MongoDB 3.6 operator) to use aggregation expressions in a regular query.
Compare query operators vs aggregation comparison operators.
For scalar arrays
db.col.find({$expr: {$gt: [{$arrayElemAt: ["$array", -1]}, value]}})
For embedded arrays - Use $arrayElemAt expression with dot notation to project last element.
db.col.find({$expr: {$gt: [{"$arrayElemAt": ["$array.field", -1]}, value]}})
Spring @Query code:
@Query("{$expr:{$gt:[{$arrayElemAt:[\"$array\", -1]}, ?0]}}")
ReturnType methodName(ArgType arg);
Starting Mongo 4.4, the aggregation operator $last can be used to access the last element of an array:
For instance, within a find query:
// { "myArray": ["A", "B", "C"] }
// { "myArray": ["D"] }
db.collection.find({ $expr: { $eq: [{ $last: "$myArray" }, "C"] } })
// { "myArray": ["A", "B", "C"] }
Or within an aggregation query:
db.collection.aggregate([
  { $addFields: { last: { $last: "$myArray" } } },
  { $match: { last: "C" } }
])
Use $slice:
db.collection.find( {}, { array_field: { $slice: -1 } } )
Edit:
You can make use of
{ <field>: { $elemMatch: { <query1>, <query2>, ... } } }
to find a match, but it won't give exactly what you are looking for. I don't think that is possible in MongoDB yet.
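To illustrate the limitation: $elemMatch matches documents where any element satisfies the condition, not specifically the last one. A hypothetical example:
// { myArray: ["A", "B", "C"] }
// { myArray: ["C", "D"] }
db.collection.find({ myArray: { $elemMatch: { $eq: "C" } } })
// returns both documents, although only the first one ends with "C"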
I posted on the official Mongo Google group here, and got an answer from their staff. It appears that what I'm looking for isn't possible. I'm going to just use a different schema approach.
In version 3.6, use aggregation to achieve the same:
db.getCollection('deviceTrackerHistory').aggregate([
  {
    $match: { clientId: "12" }
  },
  {
    $project: {
      deviceId: 1,
      recent: { $arrayElemAt: ["$history", -1] }
    }
  }
])
You could use $position: 0 whenever you $push, and then always query array.0 to get the most recently added element. Of course, then you won't be able to get the new "last" element.
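A rough sketch of that workaround (names are illustrative):
// Insert new elements at the front of the array with $position...
db.collection.updateOne(
  { _id: someId },
  { $push: { myArray: { $each: ["newestValue"], $position: 0 } } }
)
// ...so the most recently added element is always at index 0
db.collection.find({ "myArray.0": "myValue" })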
Not sure about performance, but this works well for me:
db.getCollection('test').find(
  {
    $where: "this.someArray[this.someArray.length - 1] === 'pattern'"
  }
)
You can solve this using aggregation.
model.aggregate([
  {
    $addFields: {
      lastArrayElement: {
        $slice: ["$array", -1],
      },
    },
  },
  {
    $match: {
      "lastArrayElement.field": value,
    },
  },
]);
Quick explanation: aggregate creates a pipeline of actions, executed sequentially, which is why it takes an array as a parameter. First we use the $addFields pipeline stage. This is new in version 3.4 and basically means: keep all the existing fields of the document, but also add the following. In our case we're adding lastArrayElement and defining it as the last element in the array called array. Next we perform a $match pipeline stage. The input to this is the output from the previous stage, which includes our new lastArrayElement field. Here we're saying that we only include documents where lastArrayElement's field field has the value value.
Note that the resulting matching documents will include lastArrayElement. If for some reason you really don't want this, you could add a $project pipeline stage after $match to remove it.
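That optional final stage might look like this (appended to the pipeline above):
{
  $project: {
    lastArrayElement: 0, // drop the helper field from the results
  },
},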
Regarding the $arrayElemAt answer: what if I want orderNumber: "12345" and the last element's value $gt "value"? How do I write the $expr? Thanks!
Combine the plain match with the $expr condition inside a single $match stage:
db.collection.aggregate([
  {
    $match: {
      $and: [
        { orderNumber: "12345" },
        { $expr: { $eq: [{ "$arrayElemAt": ["$fieldArray.name", -1] }, "value"] } }
      ]
    }
  }
]);

mongodb - adding the value in a field to the value in an embedded array

I have a document in MongoDB as below.
{
"CorePrice" : 1,
"_id" : 166,
"partno" : 76,
"parttype" : "qpnm",
"shipping" :
[
{
"shippingMethod1" : "ground",
"cost1" : "10"
},
{
"shippingMethod2" : "air",
"cost2" : "11"
},
{
"shippingMethod3" : "USPS",
"cost3" : "3"
},
{
"shippingMethod4" : "USPS",
"cost4" : 45
}
]
}
My goal is to add CorePrice (1) to cost4 (45) and retrieve the computed value as a new field, "dpv". I tried the query below, but I receive an error: exception: $add only supports numeric or date types, not Array. I'm not sure why. Any kind of help will be greatly appreciated.
db.Parts.aggregate([
  {
    $project: {
      partno: 1,
      parttype: 1,
      dpv: { $add: ["$CorePrice", "$shipping.cost1"] }
    }
  },
  {
    $match: { "_id": { $lt: 5 } }
  }
]);
When you refer to the field shipping.cost1 and shipping is an array, MongoDB does not know which entry of the shipping array you are referring to. In your case there is only one entry in the array with a cost1 field, but this can't be guaranteed, which is why you get an error.
If you are able to change your database schema, I would recommend turning shipping into an object with a field for each shipping type, which would make the costs easier to address. When this is impossible, or would break some other use case, you could try to access the array entry by numeric index (shipping.0.cost1).
Another thing you could try is the $sum operator, to create the sum of all shipping.cost1 fields. When there is only one element in the array with a cost1 field, the result will be its value.
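For illustration, a sketch combining these suggestions. Note that in the sample document cost1 is a string ("10"), so it also needs converting with $toDouble (MongoDB 4.0+) before $add will accept it:
db.Parts.aggregate([
  {
    $project: {
      partno: 1,
      parttype: 1,
      // "$shipping.cost1" resolves to an array like ["10"], so take its first
      // element and convert the string to a number before adding
      dpv: {
        $add: [
          "$CorePrice",
          { $toDouble: { $arrayElemAt: ["$shipping.cost1", 0] } }
        ]
      }
    }
  }
])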
I was able to achieve this by splitting the query in two, as below.
var pipeline1 = [
  {
    "$unwind": "$shipping"
  },
  {
    $project: {
      partno: 1,
      parttype: 1,
      dpv: {
        $add: ["$CorePrice", "$shipping.cost4"]
      }
    }
  },
  {
    $match: { "_id": 5 }
  }
];
R = db.tb.aggregate(pipeline1);

MongoDB rename database field within array

I need to rename indentifier in this:
{ "general" :
{ "files" :
{ "file" :
[
{ "version" :
{ "software_program" : "MonkeyPlus",
"indentifier" : "6.0.0"
}
}
]
}
}
}
I've tried
db.nrel.component.update(
  {},
  { $rename: {
      "general.files.file.$.version.indentifier": "general.files.file.$.version.identifier"
  } },
  false, true
)
but it returns: $rename source may not be dynamic array.
For what it's worth, while it sounds awful to have to do, the solution is actually pretty easy. This of course depends on how many records you have. But here's my example:
db.Setting.find({ 'Value.Tiers.0.AssetsUnderManagement': { $exists: 1 } }).snapshot().forEach(function(item)
{
  for (i = 0; i != item.Value.Tiers.length; ++i)
  {
    item.Value.Tiers[i].Aum = item.Value.Tiers[i].AssetsUnderManagement;
    delete item.Value.Tiers[i].AssetsUnderManagement;
  }
  db.Setting.update({ _id: item._id }, item);
});
I iterate over the collection wherever the array and the "wrong" name are found. I then iterate over the sub-array, set the new value, delete the old one, and update the whole document. It was relatively painless. Granted, I only had a few tens of thousands of rows to search through, of which only a few dozen met the criteria.
Still, I hope this answer helps someone!
Edit: Added snapshot() to the query. See why in the comments.
You must apply snapshot() to the cursor before retrieving any documents from the database.
You can only use snapshot() with unsharded collections.
The snapshot() function was deprecated in MongoDB 3.6 and removed in 4.0, so if you are using MongoDB 4.0+, the example above should omit the snapshot() call.
As mentioned in the documentation there is no way to directly rename fields within arrays with a single command. Your only option is to iterate over your collection documents, read them and update each with $unset old/$set new operations.
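A minimal sketch of that read-and-update iteration, using the schema from the question (one update round trip per matching document):
db.nrel.component.find({ "general.files.file.version.indentifier": { $exists: true } }).forEach(function (doc) {
  // rename the field inside each array element
  doc.general.files.file.forEach(function (f) {
    if (f.version && f.version.indentifier !== undefined) {
      f.version.identifier = f.version.indentifier;
      delete f.version.indentifier;
    }
  });
  // write the modified array back with $set
  db.nrel.component.update({ _id: doc._id }, { $set: { "general.files.file": doc.general.files.file } });
});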
I had a similar problem. In my situation I found the following was much easier:
I exported the collection to json:
mongoexport --db mydb --collection modules --out modules.json
I did a find and replace on the json using my favoured text editing utility.
I reimported the edited file, dropping the old collection along the way:
mongoimport --db mydb --collection modules --drop --file modules.json
Starting Mongo 4.2, db.collection.update() can accept an aggregation pipeline, finally allowing the update of a field based on its own value:
// { general: { files: { file: [
// { version: { software_program: "MonkeyPlus", indentifier: "6.0.0" } }
// ] } } }
db.collection.updateMany(
  {},
  [{ $set: { "general.files.file": {
    $map: {
      input: "$general.files.file",
      as: "file",
      in: {
        version: {
          software_program: "$$file.version.software_program",
          identifier: "$$file.version.indentifier" // fixing the typo here
        }
      }
    }
  }}}]
)
// { general: { files: { file: [
// { version: { software_program: "MonkeyPlus", identifier: "6.0.0" } }
// ] } } }
Literally, this updates documents by (re)$setting the "general.files.file" array, $mapping its "file" elements into a "version" object containing the same "software_program" field and the renamed "identifier" field, which contains what used to be the value of "indentifier".
A couple of additional details:
The first part {} is the match query, filtering which documents to update (in this case all documents).
The second part [{ $set: { "general.files.file": { ... }}}] is the update aggregation pipeline (note the square brackets signifying the use of an aggregation pipeline):
$set is a new aggregation operator which in this case replaces the value of the "general.files.file" array.
Using a $map operation, we replace all elements from the "general.files.file" array by basically the same elements, but with an "identifier" field rather than "indentifier":
input is the array to map.
as is the variable name given to looped elements.
in is the actual transformation applied to elements. In this case, it replaces each element with a "version" object composed of "software_program" and "identifier" fields. These fields are populated by extracting their previous values using the $$file.xxxx notation (where file is the name given to elements in the as part).
I faced the same issue with the same schema, so this query may help anyone who wants to rename a field in an embedded array.
db.getCollection("sampledocument").updateMany({}, [
{
$set: {
"general.files.file": {
$map: {
input: "$general.files.file",
in: {
version: {
$mergeObjects: [
"$$this.version",
{ identifer: "$$this.version.indentifier" },
],
},
},
},
},
},
},
{ $unset: "general.files.file.version.indentifier" },
]);
Another solution
I also wanted to rename a property inside a nested array, and I used this:
db.getCollection('YourCollectionName').find({}).snapshot().forEach(function(a) {
  a.Array1.forEach(function(b) {
    b.Array2.forEach(function(c) {
      c.NewPropertyName = c.OldPropertyName;
      delete c["OldPropertyName"];
    });
  });
  db.getCollection('YourCollectionName').save(a);
});
The easiest and shortest solution uses aggregate. Note that the { db, coll } form of $out used below requires MongoDB 4.4+; on older versions, output to a collection in the same database with $out: "myCollection".
db.myCollection.aggregate([
  {
    $addFields: {
      "myArray.newField": { $arrayElemAt: ["$myArray.oldField", 0] }
    }
  },
  { $project: { "myArray.oldField": false } },
  { $out: { db: "myDb", coll: "myCollection" } }
])
The problem with using a forEach loop as mentioned above is very poor performance when the collection is huge.
My proposal would be this one:
db.nrel.component.aggregate([
  { $unwind: "$general.files.file" },
  {
    $set: {
      "general.files.file.version.identifier": {
        $ifNull: ["$general.files.file.version.indentifier", "$general.files.file.version.identifier"]
      }
    }
  },
  { $unset: "general.files.file.version.indentifier" },
  { $set: { "general.files.file": ["$general.files.file"] } },
  { $out: "nrel.component" } // careful: this replaces the entire collection
])
However, this works only when the array general.files.file holds a single document. Since this will likely not always be the case, you can use this one instead:
db.nrel.component.aggregate([
  { $unwind: "$general.files.file" },
  {
    $set: {
      "general.files.file.version.identifier": {
        $ifNull: ["$general.files.file.version.indentifier", "$general.files.file.version.identifier"]
      }
    }
  },
  { $unset: "general.files.file.version.indentifier" },
  { $group: { _id: "$_id", general_new: { $addToSet: "$general.files.file" } } },
  { $set: { "general.files.file": "$general_new" } },
  { $unset: "general_new" },
  { $out: "nrel.component" } // careful: this replaces the entire collection
])
