MongoDB Track data changes - database

I want to track changes to MongoDB documents. The main challenge is that MongoDB documents can be nested.
Example
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Hancock Nelson"
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
after the update/change
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Daphne Kline" //<------
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
This is a very basic example of a highly expandable real-world use case.
On a SQL-based database, I would suggest a solution along these lines.
The SQL way
users
_id                       email
60f7a8b28db7c78b57bbc217  cathyjarvis#delphide.com
friends
_id  user_id                   name
0    60f7a8b28db7c78b57bbc217  Hancock Nelson
1    60f7a8b28db7c78b57bbc217  Suarez Burt
2    60f7a8b28db7c78b57bbc217  Mejia Elliott
after the update/change
users
_id                       email
60f7a8b28db7c78b57bbc217  cathyjarvis#delphide.com
friends
_id  user_id                   name
0    60f7a8b28db7c78b57bbc217  Daphne Kline
1    60f7a8b28db7c78b57bbc217  Suarez Burt
2    60f7a8b28db7c78b57bbc217  Mejia Elliott
history
_id  friends_id  field  preUpdate       postUpdate
0    0           name   Hancock Nelson  Daphne Kline
If there is a single update and the change only has to be tracked until the next update, this would work for NoSQL as well. With SQL, a second update simply adds a second row to the history table, so it stays very clear. With NoSQL, you can keep a list/array of full document versions and compare them across the indexes, but that stores a lot of redundant information that has not changed.

Have a look at Set Expression Operators
$setDifference
$setEquals
$setIntersection
Beware: these operators perform set operations on arrays, treating the arrays as sets. Duplicate entries in an array are ignored, and so is the order of the elements.
In your example the update would result in
removed: [ {name: "Hancock Nelson" } ],
added: [ {name: "Daphne Kline" } ]
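To make the set semantics concrete, here is a minimal plain-JavaScript sketch (not MongoDB code) that derives `removed` and `added` from an old and a new array. The helper name `diffAsSets` and the use of `JSON.stringify` as the equality test are assumptions made for this illustration; in an aggregation the corresponding expressions would be `$setDifference` applied to the old and new arrays, once in each direction.

```javascript
// Plain-JavaScript emulation of $setDifference on arrays of sub-documents.
// Equality is approximated with JSON.stringify, so key order matters here.
function diffAsSets(before, after) {
  const key = (doc) => JSON.stringify(doc);
  const beforeKeys = new Set(before.map(key));
  const afterKeys = new Set(after.map(key));
  // Sets ignore duplicates and order, just like the set operators.
  const unique = (docs) => [...new Map(docs.map((d) => [key(d), d])).values()];
  return {
    removed: unique(before).filter((d) => !afterKeys.has(key(d))),
    added: unique(after).filter((d) => !beforeKeys.has(key(d))),
  };
}

const before = [{ name: "Hancock Nelson" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const after = [{ name: "Daphne Kline" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const result = diffAsSets(before, after);
console.log(JSON.stringify(result));
```

Because the arrays are treated as sets, this kind of diff cannot tell which position changed; it only reports which values disappeared and which appeared.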
If the number of elements is always the same before and after the update, then you could use this one:
db.collection.insertOne({
friends: [
{ "name": "Hancock Nelson" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
],
updated_friends: [
{ "name": "Daphne Kline" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
]
})
db.collection.aggregate([
{
$set: {
difference: {
$map: {
input: { $range: [0, { $size: "$friends" }] },
as: "i",
in: {
$cond: {
if: {
$eq: [
{ $arrayElemAt: ["$friends", "$$i"] },
{ $arrayElemAt: ["$updated_friends", "$$i"] }
]
},
then: null,
else: {
old: { $arrayElemAt: ["$friends", "$$i"] },
new: { $arrayElemAt: ["$updated_friends", "$$i"] }
}
}
}
}
}
}
},
{
$set: {
difference: {
$filter: {
input: "$difference",
cond: { $ne: ["$$this", null] }
}
}
}
}
])
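The position-by-position comparison performed by the pipeline above can be sketched in plain JavaScript as follows. This is only an emulation for illustration: `diffByIndex` is a hypothetical helper, equality is approximated with `JSON.stringify`, and both arrays are assumed to have the same length, as the answer requires.

```javascript
// Compares two arrays element by element, mirroring the $map over $range
// (build old/new pairs or null) followed by the $filter (drop the nulls).
function diffByIndex(oldItems, newItems) {
  const same = (a, b) => JSON.stringify(a) === JSON.stringify(b);
  return oldItems
    .map((oldItem, i) => (same(oldItem, newItems[i]) ? null : { old: oldItem, new: newItems[i] }))
    .filter((entry) => entry !== null);
}

const friends = [{ name: "Hancock Nelson" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const updatedFriends = [{ name: "Daphne Kline" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const difference = diffByIndex(friends, updatedFriends);
console.log(JSON.stringify(difference));
```

Unlike the set operators, this variant keeps old and new values paired, which is closer to the preUpdate/postUpdate history row from the question.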

Related

MongoDB lookup (join) with field in double nested array

With a MongoDB collection name department with the following structure:
{
"_id":99,
"name":"Erick Kalewe",
"faculty":"Zazio",
"lecturers":[
{
"lecturerID":31,
"name":"Granny Kinton",
"email":"gkintonu#answers.com",
"imparts":[
{
"groupID":70,
"codCourse":99
}
]
},
{
"lecturerID":36,
"name":"Michale Dahmel",
"email":"mdahmelz#artisteer.com",
"imparts":[
{
"groupID":100,
"codCourse":60
}
]
}
]
}
and another collection group with this structure:
{
"_id":100,
"codCourse":11,
"language":"Romanian",
"max_students":196,
"students":[
{
"studentID":1
}
],
"classes":[
{
"date":datetime.datetime(2022, 5, 10, 4, 24, 19),
"cod_classroom":100
}
]
}
join them to get the following:
{
"_id":99,
"name":"Erick Kalewe",
"faculty":"Zazio",
"lecturers":[
{
"lecturerID":31,
"name":"Granny Kinton",
"email":"gkintonu#answers.com",
"imparts":[
{
"groupID":70,
"codCourse":99
}
]
},
{
"lecturerID":36,
"name":"Michale Dahmel",
"email":"mdahmelz#artisteer.com",
"imparts":[
{
"_id":100,
"codCourse":11,
"language":"Romanian",
"max_students":196,
"students":[
{
"studentID":1
}
],
"classes":[
{
"date":datetime.datetime(2022, 5, 10, 4, 24, 19),
"cod_classroom":100
}
]
}
]
}
]
}
The objective is to get a report with the number of students taught by a professor from a department.
Query
Unwind, do the join, and then regroup.
It is a fairly big query because you want to join on a nested field, which means two unwinds and two groupings to restore the structure.
(I think that, in general, fields used for joins shouldn't be nested this deep.)
Unwind both arrays.
Do the lookup on groupID.
Then reconstruct the document with its two levels of nesting.
First, imparts has to be grouped and pushed
(for the remaining fields I keep the $first value).
We also sum the students (max_students), based on the comment.
Then lecturers has to be grouped and pushed
(again keeping the $first value for the remaining fields).
Finally, we take the lecturer with the most students in the department
(MongoDB can also compare documents).
Playmongo (you can put your mouse at the end of each stage to see in/out of that stage)
department.aggregate(
[{"$unwind": "$lecturers"}, {"$unwind": "$lecturers.imparts"},
{"$lookup":
{"from": "group",
"localField": "lecturers.imparts.groupID",
"foreignField": "_id",
"as": "lecturers.imparts"}},
{"$set": {"lecturers.imparts": {"$first": "$lecturers.imparts"}}},
{"$group":
{"_id": {"_id": "$_id", "lecturersID": "$lecturers.lecturerID"},
"name": {"$first": "$name"},
"faculty": {"$first": "$faculty"},
"lecturers":
{"$first":
{"lecturerID": "$lecturers.lecturerID",
"name": "$lecturers.name",
"email": "$lecturers.email"}},
"imparts": {"$push": "$lecturers.imparts"},
"lecture_max_students":
{"$sum": "$lecturers.imparts.max_students"}}},
{"$set":
{"lecturers":
{"$mergeObjects":
["$lecturers", {"imparts": "$imparts"},
{"lecture_max_students": "$lecture_max_students"}]},
"imparts": "$$REMOVE","lecture_max_students": "$$REMOVE"}},
{"$group":
{"_id": "$_id._id",
"name": {"$first": "$name"},
"faculty": {"$first": "$faculty"},
"lecturers": {"$push": "$lecturers"},
"dept-max-lecturer":
{"$max": {"max-students": "$lecturers.lecture_max_students",
"lecturerID": "$lecturers.lecturerID"}}}}])
You can try the aggregation framework:
$lookup with the group collection, passing lecturers.imparts.groupID as localField and _id as foreignField
$addFields to merge the group data into imparts and to remove the temporary group field, because it is no longer needed
$map to iterate over the lecturers array
$mergeObjects to merge the current lecturers object with the updated imparts
$map to iterate over the imparts array
$mergeObjects to merge the current imparts object with the matching result from group
$filter to iterate over the group array and find the group by groupID
$arrayElemAt to get the first element of the filtered result
db.department.aggregate([
{
$lookup: {
from: "group",
localField: "lecturers.imparts.groupID",
foreignField: "_id",
as: "group"
}
},
{
$addFields: {
lecturers: {
$map: {
input: "$lecturers",
in: {
$mergeObjects: [
"$$this",
{
imparts: {
$map: {
input: "$$this.imparts",
as: "i",
in: {
$mergeObjects: [
"$$i",
{
$arrayElemAt: [
{
$filter: {
input: "$group",
cond: { $eq: ["$$this._id", "$$i.groupID"] }
}
},
0
]
}
]
}
}
}
}
]
}
}
},
group: "$$REMOVE"
}
}
])
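As a rough illustration of what the nested `$map` / `$mergeObjects` stage produces, the following plain-JavaScript sketch performs the same merge on in-memory objects. The function name `mergeGroupsIntoImparts` and the trimmed sample documents are assumptions for this example, not part of the original answer.

```javascript
// Emulates the nested $map / $mergeObjects: each imparts entry is merged with
// the group whose _id equals its groupID; entries with no match are kept as-is.
function mergeGroupsIntoImparts(department, groups) {
  const byId = new Map(groups.map((g) => [g._id, g]));
  return {
    ...department,
    lecturers: department.lecturers.map((lecturer) => ({
      ...lecturer,
      imparts: lecturer.imparts.map((i) => ({ ...i, ...(byId.get(i.groupID) ?? {}) })),
    })),
  };
}

// Trimmed-down versions of the documents from the question.
const department = {
  _id: 99,
  lecturers: [
    { lecturerID: 31, imparts: [{ groupID: 70, codCourse: 99 }] },
    { lecturerID: 36, imparts: [{ groupID: 100, codCourse: 60 }] },
  ],
};
const groups = [{ _id: 100, codCourse: 11, language: "Romanian", students: [{ studentID: 1 }] }];
const merged = mergeGroupsIntoImparts(department, groups);
console.log(JSON.stringify(merged.lecturers[1].imparts));
```

Note that fields present in both objects (here `codCourse`) take the value from the group document, because the later argument wins in the merge.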
Playground
Now that we understand the question (according to your other question), an answer can be:
Add to each department document a set of all its relevant groups.
$lookup only the student ids for each group to create a groups array.
Insert the relevant groups data into each lecturer.
Calculate maxImpartsStudents, which is the number of unique students per lecturer across all of the lecturer's groups.
$reduce the lecturers array to include only the lecturer with the highest maxImpartsStudents.
Format the answer.
db.department.aggregate([
{
$addFields: {
groups: {
$setIntersection: [
{
$reduce: {
input: "$lecturers.imparts.groupID",
initialValue: [],
in: {$concatArrays: ["$$value", "$$this"]}
}
}
]
}
}
},
{
$lookup: {
from: "group",
let: {groupIDs: "$groups"},
pipeline: [
{$match: {$expr: {$in: ["$_id", "$$groupIDs"]}}},
{
$project: {
students: {
$reduce: {
input: "$students",
initialValue: [],
in: {$concatArrays: ["$$value", ["$$this.studentID"]]}
}
}
}
}
],
as: "groups"
}
},
{
$project: {
name: 1,
lecturers: {
$map: {
input: "$lecturers",
in: {
$mergeObjects: [
{lecturerID: "$$this.lecturerID"},
{groups: {
$map: {
input: "$$this.imparts",
in: {
$arrayElemAt: [
"$groups",
{$indexOfArray: ["$groups._id", "$$this.groupID"]}
]
}
}
}
}
]
}
}
}
}
},
{
$project: {
name: 1,
lecturers: {
$map: {
input: "$lecturers",
as: "item",
in: {
$mergeObjects: [
{
maxImpartsStudents: {
$size: {
$reduce: {
input: "$$item.groups",
initialValue: [],
in: {$setUnion: ["$$value", "$$this.students"]}
}
}
}
},
{lecturerID: "$$item.lecturerID"}
]
}
}
}
}
},
{
$set: {
lecturers: {
$reduce: {
input: "$lecturers",
initialValue: {
"maxImpartsStudents": 0
},
in: {
$cond: [
{$gte: ["$$this.maxImpartsStudents", "$$value.maxImpartsStudents"]},
"$$this", "$$value"
]
}
}
}
}
},
{
$project: {
lecturerID: "$lecturers.lecturerID",
maxImpartsStudents: "$lecturers.maxImpartsStudents",
departmentName: "$name"
}
}
])
Which is much better than combining the solutions from both questions.
See how it works on the playground example
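The counting logic of that pipeline (unique students per lecturer, then the lecturer with the highest count) can be sketched in plain JavaScript. This is an emulation under assumptions: `topLecturer` is a hypothetical helper and the sample groups 70 and 71 with their students are invented for illustration.

```javascript
// Counts distinct students per lecturer across all groups the lecturer teaches,
// then keeps the lecturer with the highest count (mirrors $setUnion + $reduce).
function topLecturer(department, groups) {
  const studentsByGroup = new Map(
    groups.map((g) => [g._id, g.students.map((s) => s.studentID)])
  );
  let best = { maxImpartsStudents: 0 };
  for (const lecturer of department.lecturers) {
    const unique = new Set();
    for (const { groupID } of lecturer.imparts) {
      for (const id of studentsByGroup.get(groupID) ?? []) unique.add(id);
    }
    // >= matches the $gte in the pipeline: on a tie, the later lecturer wins.
    if (unique.size >= best.maxImpartsStudents) {
      best = { lecturerID: lecturer.lecturerID, maxImpartsStudents: unique.size };
    }
  }
  return { departmentName: department.name, ...best };
}

// Hypothetical sample data: student 2 attends two groups of lecturer 31.
const dept = {
  name: "Erick Kalewe",
  lecturers: [
    { lecturerID: 31, imparts: [{ groupID: 70 }, { groupID: 71 }] },
    { lecturerID: 36, imparts: [{ groupID: 100 }] },
  ],
};
const groupDocs = [
  { _id: 70, students: [{ studentID: 1 }, { studentID: 2 }] },
  { _id: 71, students: [{ studentID: 2 }, { studentID: 3 }] },
  { _id: 100, students: [{ studentID: 1 }] },
];
const report = topLecturer(dept, groupDocs);
console.log(JSON.stringify(report));
```

One difference worth knowing: this sketch treats a groupID without a matching group as having no students. In the pipeline, `$indexOfArray` returns -1 for such a group and `$arrayElemAt` interprets -1 as the last element, so it may be worth guarding that case if dangling group ids can occur.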

How to push a new element into existing array or create one if it doesn't exist yet in MongoDb?

I have a script creating a document, updating it and cleaning up.
db.getCollection('things').insert( { _id: 1001,
elemo: { a: "A", b: "B" },
histo: [ ] } )
db.getCollection('things').update( { _id: 1001 },
[ { $set: {
histo: { $concatArrays: [ "$histo", ["$elemo"] ] } } } ] )
db.getCollection("things").find({ _id: 1001})
db.getCollection('things').remove({ _id: 1001 })
For certain reasons, I'd like to retain the functionality but can't guarantee that the originally empty array actually exists. I need to perform my update in such a way so that an existing array will get an additional element, while a non-existing (yet) one will get created (including said element).
db.getCollection('things').insert( { _id: 1001,
elemo: { a: "A", b: "B" } } )
db.getCollection('things').update( { _id: 1001 },
[ { $set: {
histo: { $concatArrays: [ "$histo", ["$elemo"] ] } } } ] )
db.getCollection("things").find({ _id: 1001})
db.getCollection('things').remove({ _id: 1001 })
The above only creates the field but its value is null, and so additional amendments to it result in null. I'm rather certain that it needs something more around $concatArrays but I can't figure out what. First, I thought I could go $ifnull but it didn't recognize that command (no error, no insertion, no coalescing, nothing).
You can make use of $cond or $ifNull (as you guessed) to check if the key exists or not inside the $concatArrays operator.
Using $cond Method
db.collection.update({
_id: 1001
},
[
{
$set: {
histo: {
"$concatArrays": [
{
"$cond": {
"if": {
"$not": [
"$histo"
]
},
"then": [],
"else": "$histo",
}
},
[
"$elemo"
],
],
}
}
}
])
Mongo Playground Sample Execution
Using $ifNull Method
db.collection.update({
_id: 1001
},
[
{
$set: {
histo: {
"$concatArrays": [
{
"$ifNull": [
"$histo",
[]
],
},
[
"$elemo"
],
],
}
}
}
])
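The effect of wrapping `$histo` in `$ifNull` can be emulated in a few lines of plain JavaScript. `appendToHistory` is a hypothetical helper written only to show the behaviour: a missing or null `histo` is replaced by an empty array before the element is appended.

```javascript
// Emulates { $concatArrays: [ { $ifNull: ["$histo", []] }, ["$elemo"] ] }:
// a missing or null histo is replaced by an empty array before appending.
function appendToHistory(doc) {
  const existing = doc.histo ?? [];
  return { ...doc, histo: [...existing, doc.elemo] };
}

const withoutArray = { _id: 1001, elemo: { a: "A", b: "B" } };
const once = appendToHistory(withoutArray);  // creates histo with one element
const twice = appendToHistory(once);         // appends to the existing array
console.log(JSON.stringify(twice.histo));
```

Without the fallback, concatenating with a missing field yields null, which is the behaviour described in the question.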
Mongo Playground Sample Execution

Mongodb user table with friends

I have a USER table with documents:
{
_id: 1,
name: 'funny-guy43',
image: '../../../img1.jpg',
friends: [2, 3]
},
{
_id: 2,
name: 'SurfinGirl3',
image: '../../../img2.jpg',
friends: []
},
{
_id: 3,
name: 'FooBarMan',
image: '../../../img3.jpg',
friends: [2]
}
friends is an array of USER _ids. (1) I want to get user by _id, (2) look at his friends and (3) query the USER table with the friend ids to return all friends.
For example, find user 1, query the table based on his friends 2 and 3, and return users 2 and 3.
Can I do that in one transaction? Or do I have to query the table to get the user's array of friends and then query the table again with that array of friend ids?
I'm using .NET Core, if that matters.
I am very open to alternative approaches as well.
It is, in fact, possible to do this in one transaction. Or, to be more exact, in one aggregation.
I would first split the users into 2 different subsets, one called searched_user and the other other_users, where searched_user will have only the user we are searching for and other_users will have everyone else. We can do that using $facet. Here is the idea:
{
"$facet": {
"searched_user": [
{
$match: {
_id: 1
}
}
],
"other_users": [
{
$match: {
_id: {
$ne: 1
}
}
}
]
}
}
Once they are separated like this, we can search the other_users subset using the friend ids from the searched_user. So here is the full aggregation:
db.collection.aggregate([
{
"$facet": {
"searched_user": [
{
$match: {
_id: 1
}
}
],
"other_users": [
{
$match: {
_id: {
$ne: 1
}
}
}
]
}
},
{
"$unwind": "$searched_user"
},
{
$project: {
user_friends: {
$filter: {
input: "$other_users",
as: "other_users",
cond: {
$in: [
"$$other_users._id",
"$searched_user.friends"
]
}
}
}
}
}
])
Here we are looking for user 1 and the result will be user 1's friends.
[
{
"user_friends": [
{
"_id": 2,
"friends": [],
"image": "../../../img2.jpg",
"name": "SurfinGirl3"
},
{
"_id": 3,
"friends": [
2
],
"image": "../../../img3.jpg",
"name": "FooBarMan"
}
]
}
]
Playground: https://mongoplayground.net/p/-8pNnQXg8r6
You can achieve this with $lookup in an aggregation. I tried it with MongoDB v4.2.11.
db.users.aggregate([
{
'$match': {
'_id': 1,
}
},
{
'$lookup': {
'from' : 'users',
'let' : {
'friendIds': '$friends',
},
'pipeline': [
{
'$match':{
'$expr': {'$in': [ '$_id', '$$friendIds']}
}
}
],
'as': 'friendsArr'
}
}
])
Result:
[
{
"_id" : 1,
"name" : "funny-guy43",
"image" : "../../../img1.jpg",
"friends" : [
2,
3
],
"friendsArr" : [
{
"_id" : 2,
"name" : "SurfinGirl3",
"image" : "../../../img2.jpg",
"friends" : [ ]
},
{
"_id" : 3,
"name" : "FooBarMan",
"image" : "../../../img3.jpg",
"friends" : [
2
]
}
]
}
]
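For comparison, the two-step alternative mentioned in the question (fetch the user, then fetch everyone whose `_id` is in the user's `friends`) can be sketched on an in-memory array. This is plain JavaScript standing in for the two queries; `findFriends` is a hypothetical helper, and against a real collection the second step would correspond to a filter such as `{ _id: { $in: friends } }`.

```javascript
// In-memory stand-in for the USER collection from the question.
const users = [
  { _id: 1, name: "funny-guy43", friends: [2, 3] },
  { _id: 2, name: "SurfinGirl3", friends: [] },
  { _id: 3, name: "FooBarMan", friends: [2] },
];

// Step 1: find the user. Step 2: return every user whose _id is in friends.
function findFriends(allUsers, userId) {
  const user = allUsers.find((u) => u._id === userId);
  if (!user) return [];
  const friendIds = new Set(user.friends);
  return allUsers.filter((u) => friendIds.has(u._id));
}

const friendsOfUser1 = findFriends(users, 1);
console.log(friendsOfUser1.map((u) => u.name));
```

The `$lookup` version does the same work in a single round trip to the server, which is usually the reason to prefer it.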

MongoDB - How to get all documents not being referenced by any document in a different collection

We have two collections, Teams and Matches. Every time a Match is reported, a new document is saved in that collection and it's added to an array in the Team documents (teams[i].matches).
A now-solved bug in our system caused new Matches documents not to be referenced in their respective Teams documents.
Is there a query for MongoDB 3.6.9 that can help us find the Matches not referenced in Teams?
An aggregation pipeline may help you, using $lookup.
$lookup fetches the documents from "Teams" that satisfy the sub-pipeline's $match.
let: { match_id: "$_id" } creates a variable match_id holding the Match's _id.
The $match expression keeps only the Teams whose matches array contains match_id.
as: "matches" stores the Teams that passed the previous $match.
The final $match after the $lookup stage keeps only the documents whose matches array is empty (Matches with no Teams).
db.Matches.aggregate([
{
$lookup: {
from: "Teams",
let: { match_id: "$_id" },
pipeline: [{
$match: {
$expr: {
$in: [ "$$match_id", "$matches" ]
}
}
}],
as: "matches"
},
},
{
$match: {
$expr: { $eq: [{ $size: "$matches" }, 0] }
}
}
]);
This has been tested with the following collection template in the Mongo Playground online editor:
db={
"Matches": [
{ "_id": 0 },
{ "_id": 1 },
{ "_id": 2 },
{ "_id": 3 },
{ "_id": 4 },
],
"Teams": [
{
"_id": 0,
matches: [ 0, 3 ],
},
{
"_id": 1,
matches: [],
},
{
"_id": 2,
matches: [ 0 ],
},
{
"_id": 3,
matches: [ 2 ],
}
]
}
The resulting output is :
[
{
"_id": 1,
"matches": []
},
{
"_id": 4,
"matches": []
}
]
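The logic of this anti-join can be checked with a small plain-JavaScript sketch over the same test data. `unreferencedMatches` is a hypothetical helper for illustration; it collects every match id referenced by any team and keeps the matches whose id is not in that set.

```javascript
// Same test data as the playground template above.
const matches = [{ _id: 0 }, { _id: 1 }, { _id: 2 }, { _id: 3 }, { _id: 4 }];
const teams = [
  { _id: 0, matches: [0, 3] },
  { _id: 1, matches: [] },
  { _id: 2, matches: [0] },
  { _id: 3, matches: [2] },
];

// Keeps the matches that no team lists in its matches array.
function unreferencedMatches(allMatches, allTeams) {
  const referenced = new Set(allTeams.flatMap((team) => team.matches));
  return allMatches.filter((match) => !referenced.has(match._id));
}

const orphans = unreferencedMatches(matches, teams);
console.log(orphans.map((m) => m._id));
```

The ids it reports (1 and 4) agree with the output of the aggregation shown above.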

Avoid empty array elements in mongo db

How can I avoid empty arrays when filtering the results of a query on a MongoDB collection?
[
{
"_id": ObjectId("5d429786bd7b5f4ae4a64790"),
"extensions": {
"outcome": "success",
"docType": "ABC",
"Roll No": "1"
},
"data": [
{
"Page1": [
{
"heading": "LIST",
"content": [
{
"text": "<b>12345</b>"
},
],
}
],
"highlights": [
{
"name": "ABCD",
"text": "EFGH",
}
],
"marks": [
{
"revision": "revision 1",
"Score": [
{
"maths": "100",
"science": "40",
"history": "90"
},
{
"lab1": "25",
"lab2": "25"
}
],
"Result": "Pass"
},
{
"revision": "revision 1",
"Score": [
{
"maths": "100",
"science": "40"
},
{
"lab1": "25",
"lab2": "25"
}
],
"Result": "Pass"
}
]
}
]
}
]
I am looking for results that have only "history" marks in the score array.
I tried the following query (in MongoDB 3.6.10), but it returns the empty score arrays as well as the array that has history:
db.getCollection('student_scores').find({
"data.marks.score.history": {
$not: {
$type: 10
},
$exists: true
}
},
{
"extensions.rollNo": 1,
"data.marks.score.history": 1
})
Desired output is
{
"extensions": {
"rollNo": "1"
},
"data": [
{
"marks": [
{
"Score": [
{
"history": "90"
}
]
}
]
}
]
}
I used something like the following;
db.getCollection('student_scores').aggregate([
{
$unwind: "$data"
},
{
$unwind: "$data.marks"
},
{
$unwind: "$data.marks.Score"
},
{
$match: {
"data.marks.Score.history": {
$exists: true,
$not: {
$type: 10
}
}
}
},
{
$project: {
"extensions.Roll No": 1,
"data.marks.Score.history": 1
}
},
{
$group: {
_id: "$extensions.Roll No",
history_grades: {
$push: "$data.marks.Score.history"
}
}
}
])
With your input I got the following result (which I think is more readable than your expected output):
[
{
"_id": "1",
"history_grades": [
"90"
]
}
]
where _id represents "extensions.Roll No" value for any given data set.
What do you think?
check with a bigger input on mongoplayground
OK, so I still think the data design here with the Score array is a little off, but here is a solution that ensures that a Score array contains only 1 entry and that this entry is for a key of history. We use dotpath array diving as a trick to get to the value of history.
c = db.foo.aggregate([
{$unwind: "$data"}
,{$unwind: "$data.marks"}
,{$project: {
result: {$cond: [
{$and: [ // if
{$eq: [1, {$size: "$data.marks.Score"}]}, // Only 1 item...
// A little trick! $data.marks.Score.history will resolve to an *array*
// of the values associated with each object in $data.marks.Score (the parent
// array) having a key of history. BUT: As it resolves, if there is no
// field for that key, nothing is added to resolution vector -- not even a null.
// This means the resolved array could
// be **shorter** than the input. For example:
// > db.foo.insert({"x":[ {b:2}, {a:3,b:4}, {b:7}, {a:99} ]});
// WriteResult({ "nInserted" : 1 })
// > db.foo.aggregate([ {$project: {z: "$x.b", n: {$size: "$x.b"}} } ]);
// { "z" : [ 2, 4, 7 ], "n" : 3 }
// > db.foo.aggregate([ {$project: {z: "$x.a", n: {$size: "$x.a"}} } ]);
// { "z" : [ 3, 99 ], "n" : 2 }
//
// You must be careful about this.
// But we also know this resolved vector is of size 1 (see above) so we can go ahead and grab
// the 0th item and that becomes our output.
// Note that if we did not have the requirement of ONLY history, then we would not
// need the fancy $cond thing.
{$arrayElemAt: ["$data.marks.Score.history",0]}
]},
{$arrayElemAt: ["$data.marks.Score.history",0]}, // then (use value of history)
null ] } // else set null
,extensions: "$extensions" // just carry over extensions
}}
,{$match: {"result": {$ne: null} }} // only take good ones.
]);
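The dot-path behaviour described in the comments above (elements without the key contribute nothing, so the resolved array can be shorter than the input) can be emulated in plain JavaScript. Both helpers, `resolvePath` and `onlyHistory`, are hypothetical names for this sketch, and the emulation is rough: it checks for the presence of the key rather than the truthiness of the value.

```javascript
// Emulates how "$x.b" resolves over an array of sub-documents: elements that
// lack the key contribute nothing, so the result can be shorter than the input.
function resolvePath(items, key) {
  return items
    .filter((item) => Object.prototype.hasOwnProperty.call(item, key))
    .map((item) => item[key]);
}

// Rough equivalent of the $cond: exactly one Score entry, and it has history.
function onlyHistory(score) {
  const values = resolvePath(score, "history");
  return score.length === 1 && values.length === 1 ? values[0] : null;
}

const x = [{ b: 2 }, { a: 3, b: 4 }, { b: 7 }, { a: 99 }];
console.log(resolvePath(x, "b"), resolvePath(x, "a"));
console.log(onlyHistory([{ history: "90" }]));
console.log(onlyHistory([{ maths: "100", history: "90" }, { lab1: "25" }]));
```

The first call reproduces the two shell examples from the comments; the last two show why the size check is needed when only a lone history entry should qualify.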

Resources