I have a USER table with documents:
{
_id: 1,
name: 'funny-guy43',
image: '../../../img1.jpg',
friends: [2, 3]
},
{
_id: 2,
name: 'SurfinGirl3',
image: '../../../img2.jpg',
friends: []
},
{
_id: 3,
name: 'FooBarMan',
image: '../../../img3.jpg',
friends: [2]
}
friends is an array of USER _ids. I want to (1) get a user by _id, (2) look at that user's friends, and (3) query the USER table with the friend ids to return all of the friends.
For example: find user 1, query the table based on that user's friends 2 and 3, and return users 2 and 3.
Can I do that in one transaction? Or do I have to query the table to get the user's array of friends, then query the table again with that array of friend ids?
I'm using .Net Core if that matters.
I am very open to alternative approaches as well.
It is, in fact, possible to do this in one transaction. Or, to be more exact, in one aggregation.
I would first split the users into 2 different subsets, one called searched_user and the other other_users, where searched_user will have only the user we are searching for and other_users will have everyone else. We can do that using $facet. Here is the idea:
{
"$facet": {
"searched_user": [
{
$match: {
_id: 1
}
}
],
"other_users": [
{
$match: {
_id: {
$ne: 1
}
}
}
]
}
}
Once they are separated like this, we can search the other_users subset using the friend ids from the searched_user. So here is the full aggregation:
db.collection.aggregate([
{
"$facet": {
"searched_user": [
{
$match: {
_id: 1
}
}
],
"other_users": [
{
$match: {
_id: {
$ne: 1
}
}
}
]
}
},
{
"$unwind": "$searched_user"
},
{
$project: {
user_friends: {
$filter: {
input: "$other_users",
as: "other_users",
cond: {
$in: [
"$$other_users._id",
"$searched_user.friends"
]
}
}
}
}
}
])
Here we are looking for user 1 and the result will be user 1's friends.
[
{
"user_friends": [
{
"_id": 2,
"friends": [],
"image": "../../../img2.jpg",
"name": "SurfinGirl3"
},
{
"_id": 3,
"friends": [
2
],
"image": "../../../img3.jpg",
"name": "FooBarMan"
}
]
}
]
Playground: https://mongoplayground.net/p/-8pNnQXg8r6
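To make the data flow of the pipeline easier to follow, here is a minimal plain-JavaScript sketch of what the $facet, $unwind and $filter stages compute, run against an in-memory copy of the sample documents. It does not talk to MongoDB, and friendsViaFacet is a hypothetical helper name used only for illustration.

```javascript
// In-memory stand-in for the USER collection from the question.
const users = [
  { _id: 1, name: "funny-guy43", image: "../../../img1.jpg", friends: [2, 3] },
  { _id: 2, name: "SurfinGirl3", image: "../../../img2.jpg", friends: [] },
  { _id: 3, name: "FooBarMan", image: "../../../img3.jpg", friends: [2] },
];

// Hypothetical helper, not a MongoDB API: mirrors the three pipeline stages.
function friendsViaFacet(collection, userId) {
  // $facet: split the collection into the searched user and everyone else.
  const searchedUser = collection.filter((u) => u._id === userId);
  const otherUsers = collection.filter((u) => u._id !== userId);

  // $unwind on searched_user: one output per element, none if it is empty.
  return searchedUser.map((user) => ({
    // $filter with $in: keep the other users whose _id is in the friends array.
    user_friends: otherUsers.filter((o) => user.friends.includes(o._id)),
  }));
}

const facetResult = friendsViaFacet(users, 1);
console.log(facetResult[0].user_friends.map((u) => u.name));
// prints [ 'SurfinGirl3', 'FooBarMan' ]
```

Note that, as in the pipeline, an unknown user id produces an empty result rather than a document with an empty user_friends array.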
You can achieve this by using $lookup in an aggregation. I tried it with MongoDB v4.2.11.
db.users.aggregate([
{
'$match': {
'_id': 1,
}
},
{
'$lookup': {
'from' : 'users',
'let' : {
'friendIds': '$friends',
},
'pipeline': [
{
'$match':{
'$expr': {'$in': [ '$_id', '$$friendIds']}
}
}
],
'as': 'friendsArr'
}
}
])
Result:
[
{
"_id" : 1,
"name" : "funny-guy43",
"image" : "../../../img1.jpg",
"friends" : [
2,
3
],
"friendsArr" : [
{
"_id" : 2,
"name" : "SurfinGirl3",
"image" : "../../../img2.jpg",
"friends" : [ ]
},
{
"_id" : 3,
"name" : "FooBarMan",
"image" : "../../../img3.jpg",
"friends" : [
2
]
}
]
}
]
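For readers who want to see what the self-$lookup does per matched user, here is a rough plain-JavaScript equivalent over an in-memory array. It is only a sketch of the semantics under the sample data; lookupFriends is a hypothetical name, and nothing here calls the MongoDB driver.

```javascript
// In-memory stand-in for the users collection.
const userDocs = [
  { _id: 1, name: "funny-guy43", image: "../../../img1.jpg", friends: [2, 3] },
  { _id: 2, name: "SurfinGirl3", image: "../../../img2.jpg", friends: [] },
  { _id: 3, name: "FooBarMan", image: "../../../img3.jpg", friends: [2] },
];

// Hypothetical helper that mirrors $match followed by $lookup.
function lookupFriends(collection, userId) {
  return collection
    .filter((u) => u._id === userId) // $match: { _id: userId }
    .map((u) => ({
      ...u, // the matched user keeps all of its own fields
      // $lookup with let/pipeline: friendIds is bound to this user's friends,
      // and the inner $match keeps documents whose _id is in that array.
      friendsArr: collection.filter((c) => u.friends.includes(c._id)),
    }));
}

const lookupResult = lookupFriends(userDocs, 1);
console.log(lookupResult[0].friendsArr.map((f) => f._id));
// prints [ 2, 3 ]
```

Unlike the $facet variant, the output keeps the searched user and attaches the friends as an extra array field.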
I want to track changes on MongoDB documents. The big challenge is that MongoDB has nested documents.
Example
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Hancock Nelson"
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
after the update/change
[
{
"_id": "60f7a86c0e979362a25245eb",
"email": "walltownsend#delphide.com",
"friends": [
{
"name": "Daphne Kline" //<------
},
{
"name": "Owen Dotson"
},
{
"name": "Cathy Jarvis"
}
]
}
]
This is a very basic example of a highly expandable real-world use case.
In a SQL-based database, I would suggest something like the following solution.
The SQL way
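users
_id                      | email
-------------------------+-------------------------
60f7a8b28db7c78b57bbc217 | cathyjarvis#delphide.com
friends
_id | user_id                  | name
----+--------------------------+---------------
0   | 60f7a8b28db7c78b57bbc217 | Hancock Nelson
1   | 60f7a8b28db7c78b57bbc217 | Suarez Burt
2   | 60f7a8b28db7c78b57bbc217 | Mejia Elliott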
after the update/change
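users
_id                      | email
-------------------------+-------------------------
60f7a8b28db7c78b57bbc217 | cathyjarvis#delphide.com
friends
_id | user_id                  | name
----+--------------------------+---------------
0   | 60f7a8b28db7c78b57bbc217 | Daphne Kline
1   | 60f7a8b28db7c78b57bbc217 | Suarez Burt
2   | 60f7a8b28db7c78b57bbc217 | Mejia Elliott
history
_id | friends_id | field | preUpdate      | postUpdate
----+------------+-------+----------------+-------------
0   | 0          | name  | Hancock Nelson | Daphne Kline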
If there is an update and the change has to be tracked before the next update, this would work for NoSQL as well. If there is a second update, we get a second row in the SQL history table and it's very clear. In NoSQL, you can keep a list/array of full document versions and compare them index by index, but that stores a lot of redundant information that hasn't changed.
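To illustrate the history-row idea for a nested array, here is a minimal plain-JavaScript sketch that appends one entry per changed field instead of storing full document copies. The recordChanges helper, the path format and the second update to "Mejia Elliott" are assumptions made for the example, and the comparison only handles primitive field values.

```javascript
// Hypothetical helper: appends one history entry per changed field.
// The entry shape loosely mirrors the SQL history table (field, preUpdate, postUpdate).
function recordChanges(history, docId, before, after) {
  const length = Math.max(before.length, after.length);
  for (let i = 0; i < length; i++) {
    const oldItem = before[i] || {};
    const newItem = after[i] || {};
    const fields = new Set([...Object.keys(oldItem), ...Object.keys(newItem)]);
    for (const field of fields) {
      // Strict comparison is enough for primitives such as strings and numbers.
      if (oldItem[field] !== newItem[field]) {
        history.push({
          docId,
          path: `friends.${i}.${field}`,
          preUpdate: oldItem[field] ?? null,
          postUpdate: newItem[field] ?? null,
        });
      }
    }
  }
  return history;
}

const v1 = [{ name: "Hancock Nelson" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const v2 = [{ name: "Daphne Kline" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
// Assumed second update, only to show that history grows by one entry per change.
const v3 = [{ name: "Daphne Kline" }, { name: "Owen Dotson" }, { name: "Mejia Elliott" }];

const history = [];
recordChanges(history, "60f7a86c0e979362a25245eb", v1, v2);
recordChanges(history, "60f7a86c0e979362a25245eb", v2, v3);
console.log(history.length); // prints 2
```

Each update adds only the changed fields, so unchanged friends never appear in the history.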
Have a look at Set Expression Operators
$setDifference
$setEquals
$setIntersection
Beware: these operators perform set operations on arrays, treating the arrays as sets. Duplicate entries are ignored, and so is the order of the elements.
In your example the update would result in
removed: [ {name: "Hancock Nelson" } ],
added: [ {name: "Daphne Kline" } ]
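As a rough illustration of how the removed and added lists come about, here is a plain-JavaScript stand-in for $setDifference. It is only a sketch: elements are compared by their JSON text, which is an assumption of this example (it is sensitive to key order) and not how the server compares values internally.

```javascript
// Plain-JS stand-in for $setDifference: elements of a that are not in b,
// with duplicates in the result dropped.
function setDifference(a, b) {
  const key = (value) => JSON.stringify(value);
  const inB = new Set(b.map(key));
  const seen = new Set();
  return a.filter((value) => {
    const k = key(value);
    if (inB.has(k) || seen.has(k)) return false;
    seen.add(k);
    return true;
  });
}

const before = [{ name: "Hancock Nelson" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];
const after = [{ name: "Daphne Kline" }, { name: "Owen Dotson" }, { name: "Cathy Jarvis" }];

const removed = setDifference(before, after);
const added = setDifference(after, before);
console.log(removed); // prints [ { name: 'Hancock Nelson' } ]
console.log(added);   // prints [ { name: 'Daphne Kline' } ]
```

Because the arrays are treated as sets, this tells you what changed but not at which position.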
If the number of elements is always the same before and after the update, then you could use this one:
db.collection.insertOne({
friends: [
{ "name": "Hancock Nelson" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
],
updated_friends: [
{ "name": "Daphne Kline" },
{ "name": "Owen Dotson" },
{ "name": "Cathy Jarvis" }
]
})
db.collection.aggregate([
{
$set: {
difference: {
$map: {
input: { $range: [0, { $size: "$friends" }] },
as: "i",
in: {
$cond: {
if: {
$eq: [
{ $arrayElemAt: ["$friends", "$$i"] },
{ $arrayElemAt: ["$updated_friends", "$$i"] }
]
},
then: null,
else: {
old: { $arrayElemAt: ["$friends", "$$i"] },
new: { $arrayElemAt: ["$updated_friends", "$$i"] }
}
}
}
}
}
}
},
{
$set: {
difference: {
$filter: {
input: "$difference",
cond: { $ne: ["$$this", null] }
}
}
}
}
])
{
"_id" : ObjectId("5fa919a49bbe481d117506c9"),
"isDeleted" : 0,
"productId" : 31,
"references" : [
{
"_id" : ObjectId("5fa919a49bbe481d117506ca"),
"languageCode" : "en",
"languageId" : 1,
"productId" : ObjectId("5fa919a49bbe481d117506ba")
},
{
"_id" : ObjectId("5fa91cc7d7d52f1e389dee1f"),
"languageCode" : "ar",
"languageId" : 2,
"productId" : ObjectId("5fa91cc7d7d52f1e389dee1e")
}
],
"createdAt" : ISODate("2020-11-09T10:27:48.859Z"),
"updatedAt" : ISODate("2020-11-09T10:27:48.859Z"),
"__v" : 0
},
{
"_id" : ObjectId("5f9aab1d8e475489270ebe3a"),
"isDeleted" : 0,
"productId" : 21,
"references" : [
{
"_id" : ObjectId("5f9aab1d8e475489270ebe3b"),
"languageCode" : "en",
"languageId" : 1,
"productId" : ObjectId("5f9aab1c8e475489270ebe2d")
}
],
"createdAt" : ISODate("2020-10-29T11:44:29.852Z"),
"updatedAt" : ISODate("2020-10-29T11:44:29.852Z"),
"__v" : 0
}
This is my MongoDB collection, in which I store multilingual references to the product collection. The productId fields inside references point to documents in the product collection. If the request contains ar, we should get only the productId of the ar languageCode. If that languageCode does not exist for a document, we should get the productId of the en languageCode instead.
For example, if the user passes ar, the query should return:
"productId" : ObjectId("5fa91cc7d7d52f1e389dee1e")
"productId" : ObjectId("5f9aab1c8e475489270ebe2d")
I have tried using $or with $elemMatch, but I am not able to get the desired result. I am also thinking of using $cond. Can anyone help me construct the query?
We can achieve this with the following stages:
$facet categorizes the incoming documents into two arrays.
In arAarray (spelled this way in the script), we first match all documents that have "references.languageCode": "ar" (such a document may or may not also have en), then de-structure the references array with $unwind and keep only the "references.languageCode": "ar" entries using $match. $group then collects all productIds that belong to "references.languageCode": "ar".
In enArray, we match only documents that have "references.languageCode": "en" and no ar entry. $unwind and $group work as in arAarray.
$concatArrays concatenates the arAarray and enArray arrays.
$unwind de-structures the combined array.
$replaceRoot promotes each object to the root of the document.
Here is the mongo script.
db.collection.aggregate([
{
$facet: {
arAarray: [
{
$match: {
"references.languageCode": "ar"
}
},
{
$unwind: "$references"
},
{
$match: {
"references.languageCode": "ar"
}
},
{
$group: {
_id: "$_id",
productId: {
$addToSet: "$references.productId"
}
}
}
],
enArray: [
{
$match: {
$and: [
{
"references.languageCode": "en"
},
{
"references.languageCode": {
$ne: "ar"
}
}
]
}
},
{
$unwind: "$references"
},
{
$group: {
_id: "$_id",
productId: {
$addToSet: "$references.productId"
}
}
}
]
}
},
{
$project: {
combined: {
"$concatArrays": [
"$arAarray",
"$enArray"
]
}
}
},
{
$unwind: "$combined"
},
{
"$replaceRoot": {
"newRoot": "$combined"
}
}
])
Working Mongo playground
You can test this solution to see if it is useful for your question:
db.collection.aggregate([
{
$addFields: {
foundResults:
{
$cond: {
if: { $in: ["ar", "$references.languageCode"] }, then:
{
$filter: {
input: "$references",
as: "item",
cond: {
$and: [{ $eq: ["$$item.languageCode", 'ar'] },
]
}
}
}
, else:
{
$filter: {
input: "$references",
as: "item",
cond: {
$and: [{ $eq: ["$$item.languageCode", 'en'] },
]
}
}
}
}
}
}
},
{ $unwind: "$foundResults" },
{ $replaceRoot: { newRoot: { $mergeObjects: ["$foundResults"] } } },
{ $project: { _id: 0, "productId": 1 } }
])
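The fallback rule that the $cond and $filter stages implement can be sketched in plain JavaScript as follows. This is only an illustration on in-memory data: pickProductIds is a hypothetical helper, the ObjectId values are written as plain strings, and fields not needed for the rule are left out.

```javascript
// Trimmed in-memory copy of the two sample documents (ObjectIds as strings).
const referenceDocs = [
  {
    productId: 31,
    references: [
      { languageCode: "en", productId: "5fa919a49bbe481d117506ba" },
      { languageCode: "ar", productId: "5fa91cc7d7d52f1e389dee1e" },
    ],
  },
  {
    productId: 21,
    references: [{ languageCode: "en", productId: "5f9aab1c8e475489270ebe2d" }],
  },
];

// Hypothetical helper: per document, use the requested language if present,
// otherwise fall back to the default language.
function pickProductIds(docs, requested, fallback = "en") {
  return docs.flatMap((doc) => {
    // $cond with $in: does this document have the requested language at all?
    const hasRequested = doc.references.some((r) => r.languageCode === requested);
    const wanted = hasRequested ? requested : fallback;
    // $filter, $unwind and the final $project collapsed into one step.
    return doc.references
      .filter((r) => r.languageCode === wanted)
      .map((r) => ({ productId: r.productId }));
  });
}

console.log(pickProductIds(referenceDocs, "ar"));
// prints the ar productId for the first document and the en productId for the second
```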
I have a DB with news articles, and I am trying to do a little DB cleaning. I want to find all duplicate documents, and I think the best way to accomplish this is by using the url field. My documents are structured as follows:
{
_id:
author:
title:
description:
url:
urlToImage:
publishedAt:
content:
summarization:
source_id:
}
Any help is greatly appreciated
Assume a collection whose documents have a name field (I am using name instead of url) that contains duplicate values. I have two aggregations that return output which can be used for further processing. I hope you will find this useful.
{ _id: 1, name: "jack" },
{ _id: 2, name: "john" },
{ _id: 3, name: "jim" },
{ _id: 4, name: "john" },
{ _id: 5, name: "john" },
{ _id: 6, name: "jim" }
Note that "john" has 3 occurrences and "jim" has 2.
(1) This aggregation returns the names which have duplicates (more than one occurrence):
db.collection.aggregate( [
{
$group: {
_id: "$name",
count: { $sum: 1 }
}
},
{
$group: {
_id: "duplicate_names",
names: { $push: { $cond: [ { $gt: [ "$count", 1 ] }, "$_id", "$DUMMY" ] } }
}
}
] )
The output:
{ "_id" : "duplicate_names", "names" : [ "john", "jim" ] }
(2) The following aggregation returns just the _id values of the duplicate documents. For example, the name "jim" has _id values 3 and 6. The output contains only the ids of the duplicates, i.e., 6; the first id collected for each name is left out.
db.collection.aggregate( [
{
$group: {
_id: "$name",
count: { $sum: 1 },
ids: { $push: "$_id" }
}
},
{
$group: {
_id: "duplicate_ids",
ids: { $push: { $slice: [ "$ids", 1, 9999 ] } }
}
},
{
$project: {
ids: {
$reduce: {
input: "$ids",
initialValue: [ ],
in: { $concatArrays: [ "$$this", "$$value" ] }
}
}
}
}
] )
The output:
{ "_id" : "duplicate_ids", "ids" : [ 6, 4, 5 ] }
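Both aggregations boil down to grouping by the key and looking at groups with more than one member. A minimal plain-JavaScript sketch of that logic on the sample data is shown below; findDuplicateIds is a hypothetical helper, and the order of its output follows array order, whereas the order produced by $group on the server is not guaranteed.

```javascript
// In-memory copy of the sample documents.
const people = [
  { _id: 1, name: "jack" },
  { _id: 2, name: "john" },
  { _id: 3, name: "jim" },
  { _id: 4, name: "john" },
  { _id: 5, name: "john" },
  { _id: 6, name: "jim" },
];

// Hypothetical helper: group ids by key, then report keys with more than one id
// and every id after the first one in each such group.
function findDuplicateIds(docs, keyField) {
  const idsByKey = new Map();
  for (const doc of docs) {
    const ids = idsByKey.get(doc[keyField]) || [];
    ids.push(doc._id); // same role as ids: { $push: "$_id" }
    idsByKey.set(doc[keyField], ids);
  }
  const duplicateKeys = [];
  const duplicateIds = [];
  for (const [key, ids] of idsByKey) {
    if (ids.length > 1) {
      duplicateKeys.push(key);
      duplicateIds.push(...ids.slice(1)); // same role as $slice: [ "$ids", 1, 9999 ]
    }
  }
  return { duplicateKeys, duplicateIds };
}

console.log(findDuplicateIds(people, "name"));
// prints { duplicateKeys: [ 'john', 'jim' ], duplicateIds: [ 4, 5, 6 ] }
```

For the original question, the same idea applies with url as the key field instead of name.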
Given the following dataset of books with a related books list:
{ "_id" : 1, "related_books" : [ { book_id: 1 }, { book_id: 2 }, { book_id: 3 } ] } <-- this one
{ "_id" : 2, "related_books" : [ { book_id: 1 } ] }
{ "_id" : 3, "related_books" : [ { book_id: 3 }, { book_id: 2 } ] } <-- and this one
{ "_id" : 4, "related_books" : [ { book_id: 1 }, { book_id: 2 } ] }
I'm trying to get the list of books when _id === related_book.book_id, so in this case:
book 1: it contains a related_book with book_id = 1
book 3: it contains a related_book with book_id = 3
I've been trying to find my way with aggregate filters but I can't make it work with the check of a sub-document field:
db.books.aggregate([{
"$project": {
"selected_books": {
"$filter": {
"input": "$books",
"as":"book",
"cond": { "$in": ["$_id", "$$book.related_books.book_id" ]
}}}}}])
This is my solution to this problem:
db.getCollection("books").aggregate([{
$addFields: {
hasBookWithSameId: {
$reduce: {
input: "$related_books",
initialValue: false,
in: {$or: ["$$value", {$eq: ["$_id", "$$this.book_id"]}]}
}
}
}
},
{
$match: {
hasBookWithSameId: true
}
}])
In the first step I create a field hasBookWithSameId that holds a boolean: true if there is a related book with the same id as the parent, false otherwise. This is done with the $reduce operator, a powerful tool for dealing with embedded arrays. It iterates over the array and checks whether any related book has the same id as the parent.
At the end, I just match all the documents that have this property set to true.
Update:
There is a more elegant solution to this problem with just one aggregation stage, using $map and $anyElementTrue:
db.collection.aggregate([
  {
    $match: {
      $expr: {
        // true if at least one related book has the same id as the parent
        $anyElementTrue: {
          $map: {
            input: "$related_books",
            in: {
              $eq: ["$$this.book_id", "$_id"]
            }
          }
        }
      }
    }
  }
])
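The $map plus $anyElementTrue combination corresponds closely to mapping each related book to a boolean and asking whether any of them is true. Here is a small plain-JavaScript sketch of that check on the sample books; it runs on an in-memory array and is meant only to show the logic.

```javascript
// In-memory copy of the sample books.
const books = [
  { _id: 1, related_books: [{ book_id: 1 }, { book_id: 2 }, { book_id: 3 }] },
  { _id: 2, related_books: [{ book_id: 1 }] },
  { _id: 3, related_books: [{ book_id: 3 }, { book_id: 2 }] },
  { _id: 4, related_books: [{ book_id: 1 }, { book_id: 2 }] },
];

const selfRelated = books.filter((book) =>
  book.related_books
    .map((related) => related.book_id === book._id) // $map with $eq
    .some(Boolean) // $anyElementTrue
);

console.log(selfRelated.map((book) => book._id)); // prints [ 1, 3 ]
```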
I'm currently trying to massage out counts from the mLab API for reasons I don't have control over. So I want to grab the data I need from there in one query so I can limit the amount of API calls.
Assuming that my data looks like this:
{
"_id": {
"$oid": "12345"
},
"dancer": "Beginner",
"pirate": "Advanced",
"chef": "Mid",
"beartamer": "Mid",
"swordsman": "Mid",
"total": "Mid"
}
I know I can do 6 queries with something similar to:
db.score.aggregate({"$group": { _id: {"total":"$total"}, count: {$sum:1} }} )
but how do I query to get the count for each key? I'd like to see something akin to:
{ "_id" : { "total" : "Advanced" }, "count" : 1 }
{ "_id" : { "total" : "Mid" }, "count" : 1 }
{ "_id" : { "total" : "Beginner" }, "count" : 4 }
{ "_id" : { "pirate" : "Advanced" }, "count" : 1 }
//...etc
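The following should give you the count for every distinct key-value pair in a single query. Each result document has the flat form { <key>: <value>, count: <n> } rather than nesting the pair under _id: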
db.scores.aggregate({
$project: {
"_id": 0 // get rid of the "_id" field since we do not want to count it
}
}, {
$project: {
"doc": {
$objectToArray: "$$ROOT" // transform all documents into key-value pairs
}
}
}, {
$unwind: "$doc" // flatten the resulting array into separate documents
}, {
$group: {
"_id": "$doc", // group by distinct key-value combination
"count": { $sum: 1 } // count documents per bucket
}
}, {
$project: {
"_id": { // some more transformation magic to recreate the desired output structure
$mergeObjects: [
{ $arrayToObject: [ [ "$_id" ] ] },
{ "count": "$count" }
]
},
}
}, {
$replaceRoot: {
"newRoot": "$_id" // this moves the contents of the "_id" field to the root of the documents
}
})
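To see what the $objectToArray, $unwind and $group steps do to each document, here is a plain-JavaScript sketch of the same counting logic. The two documents below are made-up sample data with fewer fields than the original, and countKeyValuePairs is a hypothetical helper name.

```javascript
// Made-up sample data, shaped like the question's documents but shorter.
const scoreDocs = [
  { _id: "a", dancer: "Beginner", pirate: "Advanced", total: "Mid" },
  { _id: "b", dancer: "Beginner", pirate: "Mid", total: "Mid" },
];

// Hypothetical helper: counts how often each key-value pair occurs.
function countKeyValuePairs(docs) {
  const counts = new Map();
  for (const { _id, ...fields } of docs) { // first $project: drop _id
    for (const [k, v] of Object.entries(fields)) { // $objectToArray + $unwind
      const bucket = JSON.stringify([k, v]); // $group key: the k/v pair
      counts.set(bucket, (counts.get(bucket) || 0) + 1); // count: { $sum: 1 }
    }
  }
  // Final $project + $replaceRoot: turn each pair back into { key: value, count }.
  return [...counts].map(([bucket, count]) => {
    const [k, v] = JSON.parse(bucket);
    return { [k]: v, count };
  });
}

console.log(countKeyValuePairs(scoreDocs));
// prints four buckets: dancer/Beginner 2, pirate/Advanced 1, total/Mid 2, pirate/Mid 1
```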