How to find duplicate array values across documents in MongoDB - arrays

I have a question that many of you might be able to help with.
I have data in MongoDB.
First document:
{
"name" : 'david',
"contacts" : [
{
"name" : 'john',
"phone" : '123456'
},
{
"name" : 'george',
"phone" : '0987654'
}
]
}
Second document:
{
"name" : 'anita',
"contacts" : [
{
"name" : 'harry',
"phone" : '123456'
},
{
"name" : 'kurita',
"phone" : '323434'
}
]
}
The problem is: can I query for documents that share a duplicate contacts.phone value?
The result should look like this:
{
"name" : 'david',
"contacts" : [
{
"name" : 'john',
"phone" : '123456'
}
]
}
{
"name" : 'anita',
"contacts" : [
{
"name" : 'harry',
"phone" : '123456'
}
]
}
The david and anita documents are shown because they have the same value in contacts.phone.
Sorry for my English, by the way.
I hope you all understand what I mean.
Thank you so much.

There are a few steps involved to get the results you need.
We are going to write an aggregate pipeline to get the work done.
First you need to unwind your array values with the following:
{
$unwind: "$contacts"
}
Doc: https://docs.mongodb.com/manual/reference/operator/aggregation/unwind/
This results in:
[
{
"_id": ObjectId("5a934e000102030405000000"),
"contacts": {
"name": "john",
"phone": "123456"
},
"name": "david"
},
{
"_id": ObjectId("5a934e000102030405000000"),
"contacts": {
"name": "george",
"phone": "0987654"
},
"name": "david"
},
{
"_id": ObjectId("5a934e000102030405000001"),
"contacts": {
"name": "harry",
"phone": "123456"
},
"name": "anita"
},
{
"_id": ObjectId("5a934e000102030405000001"),
"contacts": {
"name": "kurita",
"phone": "323434"
},
"name": "anita"
}
]
With one document per contact, it is much easier to group by the phone field.
Doc: https://docs.mongodb.com/manual/reference/operator/aggregation/group/
{
$group: {
_id: {
phone: "$contacts.phone"
},
name: {
$addToSet: "$name"
},
contacts: {
$addToSet: "$contacts.name"
},
count: {
$sum: 1
}
}
}
That gives the following output:
[
{
"_id": {
"phone": "323434"
},
"contacts": [
"kurita"
],
"count": 1,
"name": [
"anita"
]
},
{
"_id": {
"phone": "123456"
},
"contacts": [
"john",
"harry"
],
"count": 2,
"name": [
"david",
"anita"
]
},
{
"_id": {
"phone": "0987654"
},
"contacts": [
"george"
],
"count": 1,
"name": [
"david"
]
}
]
Based on this output, we need to match the groups whose count is greater than 1:
Doc: https://docs.mongodb.com/manual/reference/operator/aggregation/match/
{
$match: {
count: {
"$gt": 1
}
}
}
Result is:
[
{
"_id": {
"phone": "123456"
},
"contacts": [
"john",
"harry"
],
"count": 2,
"name": [
"david",
"anita"
]
}
]
The query would look like:
Doc: https://docs.mongodb.com/manual/reference/operator/aggregation-pipeline/
db.collection.aggregate([
{
$unwind: "$contacts"
},
{
$group: {
_id: {
phone: "$contacts.phone"
},
name: {
$addToSet: "$name"
},
contacts: {
$addToSet: "$contacts.name"
},
count: {
$sum: 1
}
}
},
{
$match: {
count: {
"$gt": 1
}
}
}
])
MongoPlayground: https://mongoplayground.net/p/qSvhcYyAcQO
I hope this gives you a small idea of what is possible with the aggregation pipeline.
Update / fix
According to your requirements, you want one result object for each name that has duplicate contacts. In that case you can add a second $unwind, this time on the name field, after the $match stage, which gives:
[
{
"_id": {
"phone": "123456"
},
"contacts": [
"harry",
"john"
],
"count": 2,
"name": "david"
},
{
"_id": {
"phone": "123456"
},
"contacts": [
"harry",
"john"
],
"count": 2,
"name": "anita"
}
]
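The extended pipeline that produces the output above could look like the sketch below. It assumes the same collection as before; the only change is a second $unwind on the name array built by $group. The variable name `pipeline` is only illustrative.

```javascript
// Sketch: the original three stages plus a second $unwind on "name".
const pipeline = [
  // One document per contact entry.
  { $unwind: "$contacts" },
  // Group by phone number, collecting owner names and contact names.
  {
    $group: {
      _id: { phone: "$contacts.phone" },
      name: { $addToSet: "$name" },
      contacts: { $addToSet: "$contacts.name" },
      count: { $sum: 1 }
    }
  },
  // Keep only phone numbers that occur more than once.
  { $match: { count: { $gt: 1 } } },
  // Split the "name" array so each owner gets a separate result document.
  { $unwind: "$name" }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

Note that count is the number of contact entries per phone number, so a single document that lists the same phone twice would also pass the $match stage.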
Cheers, Kevin

Related

Mongo DB query to match a field1 and loop thru another field2 and get output as a single array with all fields of field2

I need help with a MongoDB query.
The query should search for parents with state good and children with state bad or missing. The output should be an array of all the children with state bad or missing whose parents have a good state.
Below is the JSON list:
[
{
"name": "parent-a",
"status": {
"state": "good"
},
"children": [
"child-1",
"child-2"
]
},
{
"name": "child-1",
"state": "good",
"parent": "parent-a"
},
{
"name": "child-2",
"state": {},
"parent": "parent-a"
},
{
"name": "parent-b",
"status": {
"state": "good"
},
"children": [
"child-3",
"child-4"
]
},
{
"name": "child-3",
"state": "good",
"parent": "parent-b"
},
{
"name": "child-4",
"state": "bad",
"parent": "parent-b"
},
{
"name": "parent-c",
"status": {
"state": "bad"
},
"children": [
"child-5",
"child-6"
]
},
{
"name": "child-5",
"state": "good",
"parent": "parent-c"
},
{
"name": "child-6",
"state": "bad",
"parent": "parent-c"
}
]
Expected output
"children": [
{
"name": "child-2",
"state": {}
},
{
"name": "child-4",
"state": "bad"
}
]
Any inputs would be appreciated. Thanks in advance :)
One option is to use $lookup* for this:
db.collection.aggregate([
{$match: {state: {$in: ["bad", {}]}}},
{$lookup: {
from: "collection",
localField: "parent",
foreignField: "name",
pipeline: [
{$match: {"status.state": "good"}}
],
as: "hasGoodParent"
}},
{$match: {"hasGoodParent.0": {$exists: true}}},
{$project: {name: 1, state: 1, _id: 0}}
])
See how it works on the playground example
*If your MongoDB version is lower than 5.0, you need to change the syntax a bit: drop the localField and foreignField of the $lookup and replace them with let and an equality match inside the pipeline.
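As a rough sketch of that pre-5.0 variant, the join condition moves into the sub-pipeline as an $expr equality on a let variable. The variable name parentName is an assumption made for this example, and the collection name "collection" is taken from the query above.

```javascript
// Sketch for MongoDB versions below 5.0: no localField/foreignField next to "pipeline".
const pipeline = [
  // Start from children whose state is "bad" or an empty object.
  { $match: { state: { $in: ["bad", {}] } } },
  {
    $lookup: {
      from: "collection",
      // Expose the child's parent name to the sub-pipeline (name is hypothetical).
      let: { parentName: "$parent" },
      pipeline: [
        {
          $match: {
            $expr: {
              $and: [
                // Replaces localField: "parent" / foreignField: "name".
                { $eq: ["$name", "$$parentName"] },
                // Same filter as before: only parents in a good state.
                { $eq: ["$status.state", "good"] }
              ]
            }
          }
        }
      ],
      as: "hasGoodParent"
    }
  },
  // Keep children for which a good parent was found.
  { $match: { "hasGoodParent.0": { $exists: true } } },
  { $project: { name: 1, state: 1, _id: 0 } }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```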
Here is an approach that does all of this without a "$lookup" stage, since performance usually suffers when one is involved. Basically, we match all relevant children and parents and group by the child's name. If a group has a parent (which means the parent has a "good" state) and a child (which means the child has a "bad" or {} state), then it is a match.
You should make sure you have the appropriate indexes to support the initial query.
Additionally, I would personally recommend adding a boolean field to each document to mark whether it is a parent or a child. Right now we have to infer the type from the field structure of your input, which I would consider bad practice.
Another thing we did not discuss, and which doesn't seem possible with the current structure, is recursion: can a child have children of its own? Just some things to consider.
db.collection.aggregate([
{
$match: {
$or: [
{
$and: [
{
"status.state": "good"
},
{
parent: {
$exists: false
}
},
{
"children.0": {
$exists: true
}
}
]
},
{
$and: [
{
"state": {
$in: [
"bad",
null,
{}
]
}
},
{
parent: {
$exists: true
}
}
]
}
]
}
},
{
$unwind: {
path: "$children",
preserveNullAndEmptyArrays: true
}
},
{
$addFields: {
isParent: {
$cond: [
{
$eq: [
null,
{
$ifNull: [
"$parent",
null
]
}
]
},
1,
0
]
}
}
},
{
$group: {
_id: {
$cond: [
"$isParent",
"$children",
"$name"
]
},
hasParnet: {
$sum: "$isParent"
},
hasChild: {
$sum: {
$subtract: [
1,
"$isParent"
]
}
},
state: {
"$mergeObjects": {
$cond: [
"$isParent",
{},
{
state: "$state"
}
]
}
}
}
},
{
$match: {
hasChild: {
$gt: 0
},
hasParnet: {
$gt: 0
}
}
},
{
$group: {
_id: null,
children: {
$push: {
name: "$_id",
state: "$state.state"
}
}
}
}
])
Mongo Playground

Performance issue running mongodb aggregation

I need to run a query that joins documents from two collections. I wrote an aggregation query, but it takes too much time when run against the production database, which has many documents. Is there a more efficient way to write this query?
Query in Mongo playground: https://mongoplayground.net/p/dLb3hsJHNYt
There are two collections users and activities. I need to run a query to get some users (from users collection), and also their last activity (from activities collection).
Database:
db={
"users": [
{
"_id": 1,
"email": "user1#gmail.com",
"username": "user1",
"country": "BR",
"creation_date": 1646873628
},
{
"_id": 2,
"email": "user2#gmail.com",
"username": "user2",
"country": "US",
"creation_date": 1646006402
}
],
"activities": [
{
"_id": 1,
"email": "user1#gmail.com",
"activity": "like",
"timestamp": 1647564787
},
{
"_id": 2,
"email": "user1#gmail.com",
"activity": "comment",
"timestamp": 1647564834
},
{
"_id": 3,
"email": "user2#gmail.com",
"activity": "like",
"timestamp": 1647564831
}
]
}
Inefficient Query:
db.users.aggregate([
{
// Get users using some filters
"$match": {
"$expr": {
"$and": [
{ "$not": { "$in": [ "$country", [ "AR", "CA" ] ] } },
{ "$gte": [ "$creation_date", 1646006400 ] },
{ "$lte": [ "$creation_date", 1648684800 ] }
]
}
}
},
{
// Get the last activity within the time range
"$lookup": {
"from": "activities",
"as": "last_activity",
"let": { "cur_email": "$email" },
"pipeline": [
{
"$match": {
"$expr": {
"$and": [
{ "$eq": [ "$email", "$$cur_email" ] },
{ "$gte": [ "$timestamp", 1647564787 ] },
{ "$lte": [ "$timestamp", 1647564834 ] }
]
}
}
},
{ "$sort": { "timestamp": -1 } },
{ "$limit": 1 }
]
}
},
{
// Remove users with no activity
"$match": {
"$expr": {
"$gt": [ { "$size": "$last_activity" }, 0 ] }
}
}
])
Result:
[
{
"_id": 1,
"country": "BR",
"creation_date": 1.646873628e+09,
"email": "user1#gmail.com",
"last_activity": [
{
"_id": 2,
"activity": "comment",
"email": "user1#gmail.com",
"timestamp": 1.647564834e+09
}
],
"username": "user1"
},
{
"_id": 2,
"country": "US",
"creation_date": 1.646006402e+09,
"email": "user2#gmail.com",
"last_activity": [
{
"_id": 3,
"activity": "like",
"email": "user2#gmail.com",
"timestamp": 1.647564831e+09
}
],
"username": "user2"
}
]
I'm more familiar with relational databases, so I'm struggling a little to run this query efficiently.
Thanks!

MongoDB get only selected elements from objects inside an array

What I have is a collection of documents in MongoDB that have the structure something like this
[
{
"userid": "user1",
"addresses": [
{
"type": "abc",
"street": "xyz"
},
{
"type": "def",
"street": "www"
},
{
"type": "hhh",
"street": "mmm"
},
]
},
{
"userid": "user2",
"addresses": [
{
"type": "abc",
"street": "ccc"
},
{
"type": "def",
"street": "zzz"
},
{
"type": "hhh",
"street": "yyy"
},
]
}
]
If I can give the "type" and "userid", how can I get the result as
[
{
"userid": "user2",
"type": "abc",
"street": "ccc",
}
]
It would also be great even if I can get the "street" only as the result. The only constraint is I need to get it in the root element itself and not inside an array
Something like this:
db.collection.aggregate([
{
$match: {
userid: "user1", "addresses.type": "abc"
}
},
{
$project: {
userid: 1,
address: {
$filter: {
input: "$addresses",
as: "a",
cond: {
$eq: [
"$$a.type",
"abc"
]
}
}
}
}
},
{
$unwind: "$address"
},
{
$project: {
userid: 1,
street: "$address.street",
_id: 0
}
}
])
Explained:
Filter only the documents with the userid and addresses.type you need
Project/filter only the addresses elements with the needed type
Unwind the address array
Project only the needed elements as requested
For best results, create an index on the { userid: 1 } field or a compound index on the { userid: 1, "addresses.type": 1 } fields
playground
You should be able to use unwind, match and project as shown below:
db.collection.aggregate([
{
"$unwind": "$addresses"
},
{
"$match": {
"addresses.type": "abc",
"userid": "user1"
}
},
{
"$project": {
"_id": 0,
"street": "$addresses.street"
}
}
])
You can also duplicate the match step as the first step to reduce the number of documents to unwind.
Here is the playground link.
There is a similar question/answer here.
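The suggestion above to repeat the match step first could look like the following sketch. It uses the same field names and values as the query above; the variable name `pipeline` is only illustrative.

```javascript
// Sketch: filter before unwinding so fewer documents reach $unwind.
const pipeline = [
  // Early filter: keep only this user's documents that contain the wanted type.
  { $match: { userid: "user1", "addresses.type": "abc" } },
  // One document per address entry.
  { $unwind: "$addresses" },
  // Re-apply the type filter, now against the individual unwound elements.
  { $match: { "addresses.type": "abc" } },
  // Return the street at the root level, not inside an array.
  { $project: { _id: 0, street: "$addresses.street" } }
];

// In the mongo shell: db.collection.aggregate(pipeline)
```

The second $match is still needed because the first one only selects whole documents; the other address types of a matching document would otherwise come through after the $unwind.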

Search in Embedded Documents in MongoDB?

I have document as shown
[
{
"Users": [
{
"Name": "Kartikey Vaish",
"_id": "1",
},
{
"Name": "Witcher Proxima",
"_id": "2",
}
],
"_id": "12",
},
{
"Users": [
{
"Name": "Witcher Proxima",
"_id": "2",
},
{
"Name": "Saga",
"_id": "4",
}
],
"_id": "13",
}
]
I want to search for those documents whose Users array has that particular ID
For Example if
ID == 1 // should return
[
{
"Users": [
{
"Name": "Kartikey Vaish",
"_id": "1",
},
{
"Name": "Witcher Proxima",
"_id": "2",
}
],
"_id": "12",
}
]
ID == 2 // should return
[
{
"Users": [
{
"Name": "Kartikey Vaish",
"_id": "1",
},
{
"Name": "Witcher Proxima",
"_id": "2",
}
],
"_id": "12",
},
{
"Users": [
{
"Name": "Witcher Proxima",
"_id": "2",
},
{
"Name": "Saga",
"_id": "4",
}
],
"_id": "13",
}
]
ID == 4 // should return
[
{
"Users": [
{
"Name": "Witcher Proxima",
"_id": "2",
},
{
"Name": "Saga",
"_id": "4",
}
],
"_id": "13",
}
]
As you can see from above my query should return only those objects whose "Users" array contains an object with given ID. I tried this but it doesn't work.
const chats = await Chats.find({
Users: { $elemMatch: { _id: "1" } },
});
// this returns an empty array
const chats = await Chats.find({
Users: { $elemMatch: { Name: "Kartikey Vaish" } },
});
// this returns
[
{
"Users": [
{
"Name": "Kartikey Vaish",
"_id": "1",
},
{
"Name": "Witcher Proxima",
"_id": "2",
}
],
"_id": "12",
}
]
What am I doing wrong here?
Is it something related to the _id parameter?
EDIT:
My Chats Schema looks like this -
const Chats = mongoose.model(
"Chats",
new mongoose.Schema({
Users: {
type: Array,
required: true,
default: [],
},
})
);
Update your schema as shown below
const Chats = mongoose.model(
"Chats",
new mongoose.Schema({
Users: [{
_id: {
type: String,
required: true, // Include only if needed!
unique: true // Include only if needed!
},
Name: {
type: String,
index: true // Include only if needed!
}
}]
})
);
If you do not explicitly declare _id in the schema, Mongoose will treat the _id field as an ObjectId.
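To make the intended matching behaviour concrete, here is a small in-memory illustration in plain JavaScript. It is not a Mongoose or MongoDB call; the helper names chatsWithUser and idsFor are made up for this example. It only mimics what the $elemMatch query is expected to return once _id is stored and queried as a String.

```javascript
// In-memory illustration only; the data is copied from the question.
const chats = [
  { _id: "12", Users: [{ Name: "Kartikey Vaish", _id: "1" }, { Name: "Witcher Proxima", _id: "2" }] },
  { _id: "13", Users: [{ Name: "Witcher Proxima", _id: "2" }, { Name: "Saga", _id: "4" }] }
];

// Keep every chat whose Users array has at least one element with the given _id.
// Strict equality means a string never matches a value of another type,
// which loosely mirrors why the declared type of _id matters for the real query.
function chatsWithUser(allChats, userId) {
  return allChats.filter((chat) => chat.Users.some((user) => user._id === userId));
}

// Convenience wrapper returning only the matching chat ids.
const idsFor = (userId) => chatsWithUser(chats, userId).map((chat) => chat._id);
```

With the sample data, looking up "2" selects both chats, while looking up the number 1 selects nothing because the stored ids are strings.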

How to remove duplicate values inside a list array in MongoDB?

I have many records in one collection in MongoDB. Here are three example documents; duplicates should be removed based only on the QUESTION match criterion.
{
"_id": {
"$oid": "5f0f561256efe82f5082252e"
},
"Item1": false,
"Item2": "",
"Item3": 1,
"Item4": [
{
"Name": "TYPE",
"Value": "QUESTION"
},
{
"Name": "QUESTION",
"Value": "What is your name?"
},
{
"Name": "CORRECT_ANSWER",
"Value": "1"
},
{
"Name": "ANSWER_1",
"Value": "name one"
},
{
"Name": "ANSWER_2",
"Value": "name two"
}
],
"Item5": [
10
],
"Item6": false
}
And another one to compare:
{
"_id": {
"$oid": "5f0f561256efe82f5082252c"
},
"Item1": false,
"Item2": "",
"Item3": 2,
"Item4": [
{
"Name": "TYPE",
"Value": "QUESTION"
},
{
"Name": "QUESTION",
"Value": "What is your name?"
},
{
"Name": "CORRECT_ANSWER",
"Value": "1"
},
{
"Name": "ANSWER_1",
"Value": "name one"
},
{
"Name": "ANSWER_2",
"Value": "name two"
}
],
"Item5": [
10
],
"Item6": false
}
The third one:
{
"_id": {
"$oid": "5f0f561256efe82f5082252d"
},
"Item1": false,
"Item2": "",
"Item3": 3,
"Item4": [
{
"Name": "TYPE",
"Value": "QUESTION"
},
{
"Name": "QUESTION",
"Value": "What is your last name?"
},
{
"Name": "CORRECT_ANSWER",
"Value": "1"
},
{
"Name": "ANSWER_1",
"Value": "name one"
},
{
"Name": "ANSWER_2",
"Value": "name two"
}
],
"Item5": [
10
],
"Item6": false
}
What I'm trying to do is write an aggregation query that focuses only on Item4, specifically the element with "Name": "QUESTION", and uses its value (the question text) to identify duplicates.
The idea is to look for duplication in the question itself only ("What is your name?" in our example here). I don't want to specify which question, because there is a long list of them.
I'm just looking for duplicated questions, no matter what the question looks like.
I used the following approach, but I still cannot narrow the output down to only the question and its value so that I can delete the duplicates in a later step.
db.collections.aggregate([{ $unwind: "$Item4" }, {$group: { _id: { QUESTION: "$Item4.Name.4", Value: "$Item4.Value.4" }}}]).pretty()
I'm executing from mongo shell directly.
The following aggregation will list all the documents (the _ids) which have the duplicates of "Item4.Value" for the condition "Item4.Name": "QUESTION".
db.test.aggregate( [
{
$unwind: "$Item4"
},
{
$match: { "Item4.Name": "QUESTION" }
},
{
$group: {
_id: { "Item4_Value": "$Item4.Value" },
ids: { $push: "$_id" }
}
},
{
$match: { $expr: { $gt: [ { $size: "$ids" }, 1 ] } }
}
] )
It works! Thanks a lot. I added it to the rest of the code as below:
db.test.find().count()
const duplicatesIds = [];
db.test.aggregate( [
{
$unwind: "$Item4"
},
{
$match: { "Item4.Name": "QUESTION" } //here is the trick...to filter the array to pass only the condition "Item4.Name": "QUESTION".
},
{
$group: {
_id: { "Item4_Value": "$Item4.Value" },
ids: { $push: "$_id" }
}
}
],
{
allowDiskUse: true
}
).forEach(function (doc) {
doc.ids.shift();
doc.ids.forEach(function (dupId) {
duplicatesIds.push(dupId);
})
});
printjson(duplicatesIds);
db.test.remove({_id:{$in:duplicatesIds}})
db.test.find().count()
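The "keep the first id, delete the rest" step in the script above can also be written without mutating the aggregation results. The sketch below is plain JavaScript; the helper name collectDuplicateIds and the shortened sample ids are assumptions made for illustration.

```javascript
// Hypothetical helper: for each group, keep the first _id and return the rest.
function collectDuplicateIds(groups) {
  const duplicateIds = [];
  for (const group of groups) {
    // slice(1) skips the first id without modifying group.ids.
    duplicateIds.push(...group.ids.slice(1));
  }
  return duplicateIds;
}

// Sample shaped like the $group output above (ids shortened for readability).
const sampleGroups = [
  { _id: { Item4_Value: "What is your name?" }, ids: ["a", "b"] },
  { _id: { Item4_Value: "What is your last name?" }, ids: ["c"] }
];

// Only "b" is a duplicate; groups with a single id contribute nothing.
const toDelete = collectDuplicateIds(sampleGroups);
```

Which document of each group survives depends on the order in which $push collected the ids, so a $sort stage before $group would be needed if a specific document should be kept.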

Resources