In MongoDB, how can we get the count of a particular key in an array?
{
"_id" : ObjectId("52d9212608a224e99676d378"),
"business" : [
{
"name" : "abc",
"rating" : 4.5
},
{
"name" : "pqr"
},
{
"name" : "xyz",
"rating" : 3.6
}
]
}
In the above example, business is an array whose elements have a "name" key and, optionally, a "rating" key.
How can I get the count of business array elements in which the "rating" key exists?
Expected output is : 2
Looks like you have to use the Aggregation Framework. In particular, you need to $unwind your array, then $match only the elements that include a rating field, then $group the documents back into their original shape.
Try something like this:
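db.test.aggregate([
    // Optionally narrow down the documents first
    { $match: { /* your query criteria document */ } },
    // Emit one document per element of the business array
    { $unwind: "$business" },
    // Keep only the elements that have a rating field
    { $match: { "business.rating": { $exists: true } } },
    // Reassemble each original document and count the surviving elements
    { $group: {
        _id: "$_id",
        business: { $push: "$business" },
        business_count: { $sum: 1 }
    } }
])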
Result will look like the following:
{
_id: ObjectId("52d9212608a224e99676d378"),
business: [
{ name: "abc", rating: 4.5 },
{ name: "xyz", rating: 3.6 }
],
business_count: 2
}
UPD: It looks like the OP doesn't want to group results by the enclosing document's _id field. Unfortunately, a $group stage must specify an _id value; otherwise it fails with an exception. This value can be a constant, though (e.g. plain null or 'foobar'), so there will be only one resulting group, which gives a collection-wide aggregation.
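As a rough sketch of the constant-_id variant, the pipeline below groups everything into a single bucket. The pipeline literal is what would be passed to db.test.aggregate(...); the countRated helper is only a hypothetical in-memory emulation of those stages (not part of MongoDB), so the logic can be checked in plain Node without a database.

```javascript
// Collection-wide variant: a constant _id puts every element into one group.
const pipeline = [
  { $unwind: "$business" },
  { $match: { "business.rating": { $exists: true } } },
  { $group: { _id: null, business_count: { $sum: 1 } } }
];

// Sample data shaped like the document in the question.
const docs = [
  {
    _id: 1,
    business: [
      { name: "abc", rating: 4.5 },
      { name: "pqr" },
      { name: "xyz", rating: 3.6 }
    ]
  }
];

// Hypothetical in-memory emulation of the three stages (illustration only).
function countRated(documents) {
  return documents
    .flatMap(doc => doc.business || [])       // $unwind
    .filter(element => "rating" in element)   // $match with $exists
    .length;                                  // $group with $sum: 1
}

console.log(countRated(docs)); // 2
```

With the sample document from the question, the count is 2, matching the expected output.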
Related
I want to write a MongoDB query that fetches fields of array objects based on an IN/OR condition, as in relational databases. Given the following document in my collection, I want to read all "events.event" values in ('user', 'bot'):
{_id: 1,
sender_id:100,
"events" : [
{
"event" : "action",
"timestamp" : 1619463803.7244627,
"name" : "abc1"
},
{
"event" : "user",
"timestamp" : 1619463803.7244627,
"name" : "abc2"
},
{
"event" : "bot",
"timestamp" : 1619463803.7244627,
"name" : "abc3"
}
]
}
I used the following query, but it only works for ONE event at a time. Can it be modified to
consider event = 'user' or event = 'bot'? And can we also project events.event and events.timestamp
along with this $elemMatch, all in one go?
db.conversations.find(
{"events.event": "bot"},
{_id: 0, sender_id:1, events: {$elemMatch: {event: "bot"}}});
You have lots of options.
The simplest one, which mimics an SQL query with the IN operator, is:
db.putCollectionNameHere.aggregate([
{ $match: { _id: 1 } },
{ $unwind: '$events' },
{ $match: { 'events.event': { $in: ['user', 'bot'] } } },
]);
About $unwind, the documentation says:
$unwind
Deconstructs an array field from the input documents to output a
document for each element. Each output document is the input document
with the value of the array field replaced by the element.
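To make the quoted behaviour concrete, here is a small plain-JavaScript sketch of what $unwind followed by the $in match does to the sample document. The unwind and matchEventIn helpers are hypothetical emulations written for this illustration; they are not MongoDB functions and only handle the array case.

```javascript
// Sample document from the question (timestamps omitted for brevity).
const conversation = {
  _id: 1,
  sender_id: 100,
  events: [
    { event: "action", name: "abc1" },
    { event: "user", name: "abc2" },
    { event: "bot", name: "abc3" }
  ]
};

// Emulates $unwind for an array field: one output document per element,
// with the array replaced by that element. Non-array values yield no
// output here, which is a simplification made for this sketch.
function unwind(doc, field) {
  const value = doc[field];
  if (!Array.isArray(value)) return [];
  return value.map(element => ({ ...doc, [field]: element }));
}

// Emulates { $match: { "events.event": { $in: allowed } } } after the $unwind.
function matchEventIn(docs, allowed) {
  return docs.filter(doc => allowed.includes(doc.events.event));
}

const unwound = unwind(conversation, "events");
const matched = matchEventIn(unwound, ["user", "bot"]);
console.log(matched.map(doc => doc.events.name)); // [ 'abc2', 'abc3' ]
```

Each matched document still carries _id and sender_id, with events now holding a single event object instead of the array.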
I need to check if an ObjectId exists in a non nested array and in multiple nested arrays, I've managed to get very close using the aggregation framework, but got stuck in the very last step.
My documents have this structure:
{
"_id" : ObjectId("605ce5f063b1c2eb384c2b7f"),
"name" : "Test",
"attrs" : [
ObjectId("6058e94c3994d04d28639616"),
ObjectId("6058e94c3994d04d28639627"),
ObjectId("6058e94c3994d04d28639622"),
ObjectId("6058e94c3994d04d2863962e")
],
"variations" : [
{
"varName" : "Var1",
"attrs" : [
ObjectId("6058e94c3994d04d28639616"),
ObjectId("6058e94c3994d04d28639627"),
ObjectId("6058e94c3994d04d28639622"),
ObjectId("60591791d4d41d0a6817d23f")
],
},
{
"varName" : "Var2",
"attrs" : [
ObjectId("60591791d4d41d0a6817d22a"),
ObjectId("60591791d4d41d0a6817d255"),
ObjectId("6058e94c3994d04d28639622"),
ObjectId("60591791d4d41d0a6817d23f")
],
},
],
"storeId" : "9acdq9zgke49pw85"
}
Let's say I need to check whether this _id, "6058e94c3994d04d28639616", exists in all arrays named attrs.
My aggregation query goes like this:
db.product.aggregate([
{
$match: {
storeId,
},
},
{
$project: {
_id: 0,
attrs: 1,
'variations.attrs': 1,
},
},
{
$project: {
attrs: 1,
vars: '$variations.attrs',
},
},
{
$unwind: '$vars',
},
{
$project: {
attr: {
$concatArrays: ['$vars', '$attrs'],
},
},
},
]);
which results in this:
[
{
attr: [
6058e94c3994d04d28639616,
6058e94c3994d04d28639627,
6058e94c3994d04d28639622,
6058e94c3994d04d2863962e,
6058e94c3994d04d28639616,
6058e94c3994d04d28639627,
6058e94c3994d04d28639622,
60591791d4d41d0a6817d23f,
60591791d4d41d0a6817d22a,
60591791d4d41d0a6817d255,
6058e94c3994d04d28639622,
60591791d4d41d0a6817d23f
]
},
{
attr: [
60591791d4d41d0a6817d22a,
60591791d4d41d0a6817d255,
6058e94c3994d04d28639622,
60591791d4d41d0a6817d23f,
6058e94c3994d04d28639624,
6058e94c3994d04d28639627,
6058e94c3994d04d28639628,
6058e94c3994d04d2863963e
]
}
]
Assuming I have two products in my DB, I get this result. Each element in the outermost array is a different product.
I could not find a way to do the last bit, checking for the key "6058e94c3994d04d28639616", with $group, since I don't have keys to group on.
I also tried $match, adding this to the end of the aggregation:
{
$match: {
attr: "6058e94c3994d04d28639616",
},
},
But that results in an empty array. I know that $match does not query arrays like this, but could not find a way to do it with $in as well.
Is this schema too complicated? I cannot embed the original data, since it is mutable and I would not be happy to update all products whenever something changed.
Will this be very expensive if I had something like 10000 products?
Thanks in advance
You are comparing the string "6058e94c3994d04d28639616" with ObjectId values. Convert the string to an ObjectId with the $toObjectId operator when performing the $match, like this:
{
$match: {
$expr: {
$in: [{ $toObjectId: "6058e94c3994d04d28639616" }, "$attr"]
}
}
}
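The type mismatch can be illustrated outside MongoDB with a minimal sketch. FakeObjectId below is a hypothetical stand-in for the real ObjectId type, written only for this example; it is not the driver's class.

```javascript
// Hypothetical stand-in for the ObjectId type (illustration only).
class FakeObjectId {
  constructor(hex) {
    this.hex = hex.toLowerCase();
  }
  equals(other) {
    return other instanceof FakeObjectId && other.hex === this.hex;
  }
}

const attr = [
  new FakeObjectId("6058e94c3994d04d28639616"),
  new FakeObjectId("6058e94c3994d04d28639627")
];
const target = "6058e94c3994d04d28639616";

// Comparing the raw string against ObjectId-like values never matches.
const matchesAsString = attr.some(id => id === target);

// Converting first (the role $toObjectId plays in the pipeline) lets the
// comparison succeed.
const converted = new FakeObjectId(target);
const matchesAsObjectId = attr.some(id => id.equals(converted));

console.log(matchesAsString, matchesAsObjectId); // false true
```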
I'm currently trying to massage out counts from the mLab API for reasons I don't have control over. So I want to grab the data I need from there in one query so I can limit the amount of API calls.
Assuming that my data looks like this:
{
"_id": {
"$oid": "12345"
},
"dancer": "Beginner",
"pirate": "Advanced",
"chef": "Mid",
"beartamer": "Mid",
"swordsman": "Mid",
"total": "Mid"
}
I know I can do 6 queries with something similar to:
db.score.aggregate({"$group": { _id: {"total":"$total"}, count: {$sum:1} }} )
but how do I query to get the count for each key? I'd like to see something akin to:
{ "_id" : { "total" : "Advanced" }, "count" : 1 }
{ "_id" : { "total" : "Mid" }, "count" : 1 }
{ "_id" : { "total" : "Beginner" }, "count" : 4 }
{ "_id" : { "pirate" : "Advanced" }, "count" : 1 }
//...etc
The following should give you precisely what you want:
db.score.aggregate({
$project: {
"_id": 0 // get rid of the "_id" field since we do not want to count it
}
}, {
$project: {
"doc": {
$objectToArray: "$$ROOT" // transform all documents into key-value pairs
}
}
}, {
$unwind: "$doc" // flatten the resulting array into separate documents
}, {
$group: {
"_id": "$doc", // group by distinct key-value combination
"count": { $sum: 1 } // count documents per bucket
}
}, {
$project: {
"_id": { // some more transformation magic to recreate the desired output structure
$mergeObjects: [
{ $arrayToObject: [ [ "$_id" ] ] },
{ "count": "$count" }
]
},
}
}, {
$replaceRoot: {
"newRoot": "$_id" // this moves the contents of the "_id" field to the root of the documents
}
})
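For readers who want to check the logic without a database, the following is a plain-JavaScript sketch of what the pipeline computes: it splits each document into key-value pairs, counts every distinct pair, and reshapes the result. The countKeyValuePairs helper and the sample data are assumptions made for this illustration.

```javascript
// Two sample documents shaped like the one in the question.
const scores = [
  { _id: 1, dancer: "Beginner", pirate: "Advanced", total: "Mid" },
  { _id: 2, dancer: "Beginner", pirate: "Mid", total: "Mid" }
];

// Hypothetical in-memory emulation of the pipeline (illustration only).
function countKeyValuePairs(documents) {
  const counts = new Map();
  for (const doc of documents) {
    // Object.entries plays the role of $objectToArray plus $unwind.
    for (const [key, value] of Object.entries(doc)) {
      if (key === "_id") continue; // same effect as the first $project
      const bucket = JSON.stringify([key, value]); // distinct key-value pair
      counts.set(bucket, (counts.get(bucket) || 0) + 1); // $group with $sum: 1
    }
  }
  // Rebuild { <key>: <value>, count: <n> }, like $mergeObjects and $replaceRoot.
  return [...counts].map(([bucket, count]) => {
    const [key, value] = JSON.parse(bucket);
    return { [key]: value, count };
  });
}

const result = countKeyValuePairs(scores);
console.log(result);
```

Note that the output has the shape { total: "Mid", count: 2 }, with the key-value pair and the count side by side at the top level of each result.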
tl;dr: I'm struggling to construct a query that
makes an aggregation to get a count of values on a certain key ("original_text_source"), which
is in a sub-document that is in an array.
Full description
I have embedded documents with arrays that are structured like this:
{
"_id" : ObjectId("0123456789"),
"type" : "some_object",
"relationships" : {
"x" : [ ObjectId("0123456789") ],
"y" : [ ObjectId("0123456789") ],
},
"properties" : [
{
"a" : "1"
},
{
"b" : "1"
},
{
"original_text_source" : "foo.txt"
},
]
}
The docs were created from exactly 10k text files, sorted into various folders. While inserting the documents into MongoDB (in batches), I messed up and moved a few files around, causing one file to be imported twice (my database has a count of exactly 10001 docs), but I don't know which one it is. Since one of the "original_text_source" values has to have a count of 2, I was planning on just deleting one of the two documents.
I read up on solutions with $elemMatch, but since my array element is a document, I'm not sure how to proceed. Maybe with mapReduce? But I can't transfer the logic to my doc structure.
I also could just create a new collection and reupload all, but in case I mess up again, I'd rather like to learn how to query for duplicates. It seems more elegant :-)
You can find duplicates with a simple aggregation like this:
db.collection.aggregate([
    // Group by source file name; keep groups with more than one document
    { $group: {
        _id: "$properties.original_text_source",
        docIds: { $push: "$_id" }, docCount: { $sum: 1 }
    } },
    { $match: { "docCount": { $gt: 1 } } }
])
which gives you something like this:
{
"_id" : [
"foo.txt"
],
"docIds" : [
ObjectId("59d6323613940a78ba1d5ffa"),
ObjectId("59d6324213940a78ba1d5ffc")
],
"docCount" : 2.0
}
Run the following:
db.collection.aggregate([
{ $group: {
_id: { name: "$properties.original_text_source" },
idsForDuplicatedDocs: { $addToSet: "$_id" },
count: { $sum: 1 }
} },
{ $match: {
count: { $gte: 2 }
} },
{ $sort : { count : -1} }
]);
Given a collection which contains two copies of the document you showed in your question, the above command will return:
{
"_id" : {
"name" : [
"foo.txt"
]
},
"idsForDuplicatedDocs" : [
ObjectId("59d631d2c26584cd8b7b3337"),
ObjectId("59d631cbc26584cd8b7b3333")
],
"count" : 2
}
Where ...
The attribute _id.name is the value of the duplicated properties.original_text_source
The attribute idsForDuplicatedDocs contains the _id values for each of the documents which have a duplicated properties.original_text_source
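Both aggregations above group on the path properties.original_text_source, which, because properties is an array, resolves to an array of the values found in its elements (hence the ["foo.txt"] in the output). A minimal plain-JavaScript sketch of that grouping follows; the findDuplicates helper, the string _id values and the sample data are assumptions made for this illustration.

```javascript
// Sample documents; "a" and "c" share the same source file.
const docs = [
  { _id: "a", properties: [{ a: "1" }, { original_text_source: "foo.txt" }] },
  { _id: "b", properties: [{ b: "1" }, { original_text_source: "bar.txt" }] },
  { _id: "c", properties: [{ original_text_source: "foo.txt" }] }
];

// Hypothetical in-memory emulation of the $group + $match stages.
function findDuplicates(documents) {
  const groups = new Map();
  for (const doc of documents) {
    // Collect the value from every array element that has the key,
    // mirroring how the field path resolves through the array.
    const names = (doc.properties || [])
      .filter(property => "original_text_source" in property)
      .map(property => property.original_text_source);
    const key = JSON.stringify(names);
    if (!groups.has(key)) {
      groups.set(key, { _id: names, docIds: [], docCount: 0 });
    }
    const group = groups.get(key);
    group.docIds.push(doc._id);
    group.docCount += 1;
  }
  // Keep only the groups that contain more than one document.
  return [...groups.values()].filter(group => group.docCount > 1);
}

const duplicates = findDuplicates(docs);
console.log(duplicates);
// [ { _id: [ 'foo.txt' ], docIds: [ 'a', 'c' ], docCount: 2 } ]
```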
"reviewAndRating": [
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd9745e2d66162a6dd1f0ef",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7518",
"_id": "5bd974612d66162a6dd1f0f0",
"rating": "5"
},
{
"review": "aksjdhfkashdfkashfdkjashjdkfhasdkjfhsafkjhasdkjfhasdjkfhsdakfj",
"productId": "5bd956f29fcaca161f6b7517",
"_id": "5bd974622d66162a6dd1f0f1",
"rating": "5"
}
]
Is it possible to return records based on the size of the array after $elemMatch has filtered it down?
For example, if I have many records in a collection like the following:
[
{
contents: [
{
name: "yorkie",
},
{
name: "dairy milk",
},
{
name: "yorkie",
},
]
},
// ...
]
Say I want to find all records whose contents field contains 2 array items with a name field equal to "yorkie". How would I do this? To clarify, the array could contain other items, but the criteria is met so long as 2 of those array items have the matching field:value.
I'm aware I can use $elemMatch (or contents.name) to return records where the array contains at least one item matching that name, and I'm aware I can also use $size to filter on the exact number of array items in the record's field. Is there a way to combine the two?
Not in a find query, but it can be done with an aggregation:
db.test.aggregate([
{ "$match" : { "contents.name" : "yorkie" } },
{ "$unwind" : "$contents" },
{ "$match" : { "contents.name" : "yorkie" } },
{ "$group" : { "_id" : "$_id", "sz" : { "$sum" : 1 } } }, // use $first to include other fields
{ "$match" : { "sz" : { "$gte" : 2 } } }
])
I interpreted
the criteria is met so long as 2 of those array items have the matching field:value
as meaning the criteria is met if at least 2 array items have the matching value in name.
I know this thread is old, but today you can just use find with $expr:
db.test.find({
"$expr": {
"$gt": [
{
"$reduce": {
"input": "$contents",
"initialValue": 0,
"in": {
"$cond": {
"if": {
"$eq": ["$$this.name", 'yorkie']
},
"then": {
"$add": ["$$value", 1]
},
"else": "$$value"
}
}
}
},
1
]
}
})
The $reduce does the trick here: it returns the number of array items that match the criteria, and $gt then keeps only the records where that number is greater than 1.
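The same counting idea can be sketched in plain JavaScript, where Array.prototype.reduce plays the role of $reduce. The countByName helper and the sample records are assumptions made for this illustration.

```javascript
// Sample records: only the first one contains "yorkie" twice.
const records = [
  { _id: 1, contents: [{ name: "yorkie" }, { name: "dairy milk" }, { name: "yorkie" }] },
  { _id: 2, contents: [{ name: "yorkie" }, { name: "dairy milk" }] }
];

// Counts the array items whose name equals the given value, starting
// from 0 and adding 1 per match, like the $reduce / $cond combination.
function countByName(contents, name) {
  return contents.reduce(
    (value, item) => (item.name === name ? value + 1 : value),
    0
  );
}

// Keep records with more than one match, like { $gt: [ <count>, 1 ] }.
const matching = records.filter(record => countByName(record.contents, "yorkie") > 1);
console.log(matching.map(record => record._id)); // [ 1 ]
```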