Hi, I know much has been said about this, but I can't find an answer to my specific problem. I have the following JSON document and am trying to create an efficient index for the questions.questionEntry.metaTags array:
{
"questions": [
{
"questionEntry": {
"id": 1,
"info": {
"seasonNumber": 1,
"episodeNumber": 1,
"episodeName": "Days Gone Bye"
},
"questionItem": {
"theQuestion": "",
"attachedElement": {
"type": 1,
"value": ""
}
},
"options": [
{
"type": 1,
"value": ""
},
{
"type": 1,
"value": ""
}
],
"answer": {
"questionId": 1,
"answer": 1
},
"metaTags": [
"Season 1",
"Episode 1"
]
}
}
]
}
I then added 500,000 duplicate documents to my DB, plus one additional document with different data fields, to run some tests.
I ran the following query on the unindexed collection; it executed in 640ms:
db.questions1.find({"questions.questionEntry.metaTags" : "Season 1"},{'questions.$':1})._addSpecial( "$explain", 1 ).pretty()
Then I created the following index:
db.questions1.createIndex( { "questions.questionEntry.metaTags" : 1 })
I then ran the same query again, but now the execution time is 9070ms!
Here is the explain() output, showing 500001 documents examined:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.questions1",
"indexFilterSet" : false,
"parsedQuery" : {
"questions.questionEntry.metaTags" : {
"$eq" : "Season 1"
}
},
"winningPlan" : {
"stage" : "PROJECTION",
"transformBy" : {
"questions.$" : 1
},
"inputStage" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"questions.questionEntry.metaTags" : 1
},
"indexName" : "questions.questionEntry.metaTags_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"questions.questionEntry.metaTags" : [
"[\"Season 1\", \"Season 1\"]"
]
}
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 500001,
"executionTimeMillis" : 11255,
"totalKeysExamined" : 500001,
"totalDocsExamined" : 500001,
"executionStages" : {
"stage" : "PROJECTION",
"nReturned" : 500001,
"executionTimeMillisEstimate" : 10750,
"works" : 500002,
"advanced" : 500001,
"needTime" : 0,
"needFetch" : 0,
"saveState" : 3907,
"restoreState" : 3907,
"isEOF" : 1,
"invalidates" : 0,
"transformBy" : {
"questions.$" : 1
},
"inputStage" : {
"stage" : "FETCH",
"nReturned" : 500001,
"executionTimeMillisEstimate" : 9310,
"works" : 500002,
"advanced" : 500001,
"needTime" : 0,
"needFetch" : 0,
"saveState" : 3907,
"restoreState" : 3907,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 500001,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 500001,
"executionTimeMillisEstimate" : 8970,
"works" : 500001,
"advanced" : 500001,
"needTime" : 0,
"needFetch" : 0,
"saveState" : 3907,
"restoreState" : 3907,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"questions.questionEntry.metaTags" : 1
},
"indexName" : "questions.questionEntry.metaTags_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"questions.questionEntry.metaTags" : [
"[\"Season 1\", \"Season 1\"]"
]
},
"keysExamined" : 500001,
"dupsTested" : 500001,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0
}
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "Voltage",
"port" : 27017,
"version" : "3.0.3",
"gitVersion" : "b40106b36eecd1b4407eb1ad1af6bc60593c6105"
}
}
MongoDB is not my strong suit, and I'm struggling to understand why execution takes longer with the index.
What would be the best way to index the metaTags string array?
Many thanks
Let's assume my collection user is like this:
[
{
"name":"user1",
"u_id":"1",
"is_happy":true,
"like_baseball":false,
"is_strong":true,
},
{
"name":"user2",
"u_id":"2",
"is_happy":false,
"like_baseball":false,
"is_strong":true,
},
{
"name":"user3",
"u_id":"3",
"is_happy":true,
"like_baseball":false,
"is_strong":false,
},
...
]
There are 1m documents in this collection.
I create two indexes:
1.
{
"is_happy": 1,
"like_baseball": 1,
"is_strong": 1,
}
2.
{
"u_id": 1,
}
The first index cannot do much to speed up the query below, since its selectivity is poor:
db.user.find({
is_happy: true,
like_baseball: true,
is_strong: false,
})
The MongoDB documentation suggests two ways to deal with poor selectivity:
1. Separate one collection into two collections. (In my case, separate happy and unhappy humans into two collections.) However, I have three boolean fields, which makes the separation hard.
2. Create a compound index on the low-selectivity field and other fields with many distinct values. (In my case, I can create a compound index on the three boolean fields and u_id.) However, this means I have to include u_id in all queries, which I cannot guarantee.
Since neither approach suits me, I am wondering if there is another way to speed up the query. Thank you all! :)
Sounds like this might be a good use for the attribute pattern. See https://www.mongodb.com/blog/post/building-with-patterns-the-attribute-pattern for details...
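As a rough sketch of what the attribute pattern could look like for this collection, the snippet below reshapes the three boolean flags into an array of key/value pairs. The helper name toAttributePattern and the field names attrs, k and v are illustrative assumptions, not something from the question, and whether this actually helps with low-cardinality booleans would need to be measured.

```javascript
// Hypothetical helper: move the listed flag fields into an "attrs" array
// of { k, v } pairs and copy every other field through unchanged.
function toAttributePattern(user, flagNames) {
  const reshaped = { attrs: [] };
  for (const [key, value] of Object.entries(user)) {
    if (flagNames.includes(key)) {
      reshaped.attrs.push({ k: key, v: value });
    } else {
      reshaped[key] = value;
    }
  }
  return reshaped;
}

const flags = ["is_happy", "like_baseball", "is_strong"];
const user1 = { name: "user1", u_id: "1", is_happy: true, like_baseball: false, is_strong: true };
const reshapedUser1 = toAttributePattern(user1, flags);

// In the shell, one compound multikey index could then serve every flag:
//   db.user.createIndex({ "attrs.k": 1, "attrs.v": 1 })
// and a query would use one $elemMatch per flag, for example:
//   db.user.find({ attrs: { $all: [
//     { $elemMatch: { k: "is_happy", v: true } },
//     { $elemMatch: { k: "is_strong", v: false } }
//   ] } })
```

The benefit is mainly that new flags need no new index; the query still has to read every document that matches.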
By the way, separating into two collections will likely not provide the performance improvements you seek.
If you have the following index:
{
"is_happy": 1,
"like_baseball": 1,
"is_strong": 1,
}
and issue the following query...
db.baseball.find({ is_happy: true, like_baseball: true, is_strong: false })
then running an explain plan shows a good ratio (1:1) between totalKeysExamined and nReturned.
db.baseball.find({ is_happy: true, like_baseball: true, is_strong: false }).explain("allPlansExecution")
All Plans Execution Results:
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "barrystuff.baseball",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"is_happy" : {
"$eq" : true
}
},
{
"is_strong" : {
"$eq" : false
}
},
{
"like_baseball" : {
"$eq" : true
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"is_happy" : 1,
"like_baseball" : 1,
"is_strong" : 1
},
"indexName" : "is_happy_1_like_baseball_1_is_strong_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"is_happy" : [ ],
"like_baseball" : [ ],
"is_strong" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"is_happy" : [
"[true, true]"
],
"like_baseball" : [
"[true, true]"
],
"is_strong" : [
"[false, false]"
]
}
}
},
"rejectedPlans" : [ ]
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 3,
"executionTimeMillis" : 0,
"totalKeysExamined" : 3,
"totalDocsExamined" : 3,
"executionStages" : {
"stage" : "FETCH",
"nReturned" : 3,
"executionTimeMillisEstimate" : 0,
"works" : 4,
"advanced" : 3,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 3,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 3,
"executionTimeMillisEstimate" : 0,
"works" : 4,
"advanced" : 3,
"needTime" : 0,
"needYield" : 0,
"saveState" : 0,
"restoreState" : 0,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"is_happy" : 1,
"like_baseball" : 1,
"is_strong" : 1
},
"indexName" : "is_happy_1_like_baseball_1_is_strong_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"is_happy" : [ ],
"like_baseball" : [ ],
"is_strong" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"is_happy" : [
"[true, true]"
],
"like_baseball" : [
"[true, true]"
],
"is_strong" : [
"[false, false]"
]
},
"keysExamined" : 3,
"seeks" : 1,
"dupsTested" : 0,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
},
"allPlansExecution" : [ ]
},
"serverInfo" : {
"host" : "Barry-MacBook-Pro.local",
"port" : 27017,
"version" : "4.0.6",
"gitVersion" : "caa42a1f75a56c7643d0b68d3880444375ec42e3"
},
"ok" : 1
}
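To make the 1:1 ratio concrete, here is a small sketch that computes it from an explain result. The function name scanRatios is made up for illustration; the three numbers are copied from the executionStats above.

```javascript
// Summarise how much work a query did per returned document.
// `explainResult` is assumed to have the shape returned by
// .explain("executionStats") or .explain("allPlansExecution").
function scanRatios(explainResult) {
  const stats = explainResult.executionStats;
  const returned = Math.max(stats.nReturned, 1); // avoid dividing by zero
  return {
    keysPerReturned: stats.totalKeysExamined / returned,
    docsPerReturned: stats.totalDocsExamined / returned,
  };
}

// Numbers taken from the explain output above.
const baseballExplain = {
  executionStats: { nReturned: 3, totalKeysExamined: 3, totalDocsExamined: 3 },
};
const ratios = scanRatios(baseballExplain);
// Both ratios are 1 here: every key and document examined was returned.
```

A ratio close to 1 means the index is not wasting work; the query can still be slow if the number of returned documents is itself large.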
Selectivity may be poor, but that is the nature of the data, and the query still has to return those results. If you need this query and want better performance, consider vertical scaling first; if that still does not meet your needs, consider horizontal scaling.
If the data model is stable and the field names used are consistent you might be able to use a covered query for your needs. I suspect your real-world need is not as trivial as the example provided.
I first created a collection and an index on it:
db.first.createIndex({a:1, b:1, c:1, d:1, e:1, f:1});
then inserted data
db.first.insert({a:1, b:2, c:3, d:4, e:5, f:6});
db.first.insert({a:1, b:6});
When making queries like
db.first.find({f: 6, a:1, c:3}).sort({b: -1}).explain();
the index is used (IXSCAN):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "myproject.first",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"a" : {
"$eq" : 1
}
},
{
"c" : {
"$eq" : 3
}
},
{
"f" : {
"$eq" : 6
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1,
"b" : 1,
"c" : 1,
"d" : 1,
"e" : 1,
"f" : 1
},
"indexName" : "a_1_b_1_c_1_d_1_e_1_f_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"a" : [ ],
"b" : [ ],
"c" : [ ],
"d" : [ ],
"e" : [ ],
"f" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "backward",
"indexBounds" : {
"a" : [
"[1.0, 1.0]"
],
"b" : [
"[MaxKey, MinKey]"
],
"c" : [
"[3.0, 3.0]"
],
"d" : [
"[MaxKey, MinKey]"
],
"e" : [
"[MaxKey, MinKey]"
],
"f" : [
"[6.0, 6.0]"
]
}
}
},
"rejectedPlans" : [
{
"stage" : "SORT",
"sortPattern" : {
"b" : -1
},
"inputStage" : {
"stage" : "SORT_KEY_GENERATOR",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"$and" : [
{
"c" : {
"$eq" : 3
}
},
{
"f" : {
"$eq" : 6
}
}
]
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"a" : 1
},
"indexName" : "a_1",
"isMultiKey" : false,
"multiKeyPaths" : {
"a" : [ ]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"a" : [
"[1.0, 1.0]"
]
}
}
}
}
}
]
},
"serverInfo" : {
"host" : "Manishs-MacBook-Pro.local",
"port" : 27017,
"version" : "3.6.4",
"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
},
"ok" : 1
}
but when I use an $or query
db.first.find({ $or: [{f: 6}, {a:1}]}).explain();
the index is not used; instead the whole collection is scanned (COLLSCAN):
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "myproject.first",
"indexFilterSet" : false,
"parsedQuery" : {
"$or" : [
{
"a" : {
"$eq" : 1
}
},
{
"f" : {
"$eq" : 6
}
}
]
},
"winningPlan" : {
"stage" : "SUBPLAN",
"inputStage" : {
"stage" : "COLLSCAN",
"filter" : {
"$or" : [
{
"a" : {
"$eq" : 1
}
},
{
"f" : {
"$eq" : 6
}
}
]
},
"direction" : "forward"
}
},
"rejectedPlans" : [ ]
},
"serverInfo" : {
"host" : "Manishs-MacBook-Pro.local",
"port" : 27017,
"version" : "3.6.4",
"gitVersion" : "d0181a711f7e7f39e60b5aeb1dc7097bf6ae5856"
},
"ok" : 1
}
Please let me know if I am doing something wrong.
Relying on a single compound index is why no index is used for the $or: the { f: 6 } clause is not supported by any prefix of that index.
When evaluating the clauses in the $or expression, MongoDB either
performs a collection scan or, if all the clauses are supported by
indexes, MongoDB performs index scans. That is, for MongoDB to use
indexes to evaluate an $or expression, all the clauses in the $or
expression must be supported by indexes. Otherwise, MongoDB will
perform a collection scan.
When using indexes with $or queries, each clause of an $or can use its
own index. Consider the following query:
db.inventory.find( { $or: [ { quantity: { $lt: 20 } }, { price: 10 } ] } )
To support this query, rather than a compound index, you would create
one index on quantity and another index on price:
db.inventory.createIndex( { quantity: 1 } )
db.inventory.createIndex( { price: 1 } )
(Quoted from the MongoDB documentation, section "$or Clauses and Indexes".)
So just adding individual indexes on fields f and a, like
db.first.createIndex({a:1});
db.first.createIndex({f:1});
will make your
db.first.find({ $or: [{f: 6}, {a:1}]})
query use indexes.
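The rule quoted above can be sketched as a small check. This is a deliberately simplified model, not MongoDB's query planner: it treats a clause as supported when some index has one of the clause's fields as its first key. The function names are made up for illustration.

```javascript
// Simplified model: an index can support a clause if the index's
// leading key is one of the fields the clause filters on.
function clauseSupported(clause, indexes) {
  const fields = Object.keys(clause);
  return indexes.some((index) => fields.includes(Object.keys(index)[0]));
}

// An $or can be answered with index scans only if every clause is supported.
function orCanUseIndexes(orClauses, indexes) {
  return orClauses.every((clause) => clauseSupported(clause, indexes));
}

const orClauses = [{ f: 6 }, { a: 1 }];
const compoundOnly = [{ a: 1, b: 1, c: 1, d: 1, e: 1, f: 1 }];
const withSingleField = compoundOnly.concat([{ a: 1 }, { f: 1 }]);

// false: { f: 6 } has no index whose leading key is f
const before = orCanUseIndexes(orClauses, compoundOnly);
// true: each clause now has an index it can use
const after = orCanUseIndexes(orClauses, withSingleField);
```

Note that the { a: 1 } clause was already supported by the compound index; it is the { f: 6 } clause alone that forces the collection scan.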
The issue here is that you've created a compound index on {a:1, b:1, c:1, d:1, e:1, f:1}, and a compound index can only support a predicate that includes a prefix of its keys. The order in which fields appear in the query document does not matter. Your first query includes 'a', so it can use the index, but a clause on 'f' alone cannot, because 'f' is at the tail end of the index.
Your queries:
db.first.find({f: 6, a:1, c:3}).sort({b: -1})
db.first.find({ $or: [{f: 6}, {a:1}]})
To let the { f: 6 } clause use an index as well, you could add a compound index that starts with f, keeping the existing index so that the { a: 1 } clause stays supported:
db.first.createIndex({ f:1, a:1, b:1, c:1 })
Alternatively, you can build individual indexes on the fields you query and use them in any combination.
Remember: if you're building a compound index, follow the Equality, Sort, Range (ESR) order.
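As an illustration of the Equality, Sort, Range ordering, the sketch below builds a key pattern for the first query in the question. The helper esrIndexKeys and its three arguments are hypothetical; it only orders the fields and says nothing about whether the resulting index is the best choice for a given data set.

```javascript
// Build an index key pattern in Equality, Sort, Range order.
// equalityFields and rangeFields are arrays of field names,
// sortSpec is the object passed to .sort().
function esrIndexKeys(equalityFields, sortSpec, rangeFields) {
  const keys = {};
  for (const field of equalityFields) keys[field] = 1;
  for (const [field, direction] of Object.entries(sortSpec)) keys[field] = direction;
  for (const field of rangeFields) {
    if (!(field in keys)) keys[field] = 1;
  }
  return keys;
}

// For db.first.find({ f: 6, a: 1, c: 3 }).sort({ b: -1 }):
// three equality fields, one sort field, no range fields.
const esrKeys = esrIndexKeys(["a", "c", "f"], { b: -1 }, []);
// Key order is a, c, f, b, which could be passed to db.first.createIndex(esrKeys).
```

With this ordering the equality fields narrow the scan first and the sort field comes straight after them, so the index can return documents already sorted by b.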
I am trying to fetch a few documents from a collection with a find query on an array of nested objects. The nested objects are indexed, but the find query is not using the full index to fetch the documents.
Here is the structure of a document.
{
"_id" : ObjectId("5bc6498c1ec4062983c4f4ef"),
"appId" : ObjectId("5bbc775036021bea06d9bbc2"),
"status" : "active",
"segmentations" : [
{
"name" : "ch-1",
"values" : [
'true'
],
"type" : "string"
},
{
"name" : "browerInfo",
"values" : [
"Firefox"
],
"version" : [
"62.0"
],
"majorVersion" : [
"62"
],
"type" : "string"
},
{
"name" : "OS",
"values" : [
"Ubuntu"
],
"type" : "string"
},
{
"name" : "lastVisitTime",
"values" : [
1539721615231.0
],
"type" : "number"
}
]
}
Here are the index fields.
{
"v" : 2,
"key" : {
"appId" : 1,
"status" : 1,
"segmentations.name" : 1,
"segmentations.values" : 1
},
"name" : "SEGMENT_INDEX",
"ns" : "test.Collname"
}
Below is the find query I was executing:
db.Collname.find({
appId: ObjectId("5c6a8ef544ff62c73bdb98fc"),
"segmentations.name": 'ch-1',
'segmentations.values': 'true',
status: 'active'
}, {})
I tried to get the query execution information using
<above query>.explain("executionStats")
The result is
{
"queryPlanner" : {
"plannerVersion" : 1,
"namespace" : "test.Collname",
"indexFilterSet" : false,
"parsedQuery" : {
"$and" : [
{
"appId" : {
"$eq" : ObjectId("5c6a8ef544ff62c73bdb98fc")
}
},
{
"segmentations.name" : {
"$eq" : "ch-1"
}
},
{
"segmentations.values" : {
"$eq" : "true"
}
},
{
"status" : {
"$eq" : "active"
}
}
]
},
"winningPlan" : {
"stage" : "FETCH",
"filter" : {
"segmentations.values" : {
"$eq" : "true"
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"appId" : 1.0,
"status" : 1.0,
"segmentations.name" : 1.0,
"segmentations.values" : 1.0
},
"indexName" : "SEGMENT_INDEX",
"isMultiKey" : true,
"multiKeyPaths" : {
"appId" : [],
"status" : [],
"segmentations.name" : [
"segmentations"
],
"segmentations.values" : [
"segmentations",
"segmentations.values"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"appId" : [
"[ObjectId('5c6a8ef544ff62c73bdb98fc'), ObjectId('5c6a8ef544ff62c73bdb98fc')]"
],
"status" : [
"[\"active\", \"active\"]"
],
"segmentations.name" : [
"[\"ch-1\", \"ch-1\"]"
],
"segmentations.values" : [
"[MinKey, MaxKey]"
]
}
}
},
"rejectedPlans" : []
},
"executionStats" : {
"executionSuccess" : true,
"nReturned" : 28176,
"executionTimeMillis" : 72,
"totalKeysExamined" : 28176,
"totalDocsExamined" : 28176,
"executionStages" : {
"stage" : "FETCH",
"filter" : {
"segmentations.values" : {
"$eq" : "true"
}
},
"nReturned" : 28176,
"executionTimeMillisEstimate" : 70,
"works" : 28177,
"advanced" : 28176,
"needTime" : 0,
"needYield" : 0,
"saveState" : 220,
"restoreState" : 220,
"isEOF" : 1,
"invalidates" : 0,
"docsExamined" : 28176,
"alreadyHasObj" : 0,
"inputStage" : {
"stage" : "IXSCAN",
"nReturned" : 28176,
"executionTimeMillisEstimate" : 10,
"works" : 28177,
"advanced" : 28176,
"needTime" : 0,
"needYield" : 0,
"saveState" : 220,
"restoreState" : 220,
"isEOF" : 1,
"invalidates" : 0,
"keyPattern" : {
"appId" : 1.0,
"status" : 1.0,
"segmentations.name" : 1.0,
"segmentations.values" : 1.0
},
"indexName" : "SEGMENT_INDEX",
"isMultiKey" : true,
"multiKeyPaths" : {
"appId" : [],
"status" : [],
"segmentations.name" : [
"segmentations"
],
"segmentations.values" : [
"segmentations",
"segmentations.values"
]
},
"isUnique" : false,
"isSparse" : false,
"isPartial" : false,
"indexVersion" : 2,
"direction" : "forward",
"indexBounds" : {
"appId" : [
"[ObjectId('5c6a8ef544ff62c73bdb98fc'), ObjectId('5c6a8ef544ff62c73bdb98fc')]"
],
"status" : [
"[\"active\", \"active\"]"
],
"segmentations.name" : [
"[\"ch-1\", \"ch-1\"]"
],
"segmentations.values" : [
"[MinKey, MaxKey]"
]
},
"keysExamined" : 28176,
"seeks" : 1,
"dupsTested" : 28176,
"dupsDropped" : 0,
"seenInvalidated" : 0
}
}
},
"serverInfo" : {
"host" : "sys3029",
"port" : 27017,
"version" : "4.0.9",
"gitVersion" : "fc525e2d9b0e4bceff5c2201457e564362909765"
},
"ok" : 1.0
}
I can see from executionStats that the "segmentations.values" field is not bounded in the IXSCAN stage (its bounds are [MinKey, MaxKey]), and that the FETCH stage applies an extra filter on "segmentations.values". The IXSCAN stage took just 10ms, whereas the FETCH stage that applies the filter has a cumulative estimate of 70ms.
I don't understand why the field is not included in the IXSCAN bounds. My collection has around 3.2 million documents, and because of this issue the query execution time is much higher than expected.
Please help me fix the issue.
Thank you in advance.
Please tell me if I need to change my database structure.
If this is not possible in MongoDB, feel free to suggest another database that supports these operations.
The following query will use your index for both of your array fields:
db.Collname.find({
appId: ObjectId("5c6a8ef544ff62c73bdb98fc"),
segmentations:{$elemMatch:{name: 'ch-1',values: 'true'}},
status: 'active'
}, {})
If you are not using $elemMatch, MongoDB can compound the bounds for the array item keys with either the bounds for "segmentations.name" or the bounds for "segmentations.values", but not both.
In order to compound the bounds for "segmentations.name" with the bounds for "segmentations.values", the query must use $elemMatch.
To compound together the bounds for index keys from the same array:
the index keys must share the same field path up to but excluding the
field names,
and the query must specify predicates on the fields
using $elemMatch on that path.
I suggest reading the MongoDB docs on multikey index bounds and on $elemMatch.
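One way to see why the bounds cannot be compounded without $elemMatch is that the two query shapes do not mean the same thing. The sketch below is a simplified in-memory model of the matching rules, not MongoDB code, and the sample document is hypothetical: "ch-1" and "true" appear in different array elements.

```javascript
// Dotted paths: each condition may be satisfied by a different element.
function matchesDotted(doc, name, value) {
  const hasName = doc.segmentations.some((s) => s.name === name);
  const hasValue = doc.segmentations.some((s) => s.values.includes(value));
  return hasName && hasValue;
}

// $elemMatch: a single element must satisfy both conditions.
function matchesElemMatch(doc, name, value) {
  return doc.segmentations.some((s) => s.name === name && s.values.includes(value));
}

// Hypothetical document where the name and the value live in different elements.
const mixedDoc = {
  segmentations: [
    { name: "ch-1", values: ["false"] },
    { name: "OS", values: ["true"] },
  ],
};

const dottedResult = matchesDotted(mixedDoc, "ch-1", "true");       // true
const elemMatchResult = matchesElemMatch(mixedDoc, "ch-1", "true"); // false
```

Because the dotted form must also return documents like this one, bounding both index keys at once would miss valid matches, so the server bounds one key and filters on the other after fetching.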
If I have a compound index on a field inside an array and on a "normal" field, and both are inside a nested object, the second part of the index does not seem to be used.
Example of data:
db.col1.insert(
{rt:{
a:"1",
b: [{c:"1"}, {d:"2"}]
}})
Index:
db.col1.ensureIndex({"rt.b.c":1, "rt.a":1})
Query:
db.col1.find({"rt.a":1, "rt.b.c":1}).explain()
"winningPlan" : {
"stage" : "KEEP_MUTATIONS",
"inputStage" : {
"stage" : "FETCH",
"filter" : {
"rt.a" : {
"$eq" : 1
}
},
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"rt.b.c" : 1,
"rt.a" : 1
},
"indexName" : "rt.b.c_1_rt.a_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"rt.b.c" : [
"[1.0, 1.0]"
],
"rt.a" : [
"[MinKey, MaxKey]"
]
}
}
}
So the rt.a part of the index does not seem to be used, and I don't see why. If I do the same with root-level fields, both parts of the index are used:
> db.col1.insert({a:"1", b: [{c:"1"}, {d:"2"}]})
> db.col1.ensureIndex({"b.c":1, "a":1})
> db.col1.find({"a":1, "b.c":1}).explain()
{
"winningPlan" : {
"stage" : "FETCH",
"inputStage" : {
"stage" : "IXSCAN",
"keyPattern" : {
"b.c" : 1,
"a" : 1
},
"indexName" : "b.c_1_a_1",
"isMultiKey" : true,
"direction" : "forward",
"indexBounds" : {
"b.c" : [
"[1.0, 1.0]"
],
"a" : [
"[1.0, 1.0]"
]
}
}
I am aware that you can use $elemMatch if two fields are in the same array, but in my example they are not, so they should not be affected by that MongoDB behavior.
I am indexing three fields on a collection, one of which is an array. I am running a query on these three fields, and the query takes more than a second with 300K documents in the collection. When I call explain on the query, I see that my index is being used, but nscannedObjects is very high. I guess this is the reason for the poor performance.
{
"_id" : ObjectId("54c8f110389a46153866d82e"),
"mmt" : [
"54944cfd90671810ccbf2552",
"54c64029038d8c3aff41ad6d",
"54c64029038d8c3aff41ad73",
"54c8f151038d8c3aff453669",
"54c8f151038d8c3aff45366d"
],
"p" : 8700,
"sui" : "3810d5cf-3032-4a77-9715-a42e010e569c"
/* also some more fields */
}
With this index:
{
"sui" : 1,
"p" : 1,
"mmt" : 1
}
I am trying to run this query:
db.my_coll.find(
{
"mmt" : { "$all" :
[
"54944cfd90671810ccbf2552", "54ac1db0e3f494afd4ded4c8", "54ac1db1e3f494afd4ded66a", "54ac1db1e3f494afd4ded66b", "54c8b671038d8c3aff453649", "54c8f154038d8c3aff45368f", "54c8f154038d8c3aff453694"
]
},
"sui" : { "$ne" : "bde0f517-b942-4823-b2c8-a41900f46641" },
"p": { $gt: 100, $lt: 1000 }
}
).limit(1000).explain()
The result of the explain is:
{
"cursor" : "BtreeCursor sui_1_p_1_mmt_1",
"isMultiKey" : true,
"n" : 16,
"nscannedObjects" : 14356,
"nscanned" : 129223,
"nscannedObjectsAllPlans" : 14356,
"nscannedAllPlans" : 129223,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 1009,
"nChunkSkips" : 0,
"millis" : 1276,
"indexBounds" : {
"sui" : [
[
{
"$minElement" : 1
},
"bde0f517-b942-4823-b2c8-a41900f46641"
],
[
"bde0f517-b942-4823-b2c8-a41900f46641",
{
"$maxElement" : 1
}
]
],
"p" : [
[
-Infinity,
1000
]
],
"mmt" : [
[
"54944cfd90671810ccbf2552",
"54944cfd90671810ccbf2552"
]
]
},
"server" : "shopkrowdMongo:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 129224,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 16,
"needTime" : 129207,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 16,
"children" : [
{
"type" : "IXSCAN",
"works" : 129223,
"yields" : 1009,
"unyields" : 1009,
"invalidates" : 0,
"advanced" : 14356,
"needTime" : 114867,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ sui: 1.0, p: 1.0, mmt: 1.0 }",
"isMultiKey" : 1,
"boundsVerbose" : "field #0['sui']: [MinKey, \"bde0f517-b942-4823-b2c8-a41900f46641\"), (\"bde0f517-b942-4823-b2c8-a41900f46641\", MaxKey], field #1['p']: [-inf.0, 1000.0), field #2['mmt']: [\"54944cfd90671810ccbf2552\", \"54944cfd90671810ccbf2552\"]",
"yieldMovedCursor" : 0,
"dupsTested" : 14356,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 129223,
"children" : []
}
]
}
]
}
]
}
}
The number of items found is 16, but nscannedObjects is 14356. I do not understand why MongoDB scans so many documents even though all the fields of the query are indexed.
Why is MongoDB scanning so many objects?
How can I get the results of this query faster?
The mmt array I am using does not grow or shrink over time, but the number of elements in it varies between 5 - 15. I need to query this field with several combinations of $in, $all and $nin. Number of items in this collection will probably grow over 30M. Is there a way to reliably get fast results for this scenario?
UPDATE 1:
I tried removing sui field and the $ne query. The updated explain:
{
"cursor" : "BtreeCursor p_1_mmt_1",
"isMultiKey" : true,
"n" : 17,
"nscannedObjects" : 16338,
"nscanned" : 16963,
"nscannedObjectsAllPlans" : 16338,
"nscannedAllPlans" : 33930,
"scanAndOrder" : false,
"indexOnly" : false,
"nYields" : 265,
"nChunkSkips" : 0,
"millis" : 230,
"indexBounds" : {
"p" : [
[
-Infinity,
1000
]
],
"mmt" : [
[
"54944cfd90671810ccbf2552",
"54944cfd90671810ccbf2552"
]
]
},
"server" : "shopkrowdMongo:27017",
"filterSet" : false,
"stats" : {
"type" : "LIMIT",
"works" : 16966,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "KEEP_MUTATIONS",
"works" : 16966,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"children" : [
{
"type" : "FETCH",
"works" : 16965,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 17,
"needTime" : 16947,
"needFetch" : 0,
"isEOF" : 1,
"alreadyHasObj" : 0,
"forcedFetches" : 0,
"matchTested" : 17,
"children" : [
{
"type" : "IXSCAN",
"works" : 16964,
"yields" : 265,
"unyields" : 265,
"invalidates" : 0,
"advanced" : 16338,
"needTime" : 626,
"needFetch" : 0,
"isEOF" : 1,
"keyPattern" : "{ p: 1.0, mmt: 1.0 }",
"isMultiKey" : 1,
"boundsVerbose" : "field #0['p']: [-inf.0, 1000.0), field #1['mmt']: [\"54944cfd90671810ccbf2552\", \"54944cfd90671810ccbf2552\"]",
"yieldMovedCursor" : 0,
"dupsTested" : 16338,
"dupsDropped" : 0,
"seenInvalidated" : 0,
"matchTested" : 0,
"keysExamined" : 16963,
"children" : []
}
]
}
]
}
]
}
}
The query performed better, but scannedObjects is still very high.
I think marcinn was right to single out the $ne as the most likely culprit, but update 1 shows us the $all is also a problem. The query is using the mmt portion of the index to find documents containing one of the values in the array and then must scan the rest of the mmt array to verify that all of the values in the $all array are in the mmt array of a potentially matching document. This means the potentially matching document must be loaded and scanned, so it counts as a scannedObject. To demonstrate this behavior very clearly, consider the following example:
> db.test.drop()
> for (var i = 0; i < 100; i++) db.test.insert({ "x" : [1, 2] })
> for (var i = 0; i < 100; i++) db.test.insert({ "x" : [1, 3] })
> db.test.ensureIndex({ "x" : 1 })
> db.test.find({ "x" : { "$all" : [1, 2] } }).explain(true)
This shows n = 100 and nscanned = nscannedObjects = 200 resulting from using the value 1 as both index bounds, while the logically equivalent query
> db.test.find({ "x" : { "$all" : [2, 1] } }).explain(true)
shows n = nscanned = nscannedObjects = 100 with both index bounds having the value 2.
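The same counts can be reproduced without a server using a small model of the scan. This is only a sketch of the behavior described above, not MongoDB code, and it assumes the first value in the $all array supplies the index bounds, as observed in the two explain outputs; other server versions may pick differently.

```javascript
// Model: scan the index entries for one value, then fetch each candidate
// document and check the remaining $all values against its array.
function simulateAll(docs, allValues) {
  const boundValue = allValues[0]; // assumption: first value supplies the bounds
  let scanned = 0;
  let returned = 0;
  for (const doc of docs) {
    if (!doc.x.includes(boundValue)) continue; // no index entry in the bounds
    scanned += 1;                               // document fetched and examined
    if (allValues.every((v) => doc.x.includes(v))) returned += 1;
  }
  return { n: returned, nscannedObjects: scanned };
}

const testDocs = [];
for (let i = 0; i < 100; i++) testDocs.push({ x: [1, 2] });
for (let i = 0; i < 100; i++) testDocs.push({ x: [1, 3] });

const oneFirst = simulateAll(testDocs, [1, 2]); // n: 100, nscannedObjects: 200
const twoFirst = simulateAll(testDocs, [2, 1]); // n: 100, nscannedObjects: 100
```

Putting the rarest value first in the $all array is therefore worth trying when the scanned count is much larger than the result count.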
Basically, it is because $ne cannot use indexes efficiently. So your index is used only because you first query by the mmt field and then it's reading
Some query operations are not selective. These operations cannot use
indexes effectively or cannot use indexes at all.
The inequality operators $nin and $ne are not very selective, as they
often match a large portion of the index. As a result, in most cases,
a $nin or $ne query with an index may perform no better than a $nin or
$ne query that must scan all documents in a collection
http://docs.mongodb.org/manual/core/query-optimization/