I have a mongodb collection in which document is stored in the below format
{
"_id": {
"host_ip": "192.41.15.161",
"date": "2020-02-02T08:18:09.207Z"
},
"path": "/apache_pb.gif",
"request": "GET /apache_pb.gif HTTP/1.0",
"status": 200
}
where "host_ip" and "date" should be composite primary key i.e. (unique together) and there exists an _id index, which I think is created based on these two fields
So, how can I query based on host_ip and date together so that the "_id" index can be utilized?
Tried using
db.collection.find({_id: {host_ip: "192.41.15.161", date: {$gte: ISODate('2020-02-02T08:00:00.000Z')}}}), but it does not work; it does not even return the record that should match. Is this not the correct way to query?
Query like
db.collection.find({"_id.host_ip": "192.41.15.161", "_id.date": {$gte: ISODate('2020-02-02T08:00:00:00.000Z')}}), worked but it does not use the index created on "_id"
When querying for a composite _id, MongoDB only looks for an exact match against the whole embedded document (essentially treating the composite value as a single opaque key), so a query such as _id: {a: x, b: {$gte: y}} is compared literally, operators and all, and you cannot query by any portion of the object.
> db.test.insert({_id: {a: 1, b: 2}});
WriteResult({ "nInserted" : 1 })
> db.test.find({_id: {a: 1}}); // no results
> db.test.find({_id: {a: 1, b: {$eq: 2}}}); // no results
As far as solving your problem goes, I'd probably set up a new unique compound index on the fields you care about & ignore the _id field entirely.
The question "MongoDB and composite primary keys" has a few more details.
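As a rough sketch of that approach (the collection name logs is hypothetical), store host_ip and date as ordinary top-level fields and put the unique compound index on them:
// Hypothetical sketch: unique compound index on top-level fields instead of a composite _id
db.logs.createIndex({ host_ip: 1, date: 1 }, { unique: true })
// A range query on both fields can then be served by this index:
db.logs.find({
  host_ip: "192.41.15.161",
  date: { $gte: ISODate("2020-02-02T08:00:00.000Z") }
})
Equality on host_ip plus a range on date matches the index key order (host_ip first, date second), so the query can use an index scan instead of a collection scan.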
Related
I have a collection with many documents, called "products".
I want to improve performance by creating an index for it.
The problem is that I don't know how indexes work, so I don't know whether an index will be helpful or not.
My most frequently used query is on the fields "storeId" and "salesDate".
storeId is just a string, so I think it is a good candidate for an index,
but the tricky one is salesDate: salesDate is an object with two fields, from and to, like this:
product {
  ...someFields,
  storeId: string,
  salesDate: {
    from: <date-time as a number>,
    to: <date-time as a number>
  }
}
My queries are based on $gt and $lt, for example:
product.find({
  storeId: "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
})
OR
product.find({
  storeId: "blah",
  "salesDate.from": { $gt: 1423151234 },
  "salesDate.to": { $lt: 15123123123 }
})
What is the proper index for this case?
For your specific use case, I would recommend creating a compound index on storeId plus only the from key, and using $gte/$lte (or $gt/$lt) range conditions on it in your find query.
The reason is that the fewer keys you index (in cases where the extra keys can be avoided), the smaller and cheaper the index is to maintain.
Make sure that you follow the order below for both the index creation and the find operation.
Index Command and Order:
db.product.createIndex({
  "storeId": 1,
  "salesDate.from": -1 // change -1 to 1 if you want the date key indexed in ascending order
})
Find Command:
db.product.find({
  "storeId": "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
})
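If you want to confirm that the find actually uses this index, you can inspect the query plan (a sketch, assuming the index above has already been created):
db.product.find({
  "storeId": "blah",
  "salesDate.from": { $gt: 1423151234, $lt: 15123123123 }
}).explain("executionStats")
// The winningPlan should show an IXSCAN over { storeId: 1, "salesDate.from": -1 }
// rather than a COLLSCAN over the whole collection.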
Use the common definition that:
Alternative 1 Index = Index stores "Whole data record with key value k"
Alternative 2 Index = Index stores "<k, _id of a data record with search key value k>"
Alternative 3 Index = Index stores "<k, list of _ids of data records with search key value k>"
I checked the MongoDB index documentation at https://docs.mongodb.com/manual/indexes/, and it looks like Alternative 2, but I wanted to confirm.
By default, MongoDB creates a unique index on the _id field during the creation of a collection. You can see the default index (_id) and any others you have created with the mongo shell.
db.collection.getIndexes() returns an array of documents that hold
index information for the collection.
[
{
"v" : 2,
"key" : {
"_id" : 1
},
"name" : "_id_"
},
...
]
v: The version of the index.
key: This is a unique index on the _id field in ascending order.
name: The name of the index.
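For example (collection and field names here are hypothetical), after creating a secondary index you will see it listed alongside the default _id_ index:
// Create an additional unique index on a hypothetical email field
db.users.createIndex({ email: 1 }, { unique: true })
db.users.getIndexes()
// Now returns two entries: the default { "_id": 1 } index named "_id_"
// and the new { "email": 1 } index named "email_1"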
I'm trying to create a unique index for an array field in a document. The index should work like this: when I have one document whose array contains two elements, then trying to add a new document whose array field contains those same two elements should produce a duplicate error - but not when only one of the elements is duplicated in another array.
Maybe I'll show an example of what I mean:
First I create a simple document:
{
"name" : "Just a name",
"users" : [
"user1",
"user2"
]
}
And I want to create a unique index on the 'users' array field. The result I want is to make it possible to create other documents like this:
{
"name" : "Just a name",
"users" : [
"user1",
"user3"
]
}
or
{
"name" : "Just a name",
"users" : [
"user2",
"user5"
]
}
BUT it should be impossible to create a second one like:
{
"name" : "Just a name",
"users" : [
"user1",
"user2"
]
}
Or reversed:
{
"name" : "Just a name",
"users" : [
"user2",
"user1"
]
}
But this is impossible because Mongo gives me an error that "user1" is duplicated.
Is it possible to create a unique index over the combination of array elements, as shown above?
As per the official MongoDB documentation:
For unique indexes, the unique constraint applies across separate documents in the collection rather than within a single document.
Because the unique constraint applies to separate documents, for a unique multikey index, a document may have array elements that result in repeating index key values as long as the index key values for that document do not duplicate those of another document.
So you can't even insert a second document like:
{
"name" : "Just a name",
"users" : [
"user1",
"user3"
]
}
You will get a duplicate key error from the unique constraint:
> db.st1.insert({"name":"Just a name","users":["user1","user3"]})
WriteResult({
"nInserted" : 0,
"writeError" : {
"code" : 11000,
"errmsg" : "E11000 duplicate key error collection: test.st1 index: users_1 dup key: { : \"user1\" }"
}
})
since the value "user1" already exists in the users index for the first document.
Currently your only option is to manage this in the application code from which you insert into the collection: before a save or update, run validation logic that enforces the constraint you want to impose.
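A minimal sketch of such a check in the mongo shell (the collection name groups is hypothetical, and note that this check-then-insert is not atomic, so two concurrent writers could still race past it):
// Reject the insert if another document already contains exactly the same set of users
var newUsers = ["user1", "user2"]
var existing = db.groups.findOne({ users: { $all: newUsers, $size: newUsers.length } })
if (existing === null) {
  db.groups.insert({ name: "Just a name", users: newUsers })
} else {
  print("Rejected: a document with the same users already exists")
}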
I have a very similar problem and sadly it seems it's not possible - not because the unique constraint applies only across separate documents, but because:
To index a field that holds an array value, MongoDB creates an index key for each element in the array
i.e. each of the individual array elements has to be unique across all other documents.
I have a database with documents like these:
{_id: "1", module:["m1"]}
{_id: "2", module:["m1", "m2"]}
{_id: "3", module:["m3"]}
There is an search index created for these documents with the following index function:
function (doc) {
doc.module && doc.module.forEach &&
doc.module.forEach(function(module){
index("module", module, {"store":true, "facet": true});
});
}
The index uses the "keyword" analyzer on the module field.
The sample data is quite small (11 documents, 3 different module values)
I have two issues with queries that are using group_field=module parameter:
Not all groups are returned. I get 2 out of the 3 groups I expect. It seems that if a document with ["m1", "m2"] is returned in the "m1" group, then there is no "m2" group. When I use counts=["module"] I do get the complete list of distinct values.
I'd like to be able to get something like:
{
"total_rows": 3,
"groups": [
{ "by": "m1",
"total_rows": 1,
"rows": [ {_id: "1", module: "m1"},
{_id: "2", module: "m2"}
]
},
{ "by": "m2",
"total_rows": 1,
"rows": [ {_id: "2", module: "m2"} ]
},
....
]
}
When using group_field, bookmark is not returned, so there is no way to get the next chunk of the data beyond 200 groups or 200 rows in a group.
Cloudant Search is based on Apache Lucene, and hence inherits its properties and limitations.
One limitation of grouping is that "the group field must be a single-valued indexed field" (Lucene Grouping), hence a document can only be in one group.
Another limitation/property of grouping is that topNGroups and maxDocsPerGroup need to be provided in advance, and in Cloudant's case the maximums are 200 and 200 (they can be set lower by using the group_limit and limit parameters).
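For reference, a grouped search request within those limits might look something like this (the database name mydb, design document mysearch and index name modules are hypothetical; group_field, group_limit and limit are the parameters mentioned above):
GET /mydb/_design/mysearch/_search/modules?q=*:*&group_field=module&group_limit=10&limit=5
This asks for at most 10 groups with at most 5 rows in each group, but because the group field must be single-valued, a document with module: ["m1", "m2"] will still appear under only one of those values.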
I created a JSON index in Cloudant on _id like so:
{
"index": {
"fields": [ "_id"]
},
"ddoc": "mydesigndoc",
"type": "json",
"name": "myindex"
}
First off, unless I specified the index name, Cloudant somehow could not differentiate between the index I created and the default text-based index for _id (if that is truly the case, then I believe this is a bug).
I ran the following query against the _find endpoint of my db:
{
"selector": {
"_id": {
"$nin":["v1","v2"]
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
The result was this error:
{"error":"no_usable_index","reason":"There is no index available for this selector."}
If I change "$nin":["v1","v2"] to "$eq":"v1" then it works fine, but that is not the query I am after.
So in order to get what I want, I had to add "_id": {"$gt": null} to my selector, which now looks like:
{
"selector": {
"_id": {
"$nin":["v1","v2"],
"$gt":null
}
},
"fields":["_id", "field1", "field2"],
"use_index": "mydesigndoc/myindex"
}
Why is this the behavior? It seems to only happen if I use the _id field in the selector.
What are the ramifications of adding "_id": {"$gt":null} to my selector? Is this going to scan the entire table rather than use the index?
I would appreciate any help, thank you
Cloudant Query can use Cloudant's pre-existing primary index for selection and range querying without you having to create your own index on the _id field.
Unfortunately, the index doesn't really help when using the $nin operator - Cloudant would have to scan the entire database to check for documents which are not in your list, so the index doesn't really move things forward.
By changing the operator to $eq you are playing to the strengths of the index which can be used to locate the record you need quickly and efficiently.
In short, the query you are attempting is inefficient. If your query were more complex, e.g. the equivalent of WHERE colour='red' AND _id NOT IN ['a','b'], then a Cloudant index on colour could be used to reduce the data set to a reasonable level before doing the $nin operation on the remaining data.
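A sketch of that kind of query as a _find request body (the colour field and the colour-json-index name are hypothetical, assuming a JSON index has been created on colour):
{
  "selector": {
    "colour": "red",
    "_id": { "$nin": ["a", "b"] }
  },
  "fields": ["_id", "colour"],
  "use_index": "colour-json-index"
}
Here the index on colour narrows the candidate set to red documents first, and the $nin condition is then applied to that much smaller set.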