MongoDB: multiple indexes - database

Is it possible to create more than one text index and more than one array index in one collection in MongoDB?
I don't mean the single index with multiple fields.

Text Indexes: MongoDB allows at most one text index per collection, so you cannot create several single-field text indexes. This makes sense, because a $text query does not name a field; it always runs against the collection's text index. If you need to search several fields, create one compound text index and use the weights option to say which fields should count more toward a match's score.
Ref: MongoDB Text Indexes, $text search documentation
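A minimal sketch of the compound text index with weights, written against pymongo. The field names (title, body), the index name, and the db.articles collection are made up for illustration; `collection` is assumed to be a pymongo Collection, and `create_index` with a `weights` option is the only driver call used.

```python
def create_weighted_text_index(collection, weights, name="text_all_fields"):
    """Create one compound text index covering every field in `weights`.

    `collection` is assumed to be a pymongo Collection (anything with a
    compatible create_index method works). `weights` maps a field name to
    its relative weight; a higher weight makes matches in that field score
    higher.
    """
    # "text" is the index type string (the value of pymongo.TEXT).
    keys = [(field, "text") for field in weights]
    return collection.create_index(keys, weights=weights, name=name)


# Hypothetical usage, assuming a running server and a db.articles collection:
# create_weighted_text_index(db.articles, {"title": 10, "body": 1})
# db.articles.find({"$text": {"$search": "index"}})
```

The $text query still names no field; the weights only influence how matches are scored.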
Multi-Key Indexes:
You create a multi-key index the same way you create any other index; there is no special option to pass, unlike when creating unique or sparse indexes.
When the index is built, MongoDB automatically makes it a multi-key index if any document holds an array in the indexed field. An existing index is also converted to multi-key as soon as a document with an array in that field is inserted.
You can check whether an index is multi-key by looking for isMultiKey: true in the explain() output of a query that uses the index.
Limitations of Multi-Key Indexes:
Do not create a compound index in which both fields are arrays: MongoDB would have to create index keys for the Cartesian product of the two fields, so the index would explode.
Additionally, MongoDB throws an error if you try to insert a document in which two fields are arrays when those two fields are already covered by a compound index.
Ref: MongoDB Multikey Indexes
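The key expansion behind these rules can be modelled in a few lines of plain Python. This is a simplified illustration, not MongoDB's actual implementation: the function name and the sample document are invented, and details such as empty arrays or nested documents are ignored.

```python
from itertools import product


def compound_index_keys(doc, fields):
    """Return the index keys a compound index on `fields` would need for `doc`.

    Simplified model: a scalar field contributes one value, an array field
    contributes one value per element. More than one array field is
    rejected, mirroring the restriction described above.
    """
    array_fields = [f for f in fields if isinstance(doc.get(f), list)]
    if len(array_fields) > 1:
        raise ValueError(f"cannot index parallel arrays: {array_fields}")
    per_field = [
        doc.get(f) if isinstance(doc.get(f), list) else [doc.get(f)]
        for f in fields
    ]
    return list(product(*per_field))


keys = compound_index_keys({"sku": "a1", "tags": ["red", "blue"]}, ["sku", "tags"])
print(keys)  # [('a1', 'red'), ('a1', 'blue')]
```

With two array fields of sizes N and M the product would have N*M entries, which is why that case is refused outright.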

Adding my comment as an answer for easy future reference.
As far as I remember, you can't have multiple text indexes in a single collection. MongoDB allows you to create multiple text indexes but gives an error while searching. Refer: jira.mongodb.org/browse/SERVER-8127

Related

How to find the number of duplicate documents in Solr based on an indexed field

I have a few near-duplicate documents stored in Solr. The schema has an autogenerated uuid as the unique key, so duplicates can get into the index. I need to get the counts of duplicated documents based on a field or fields in the schema.
I am trying to get quick numbers without writing a client program and going through the full result set, ideally from the Solr console itself.
I tried facets but could not get the total count. The query below gives the duplicates for each value of 'idfield', but the results have to be paged through to the last page and summed up (over a couple of million entries).
q=*:*&facet=true&facet.mincount=2&facet.field=idfield
A JSON Facet query can be used to count the unique values, as explained in this blog post:
http://yonik.com/solr-count-distinct/
Alternatively, use the collapse filter and take the difference:
q=*:*&fq={!collapse field=idfield} - take numFound from this query and subtract it from numFound of the MatchAllDocs query (*:*)
You can also use facet.mincount=2 to get duplicate documents by faceting on the unique id field. Ex: /solr/core/select?q=*:*&facet=on&facet.field=uniqueidfield&facet.mincount=2&facet.missing=true
You can also add facet.limit=-1&rows=0 to get the document ids with duplicate ids.
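If a small script is acceptable after all, the facet approach can be summed up in a few lines. This is only a sketch: the helper names are made up, and it assumes the facet counts come back in Solr's default flat JSON layout, [value, count, value, count, ...].

```python
from urllib.parse import urlencode


def duplicate_facet_query(field):
    """Build the query string for a facet request listing duplicated values."""
    params = {
        "q": "*:*",
        "rows": 0,             # no documents, only facet counts
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,   # only values that occur more than once
        "facet.limit": -1,     # return every value, not just the first page
    }
    return urlencode(params)


def count_extra_copies(flat_facet_counts):
    """Sum (count - 1) over a flat [value, count, value, count, ...] list."""
    counts = flat_facet_counts[1::2]
    return sum(c - 1 for c in counts)
```

For example, count_extra_copies(["a", 3, "b", 2]) gives 3: two surplus copies of "a" and one of "b". With millions of distinct values, facet.limit=-1 can produce a very large response.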

How to create multiple indexes on two fields in Google Objectify that should not be included in update and create requests?

I want to create multiple indexes in an entity, on ID and creation date.
There is one condition: I don't want these indexes to be used on update and create of that object.
I am using Google Objectify.
I will use these indexes in my search query.
Please help?
Objectify has a feature called partial indexes, which lets you define conditions that a property has to meet in order to get indexed.
You could hack that so that those fields are only indexed if a given attribute (e.g. lastOperation) is not create or update.
Bear in mind that tampering with index updates might lead to invalid query results, as the index records (used for search) won't match the actual entity values.

Datastore Index Creation Fails Without Explanation

I'm trying to create a compound index with a single Number field and a list of Strings field. When I view the status of the index it just has an exclamation mark with no explanation. I assume it is because datastore concludes that it is an exploding index based on this FAQ page: https://cloud.google.com/appengine/articles/index_building#FAQs.
Is there any way to confirm what the actual failure reason is? Is it possible to split the list field into multiple fields based on some size limit and create multiple indexes for each chunk?
You get the exploding indexes problem when you have an index on multiple list/repeated properties. In this case a single entity would generate all combinations of the property values (i.e. an index on (A, B) where A has N entries and B has M entries will generate N*M index entries).
In this case you shouldn't get the exploding index problem since you aren't combining two repeated fields.
There are some other obscure ways in which an index build can fail. I would recommend filing a production ticket so that someone can look into your specific indexes.
I believe it was the 1000 item limit per entity for indexes on list properties. I partitioned the property into groups of 999, e.g. property1, property2 etc. as needed. I was then able to create indexes for each chunked property successfully.
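The partitioning described above can be sketched as follows. The function name and the sample data are hypothetical; only the property1, property2, ... naming and the group size of 999 come from the answer.

```python
def chunk_list_property(values, base_name="property", chunk_size=999):
    """Split one long list into numbered properties of at most `chunk_size` items."""
    return {
        f"{base_name}{i // chunk_size + 1}": values[i:i + chunk_size]
        for i in range(0, len(values), chunk_size)
    }


chunks = chunk_list_property([f"s{n}" for n in range(2500)])
print({name: len(items) for name, items in chunks.items()})
# {'property1': 999, 'property2': 999, 'property3': 502}
```

Each chunked property then needs its own index, and a query has to check every chunk.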

Avoid default index but keep explicitly defined index in AppEngine?

I have some properties that are only referenced in queries that require composite indices. AppEngine writes all indexed properties to their own special indexes, which requires 2 extra write operations per property.
Is there any way to specify that a property NOT be indexed in its own index, but still be used for my composite index?
For example, my entity might be a Person with properties name and group. The only query in my code is select * from Person where group = <group> and name > <name>, so the only index I really need is with group ascending and name ascending. But right now AppEngine is also creating an index on name and an index on group, which triples the number of write operations required to write each entity!
I can see from the documentation how to prevent a property from being used for indexing at all, but I want to turn off indexing only for a few indexes (the default ones).
From what I understand currently, you can either disable indexing on a property altogether (which includes composite indexes) or you are stuck with having all indexes (automatic indexes + your composite indexes from index.yaml).
There was some discussion about this in the GAE google group and a feature request to do exactly what you are suggesting, but I wasn't able to find it. Will update the answer when I get home and search more.

Can we have a Model with a lot of properties (say 30) while avoiding the exploding indexes pitfall?

I was thinking that maybe you can have the index.yaml only specify certain indexes (not all the possible ones that GAE automatically does for you).
If that's not a good idea, what is another way of storing a large number of properties, other than storing the extra properties as a serialized object in a blob property?
The new improved query planner should generate optimized index definitions.
Note that you can set a property as unindexed by using indexed=False in Python or Entity.setUnindexedProperty in Java.
A few notes:
Exploding indexes happen when you have multiple properties that contain "multiple values", i.e. an entity with MULTIPLE list properties AND those properties are listed in a composite index. In this case an index entry is created for each combination of list property values. In other words, the number of index entries created equals the product of the list properties' sizes. So a list property with 20 entries and another list property with 30 entries would create 600 index entries when BOTH are listed in index.yaml under one compound index.
Exploding indexes do not happen for simple (non-list) properties, or if there is only one list property in the entity.
Exploding indexes also do not happen if you do not create a compound index in your index.yaml file that lists at least two list properties in the same index.
If you have a lot of properties and you do not need to query on them, then you can simply put them in a list or two parallel lists (to simulate a map), or serialize them. The simplest would be two parallel lists: this is done automatically for you if you use Objectify with embedded classes.
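A rough illustration of the two-parallel-lists idea in plain Python. The helper names and sample properties are invented, and how the two lists are stored on the entity (ideally unindexed) is left out.

```python
def to_parallel_lists(extra):
    """Turn a dict of extra properties into two lists of equal length."""
    names = list(extra)
    values = [extra[n] for n in names]
    return names, values


def from_parallel_lists(names, values):
    """Rebuild the dict; position i in `names` pairs with position i in `values`."""
    return dict(zip(names, values))


names, values = to_parallel_lists({"color": "red", "size": "L"})
print(names, values)  # ['color', 'size'] ['red', 'L']
```

Keeping both lists out of any composite index avoids the product of their sizes described in the notes above.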
