Can we have a Model with a lot of properties (say 30) while avoiding the exploding indexes pitfall? - google-app-engine

I was thinking that maybe you can have the index.yaml only specify certain indexes (not all the possible ones that GAE automatically creates for you).
If that's not a good idea, what is another way of dealing with storing a large number of properties, other than storing the extra properties as a serialized object in a blob property?

The new improved query planner should generate optimized index definitions.
Note that you can set a property as unindexed by using indexed=False in Python or Entity.setUnindexedProperty in Java.

A few notes:
Exploding indexes happen when you have multiple properties that contain "multiple values", i.e. an entity with MULTIPLE list properties AND those properties are listed in a composite index. In this case an index entry is created for each combination of the list properties' values. In other words, the number of index entries created equals the product of the list properties' sizes. So a list property with 20 entries and another list property with 30 entries would create 600 index entries when BOTH are listed in index.yaml under one compound index.
Exploding indexes do not happen for simple (non-list) properties, or if there is only one list property in the entity.
Exploding indexes also do not happen if you do not create a compound index in your index.yaml file that lists at least two list properties in the same index.
If you have a lot of properties and you do not need to query on them, then you can simply put them in a list, in two parallel lists (to simulate a map), or serialize them. The simplest would be two parallel lists: this is done automatically for you if you use Objectify with embedded classes.
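A minimal Python sketch of the "two parallel lists" layout, assuming the extra properties start out as a dict and would be stored in two list properties (for example hypothetical `extra_names` and `extra_values`, both unindexed). The helper names are made up for illustration; Objectify does its own version of this for embedded classes.

```python
from typing import Any, Dict, List, Tuple


def to_parallel_lists(props: Dict[str, Any]) -> Tuple[List[str], List[Any]]:
    """Flatten a dict of extra properties into two parallel lists.

    names[i] is the key for values[i]; sorting the keys keeps the
    layout deterministic between writes.
    """
    names = sorted(props)
    values = [props[name] for name in names]
    return names, values


def from_parallel_lists(names: List[str], values: List[Any]) -> Dict[str, Any]:
    """Rebuild the dict; both lists must have the same length."""
    if len(names) != len(values):
        raise ValueError("parallel lists must have the same length")
    return dict(zip(names, values))


extra = {"color": "red", "size": 42, "weight_kg": 1.5}
names, values = to_parallel_lists(extra)
print(names)   # ['color', 'size', 'weight_kg']
print(values)  # ['red', 42, 1.5]
assert from_parallel_lists(names, values) == extra
```

Since neither list is indexed, the entity writes no per-property index rows for these values, no matter how many keys the dict has.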

Related

Missing index on specific entities in app engine

I have entities in app engine which I query as:
foo = Foo.all().filter('bar =', baz).get()
#baz is unicode, bar is a StringProperty
#Foo inherits from db.Model
This works for most entities, but for some values of baz no entity is returned, even though the entity certainly exists, as can be verified in the console (https://console.cloud.google.com/datastore/entities/). The cause is that for that specific entity there is no index on its value of bar, as evidenced by the lack of a checkmark in the 'Indexed' column on that web page.
The docs state that
Indexes for simple queries, such as queries over a single property, are created automatically
So I would have expected that all entities of that type would have an index on that property, but evidently that is incorrect. Questions:
Q1: when the index is created, is it added to entities that were put prior to the first time a query is run using that index? (or is the index created the first time any entity of that type is put?)
Q2: if not, what changes to the entity (if any) will cause the index to be added to that property? (I tried changing a property other than bar and putting the entity, and that did not cause the entity to be added to the index.)
Q3: would explicitly listing the index in index.yaml change this behavior?
Q4: is there a way to programmatically determine whether an entity has an index on a specific property?
Q5: (bonus) is there any google documentation on the above?
thanks
Q1) The index for individual properties is created automatically when you write the first entity that has that property (with indexed=true). However, whether or not a property is added to the index is an entity/property-level attribute that is set when you write the entity.
Q2) Every property has a flag that tells the back-end whether it should index the property. If you read the entity and write it back with the flag set to true on bar, it will be inserted into the index.
Q3) index.yaml is only for composite indexes (multi-property indexes). Individual properties are controlled by a property-level flag when you write/update the entity and do not need to be pre-configured.
Q4) Only by reading back every entity and checking the index flag for the property in question.
Q5) For composite indexes, read the Datastore Indexes page. For property indexes, read the Entities, Properties, and Keys page, down at the "Property and Value Types" section; you'll see a lot about indexes there.
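To make Q1, Q2 and Q4 concrete, here is a toy in-memory model in Python. It is not the Datastore API: `ToyStore`, `put`, `query_eq` and `is_indexed` are hypothetical names, and the model assumes only what the answer states, namely that the indexed flag is stored per entity and per property at write time, and that equality filters are answered from the index.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List, Set, Tuple


@dataclass
class ToyStore:
    """Toy stand-in for one entity kind (NOT the real Datastore API).

    Values must be hashable because index rows are kept in a set.
    """
    entities: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    unindexed_props: Dict[str, Set[str]] = field(default_factory=dict)
    index: Set[Tuple[str, Any, str]] = field(default_factory=set)  # (property, value, key)

    def put(self, key: str, props: Dict[str, Any], unindexed: Iterable[str] = ()) -> None:
        skip = set(unindexed)
        # A write replaces this entity's old index rows with fresh ones.
        self.index = {row for row in self.index if row[2] != key}
        self.entities[key] = dict(props)
        self.unindexed_props[key] = skip
        for prop, value in props.items():
            if prop not in skip:  # the flag is checked per property, at write time
                self.index.add((prop, value, key))

    def query_eq(self, prop: str, value: Any) -> List[str]:
        # Equality filters only see entities that have an index row.
        return sorted(k for p, v, k in self.index if p == prop and v == value)

    def is_indexed(self, key: str, prop: str) -> bool:
        # Q4: read the entity back and inspect its per-property flag.
        return prop in self.entities[key] and prop not in self.unindexed_props[key]


store = ToyStore()
store.put("foo1", {"bar": "baz", "other": 1}, unindexed={"bar"})
print(store.query_eq("bar", "baz"))     # [] although foo1 exists
store.put("foo1", {"bar": "baz", "other": 2}, unindexed={"bar"})
print(store.query_eq("bar", "baz"))     # [] changing another property does not help
store.put("foo1", store.entities["foo1"])  # Q2: rewrite with bar indexed
print(store.query_eq("bar", "baz"))     # ['foo1']
print(store.is_indexed("foo1", "bar"))  # True
```

The second `put` mirrors what the asker tried: the entity is rewritten, but because bar is still flagged unindexed on that write, it never gets an index row.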
What's the length of the data you're storing? Documentation says:
Short strings (up to 1500 bytes) are indexed and can be used in query filter conditions and sort orders.
Long strings (up to 1 megabyte) are not indexed and cannot be used in query filters and sort orders.
More information on index creation in general is here, along with its "related articles".
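Because the quoted limit is in bytes while baz in the question is a unicode string, a value can be well under 1500 characters and still be too long to index. A small Python check, assuming the limit applies to the UTF-8 encoding of the value (the constant comes from the quote above; confirm it against the current docs for your API):

```python
MAX_INDEXED_STRING_BYTES = 1500  # from the documentation quoted above; may differ by API/version


def fits_short_string(value: str) -> bool:
    """True if value is within the quoted indexed-string limit.

    The limit is in bytes, so measure the UTF-8 encoding rather than
    len(value): non-ASCII characters take 2 to 4 bytes each.
    """
    return len(value.encode("utf-8")) <= MAX_INDEXED_STRING_BYTES


ascii_value = "a" * 1500
accented = "é" * 1000  # 1000 characters, but 2000 UTF-8 bytes
print(fits_short_string(ascii_value))  # True
print(fits_short_string(accented))     # False
```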

Create a limited datastore index

I have a query in my GAE app that looks like this:
datastore.NewQuery("item").Ancestor(fk).Order("-PubDate").Limit(10).Run(c)
In order for this to work I need an index of items ordered by PubDate; the autogenerated one looks like:
- kind: item
  ancestor: yes
  properties:
  - name: PubDate
    direction: desc
This index is pretty big (about 4 GB) but most of it will never be touched because of that Limit() call. Is it possible to have the index only remember 10 results for each ancestor?
It is possible for two entities of the same kind to have the same property, with the property indexed on one entity and unindexed on the other.
The low-level Datastore API in the Java runtime allows an app to decide, for each individual entity, whether or not to index each property. I don't know if an equivalent exists in other runtimes. If not, you can use two different property names to indicate an indexed date and an unindexed date.
So technically, yes, you can keep only a small number of entities in the index. Note, however, that you will have to re-save an entity with the property unindexed in order to remove it from that index. Re-saving all entities will incur additional costs, so this solution probably makes sense only if you re-save an entity anyway for some other reason.
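A minimal Python sketch of the bookkeeping this approach needs: for each ancestor, keep the newest N items indexed on the date and mark the rest to be written with the date unindexed (or under a second, unindexed property name, as suggested above). `Item` and `plan_index_flags` are hypothetical names, and the sketch only computes the flags; actually writing the entities is left to whichever Datastore API you use.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, NamedTuple

KEEP_INDEXED = 10  # matches the Limit(10) in the query above


class Item(NamedTuple):
    key: str
    ancestor: str
    pub_date: datetime


def plan_index_flags(items: List[Item], keep: int = KEEP_INDEXED) -> Dict[str, bool]:
    """Map item key -> True if its PubDate should be written as indexed.

    Only the `keep` newest items under each ancestor stay indexed; the
    rest would be re-saved with the date unindexed.
    """
    by_ancestor: Dict[str, List[Item]] = defaultdict(list)
    for item in items:
        by_ancestor[item.ancestor].append(item)

    flags: Dict[str, bool] = {}
    for group in by_ancestor.values():
        group.sort(key=lambda it: it.pub_date, reverse=True)  # newest first
        for rank, item in enumerate(group):
            flags[item.key] = rank < keep
    return flags


items = [Item(f"a-{i}", "feedA", datetime(2015, 1, 1 + i)) for i in range(12)]
items.append(Item("b-0", "feedB", datetime(2015, 1, 1)))
flags = plan_index_flags(items)
print(sorted(k for k, indexed in flags.items() if not indexed))  # ['a-0', 'a-1']
```

Each time a new item arrives under an ancestor, the item pushed out of the newest ten has to be re-saved with its date unindexed, which is the extra write cost the answer warns about. Items with equal dates keep their input order because Python's sort is stable.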

Datastore Index Creation Fails Without Explanation

I'm trying to create a compound index with a single Number field and a list of Strings field. When I view the status of the index it just has an exclamation mark with no explanation. I assume it is because datastore concludes that it is an exploding index based on this FAQ page: https://cloud.google.com/appengine/articles/index_building#FAQs.
Is there any way to confirm what the actual failure reason is? Is it possible to split the list field into multiple fields based on some size limit and create multiple indexes for each chunk?
You get the exploding indexes problem when you have an index on multiple list/repeated properties. In this case a single entity would generate all combinations of the property values (i.e. an index on (A, B) where A has N entries and B has M entries will generate N*M index entries).
In this case you shouldn't get the exploding index problem since you aren't combining two repeated fields.
There are some other obscure ways in which an index build can fail. I would recommend filing a production ticket so that someone can look into your specific indexes.
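The N*M rule can be checked with a short Python sketch that counts the rows one entity would contribute to a composite index, one per combination of property values. It assumes the values inside each list are distinct, and `composite_index_entries` is a hypothetical helper, not part of any Datastore API.

```python
from itertools import product
from typing import Sequence


def composite_index_entries(*props: Sequence) -> int:
    """Rows one entity contributes to a composite index over `props`.

    Each argument is the list of values of one property; a single-valued
    property is a one-element list. One row is written per combination
    of values, i.e. per element of the Cartesian product.
    """
    return sum(1 for _ in product(*props))


tags_a = [f"a{i}" for i in range(20)]
tags_b = [f"b{i}" for i in range(30)]
print(composite_index_entries(tags_a, tags_b))  # 600: two list properties explode (20 * 30)
print(composite_index_entries([7], tags_b))     # 30: one number plus one list grows linearly
```

The second call matches the asker's case (a single Number field plus a list of Strings), which is why the answer does not expect an exploding index there.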
I believe it was the 1000 item limit per entity for indexes on list properties. I partitioned the property into groups of 999, e.g. property1, property2 etc. as needed. I was then able to create indexes for each chunked property successfully.
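The chunking workaround can be sketched in Python as follows, using the poster's naming scheme (property1, property2, ...) and chunk size of 999. `chunk_list_property` is a hypothetical helper; the 999 figure comes from the poster's experience rather than from documentation.

```python
from typing import Dict, List, Sequence

CHUNK_SIZE = 999  # stays under the ~1000-values limit the poster ran into


def chunk_list_property(name: str, values: Sequence, size: int = CHUNK_SIZE) -> Dict[str, List]:
    """Split one list property into name1, name2, ... with at most `size` values each."""
    if size < 1:
        raise ValueError("size must be positive")
    return {
        f"{name}{i + 1}": list(values[start:start + size])
        for i, start in enumerate(range(0, len(values), size))
    }


chunks = chunk_list_property("property", list(range(2500)))
print({k: len(v) for k, v in chunks.items()})
# {'property1': 999, 'property2': 999, 'property3': 502}
```

Each chunked property then needs its own composite index, and a query only matches values in the chunk it filters on, so finding a value anywhere in the original list means one query per chunk (or knowing which chunk holds it).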

view simple indexes on AppEngine Datastore

How can I view the simple index definitions on Google's AppEngine Datastore? Is it possible at all?
There is a "Datastore Indexes" view which only displays the composite indexes as it seems (the ones you define in datastore_indexes.xml).
What do you mean by "does not work"? For a non-custom (built-in) index, you need to put the old entities again to include them in the index.
From the docs (https://developers.google.com/appengine/docs/python/datastore/indexes):
"Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:"
It's not possible (yet) to view the simple index definitions on your datastore model.
The actual index in the datastore can vary between entity instances (if the definition was changed when data was already stored). Changing simple indexes thus requires a manual migration: read and put all data so it is stored and indexed again with the new definition. Thanks #marcadian for the pointer.

Avoid default index but keep explicitly defined index in AppEngine?

I have some properties that are only referenced in queries that require composite indices. AppEngine writes all indexed properties to their own special indexes, which requires 2 extra write operations per property.
Is there any way to specify that a property NOT be indexed in its own index, but still be used for my composite index?
For example, my entity might be a Person with properties name and group. The only query in my code is select * from Person where group = <group> and name > <name>, so the only index I really need is with group ascending and name ascending. But right now AppEngine is also creating an index on name and an index on group, which triples the number of write operations required to write each entity!
I can see from the documentation how to prevent a property from being used for indexing at all, but I want to turn off indexing only for a few indexes (the default ones).
From what I understand, you can currently either disable indexing on a property altogether (which includes composite indexes) or you are stuck with having all indexes (automatic indexes + your composite indexes from index.yaml).
There was some discussion about this in the GAE google group and a feature request to do exactly what you are suggesting, but I wasn't able to find it. Will update the answer when I get home and search more.

Resources