Removing field indexes safely - google-app-engine

I'm moving the data model of an App Engine app to Objectify, and I've noticed that Objectify marks all properties of its entities as unindexed by default, which makes sense to me: writes are quicker and less space is used.
But the GAE default (at least when I wrote the app) is to create field indexes on all fields, so all my fields are indexed. And there are hundreds of thousands of rows.
I only need a small fraction of these fields indexed, and I would like to set the rest as unindexed. I want to mark these fields as @Unindexed in Objectify, but how can I remove the index data already in the datastore?

To add or remove single-property indexes, change the metadata (add/remove @Index and @Unindex) and then load+save the entities. You may wish to use map/reduce for this.

Related

Updating an objectify entity without changing indexed properties

Say I have an Objectify entity with 1 unindexed and 5 indexed fields. If I update the entity by modifying only the unindexed property, would that cause the indexes for the five indexed fields to be rewritten as well? Essentially I am worried about the write cost here.
Google charges per-entity write, irrespective of the number of indexes.
See https://cloud.google.com/appengine/pricing#costs-for-datastore-calls
Yes, every update of an entity causes updates of all indexed properties. In other words, the write costs are the same whether only one property is updated or all of them.
This is not specific to Objectify - it's how the Datastore works.

NDB Query StringProperty + StructuredProperty [duplicate]

There are many properties in my model that I currently don't need indexed but can imagine I might want indexed at some unknown point in the future. If I explicitly set indexed=False for a property now but change my mind down the road, will Datastore rebuild the entire indices automatically at that point, including for previously written data? Are there any other repercussions for taking this approach?
No, changing indexed=True to indexed=False (and vice-versa) will only affect entities written after that point to the datastore. Here is the documentation that talks about it and the relevant paragraph:
Similarly, changing a property from indexed to unindexed only affects entities subsequently written to the Datastore. The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted. To avoid unwanted results, you must purge your code of all queries that filter or sort by the (now unindexed) property.
If you decide later that you want to start indexing properties, you'll have to go through your entities and re-put them into the datastore.
Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:
Retrieve (get) the entity from the Datastore.
Write (put) the entity back to the Datastore.
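The effect described in the quoted documentation can be illustrated with a toy in-memory model. This is only a sketch: ToyIndexModel and all of its methods are hypothetical names invented for illustration, not part of any Datastore API. It assumes the framework applies the current indexed flag on every put, as NDB or Objectify would with an updated model class.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model (NOT the real Datastore API): index entries are written per
// entity at put() time, using whatever indexed flag is in effect then.
final class ToyIndexModel {
    // entity id -> value of the single modeled property
    private final Map<Long, String> entities = new HashMap<>();
    // ids that currently have an index entry for that property
    private final Set<Long> indexEntries = new HashSet<>();
    // stands in for indexed=True / indexed=False in the model definition
    private boolean indexed;

    ToyIndexModel(boolean indexed) {
        this.indexed = indexed;
    }

    // Changing the definition touches no stored entity and no index entry.
    void setIndexed(boolean indexed) {
        this.indexed = indexed;
    }

    // A write adds or removes the index entry for this one entity only.
    void put(long id, String value) {
        entities.put(id, value);
        if (indexed) {
            indexEntries.add(id);
        } else {
            indexEntries.remove(id);
        }
    }

    String get(long id) {
        return entities.get(id);
    }

    // The documented migration: get each entity and put it back.
    void resaveAll() {
        for (Long id : new ArrayList<>(entities.keySet())) {
            put(id, get(id));
        }
    }

    int indexEntryCount() {
        return indexEntries.size();
    }

    public static void main(String[] args) {
        ToyIndexModel store = new ToyIndexModel(true);
        store.put(1L, "a");
        store.put(2L, "b");
        store.setIndexed(false);  // definition changed to unindexed
        store.put(3L, "c");       // new writes get no index entry
        System.out.println(store.indexEntryCount()); // 2: old entries remain
        store.resaveAll();
        System.out.println(store.indexEntryCount()); // 0: purged by re-saving
    }
}
```

The model deliberately ignores cost, batching, and composite indexes; it only shows that a definition change takes effect per entity, at write time.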
To index properties of existing entities (as per the documentation):
Retrieve (get) the entity from the Datastore.
Write (put) the entity back to the Datastore.
This didn't work for me. I used the appengine-mapreduce library and wrote a MapOnlyMapper<Entity, Void> that uses DatastoreMutationPool to index all the existing entities in Datastore.
Let's assume the property "name" was unindexed and I want to index it in all the existing entities. What I had to do is:
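@Override
public void map(Entity value) {
    // A loaded Entity keeps each property's stored indexed/unindexed flag,
    // so a plain get + put would write "name" back unindexed.
    String property = "name";
    Object existingValue = value.getProperty(property);
    // Re-set the value explicitly as indexed, then queue the write.
    value.setIndexedProperty(property, existingValue);
    // datastoreMutationPool: the mapper's DatastoreMutationPool (not shown).
    datastoreMutationPool.put(value);
}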
Essentially, you have to set the property as an indexed property using setIndexedProperty(prop, value) and then save (put) the entity.
I know I am very late in posting an answer. I thought I could help someone who might be struggling with this problem.

Create a limited datastore index

I have a query in my GAE app that looks like this:
datastore.NewQuery("item").Ancestor(fk).Order("-PubDate").Limit(10).Run(c)
In order for this to work I need an index of items ordered by PubDate; the autogenerated one looks like:
- kind: item
  ancestor: yes
  properties:
  - name: PubDate
    direction: desc
This index is pretty big (about 4 GB) but most of it will never be touched because of that Limit() call. Is it possible to have the index only remember 10 results for each ancestor?
It is possible for two entities of the same kind to have the same property, with one entity having it indexed and the other unindexed.
The low-level Datastore API in the Java runtime lets an app decide, for each individual entity, whether or not to index each property. I don't know if an equivalent exists in other runtimes. If not, you can use two different property names to indicate an indexed date and an unindexed date.
So technically, yes, you can keep only a small number of entities in the index. Note, however, that you will have to re-save an entity with the property unindexed in order to remove it from that index. Re-saving all entities will incur additional costs, so this solution probably makes sense only if you re-save an entity anyway for some other reason.
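As a rough sketch of the bookkeeping this approach implies, the helper below decides which items of one ancestor should stay indexed (the newest few) and returns only those whose flag has to change, since those are the ones that need re-saving. IndexWindow, Item, and itemsToResave are hypothetical names made up for this illustration; the actual datastore writes are left out.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of any Datastore API.
final class IndexWindow {
    // Minimal stand-in for an "item" entity under one ancestor.
    static final class Item {
        final long id;
        final long pubDate;   // e.g. epoch milliseconds
        boolean indexed;      // is PubDate currently written as indexed?

        Item(long id, long pubDate, boolean indexed) {
            this.id = id;
            this.pubDate = pubDate;
            this.indexed = indexed;
        }
    }

    // Keeps the `keep` newest items indexed, flips the flag on the others,
    // and returns the items whose flag changed (those must be re-saved).
    static List<Item> itemsToResave(List<Item> itemsOfOneAncestor, int keep) {
        List<Item> sorted = new ArrayList<>(itemsOfOneAncestor);
        sorted.sort(Comparator.comparingLong((Item i) -> i.pubDate).reversed());
        List<Item> changed = new ArrayList<>();
        for (int pos = 0; pos < sorted.size(); pos++) {
            boolean shouldBeIndexed = pos < keep;
            Item item = sorted.get(pos);
            if (item.indexed != shouldBeIndexed) {
                item.indexed = shouldBeIndexed;
                changed.add(item);
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<Item> items = new ArrayList<>();
        for (long i = 1; i <= 12; i++) {
            items.add(new Item(i, i * 1000L, true));
        }
        // With 12 indexed items and a window of 10, the 2 oldest change.
        System.out.println(itemsToResave(items, 10).size()); // 2
    }
}
```

In practice this check would most naturally run when a new item is written, so that only the one entity falling out of the window is re-saved.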

Indexes in Google Datastore

By default indexing is enabled for all the fields in the ndb based model class.
What if I change the indexing definition for a field and redeploy the app; will it drop the indexing or recreate it, for that field, based on the changes in the model class?
Or is it like entity relationships, which can't be changed once defined? I am asking because I am not sure at this point how many fields I will need indexed in the final application.
You can change the definition of an entity object at any time; the important thing is whether the property is set to be indexed when you put(). Say I have inserted a bunch of objects with a "name" property, unindexed. Later I enable indexing for future put()s on those entities. All my entities will still be in the datastore, but only the ones that were written with the index are queryable by that property. Similar logic applies when I remove indexing from a language-local model property (a Java @Entity class with Objectify, for example) and then do put().
This is what it means to have a schemaless datastore. They can have all different combinations of properties and indexing on/off for each of them. The only thing that truly binds these entities together is their "kind", which is set to the classname by the framework you're using, or set by hand if you're using the truly low-level API.
Read more here to understand better how indexing works in the schemaless datastore. This answers your question completely if you read the section linked.

view simple indexes on AppEngine Datastore

How can I view the simple index definitions on Google's App Engine Datastore? Is it possible at all?
There is a "Datastore Indexes" view which only displays the composite indexes as it seems (the ones you define in datastore_indexes.xml).
What do you mean by "does not work"? For non-custom indexes, you have to put the old objects again to include them in the index.
From the doc https://developers.google.com/appengine/docs/python/datastore/indexes
"Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:"
It's not possible (yet) to view the simple index definitions on your datastore model.
The actual index in the datastore can vary between entity instances (if the definition was changed at a time when data was already stored). Changing simple indexes thus requires a manual migration (read and put all data so it is stored and indexed again with the new definition). Thanks @marcadian for the pointer.
