Google App Engine Datastore non-compound indexes - google-app-engine

I am trying to use indexes on an entity, for example
Entity Person
- FirstName String indexed
- LastName String indexed
- Address String indexed
and more indexed properties
However, I will only ever query on one indexed property at a time. For example, I will not filter on FirstName and LastName in the same query, only on one of them. In the past I have seen a very large index size in the datastore caused by combinations of indexed properties on an entity. I want my properties indexed individually, not as composites. Is there any way to do this?

You have to create compound indexes - they don't appear by themselves. You do it either manually, or it happens automatically in your development environment when you run a compound query. If you never run these queries, no compound indexes will be created.
You can always check which indexes you have in your project by going to Datastore > Indexes in your Google Cloud Console.

Related

How does indexing work in google app datastore if entity have unindexed data earlier?

I have a table in the ndb datastore. In it I have
updated = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
created = ndb.DateTimeProperty(auto_now_add=True, indexed=False)
I have many records in the table with this structure. Now I am changing the unindexed fields to indexed=True. Will that index all the updated and created data already present in the table, or will it only index data written after the change?
And how do I index the unindexed rows of these columns?
These properties will not be indexed on existing entities, until you rewrite them with the index enabled. This is because indexes are set at a per entity level.
To ensure all these fields are indexed, you'll need to read every entity and then write it back. For smaller datasets, you can do this with a simple query and loop. For larger datasets you will want to explore something like Cloud Dataflow.
If you have a large dataset and are concerned about cost, you can optimize. For example, run a keys-only query against the newly indexed fields; any entity whose key appears in those results is already indexed, so you don't need to write it back.
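The skip-if-already-indexed idea can be sketched in plain Python. This is only an in-memory model, not the ndb API: the `store` dict, the `indexed` set on each record, and the `keys_only_query` and `backfill` helpers are all hypothetical stand-ins for a real keys-only query and put().

```python
# In-memory stand-in for a datastore kind; this is NOT the ndb API.
# Each record remembers which properties were indexed when it was last written.

def keys_only_query(store, prop):
    """Return the keys of entities that already appear in the index for `prop`."""
    return {key for key, rec in store.items() if prop in rec["indexed"]}


def backfill(store, props):
    """Rewrite only the entities missing from at least one of the indexes."""
    already = set(store)  # start with every key, then intersect per property
    for prop in props:
        already &= keys_only_query(store, prop)

    rewritten = 0
    for key, rec in store.items():
        if key in already:
            continue  # present in every index: skip the write
        rec["indexed"] = set(props)  # stand-in for put() under the new model
        rewritten += 1
    return rewritten


store = {
    "a": {"indexed": set()},                   # written before the change
    "b": {"indexed": {"created", "updated"}},  # written after the change
    "c": {"indexed": set()},                   # written before the change
}
count = backfill(store, ["created", "updated"])
```

In this model only "a" and "c" are rewritten. Against the real datastore, the saving comes from keys-only queries being cheaper than rewriting every entity.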

Create a limited datastore index

I have a query in my GAE app that looks like this:
datastore.NewQuery("item").Ancestor(fk).Order("-PubDate").Limit(10).Run(c)
In order for this to work I need an index of items ordered by PubDate; the autogenerated one looks like:
- kind: item
  ancestor: yes
  properties:
  - name: PubDate
    direction: desc
This index is pretty big (about 4 GB) but most of it will never be touched because of that Limit() call. Is it possible to have the index only remember 10 results for each ancestor?
Two entities of the same kind can have the same property, with the property indexed on one entity and unindexed on the other.
The low-level Datastore API in the Java runtime lets an app decide, for each individual entity, whether or not to index each property. I don't know if an equivalent exists in other runtimes. If not, you can use two different property names, one for an indexed date and one for an unindexed date.
So technically, yes, you can keep only a small number of entities in the index. Note, however, that to remove an entity from that index you have to re-save it with the property unindexed. Re-saving entities incurs additional costs, so this solution probably only makes sense if you are re-saving the entity anyway for some other reason.
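A rough sketch of the bookkeeping this implies, in plain Python rather than any Datastore API: the `indexed` dict, the `save_item` helper, and the keys and dates are hypothetical. The sketch only decides which entities would need to be re-saved unindexed. It assumes items can be compared by PubDate when a new one is written.

```python
LIMIT = 10  # matches the Limit(10) in the question's query


def save_item(indexed, ancestor, key, pub_date, limit=LIMIT):
    """Track a new item as indexed; return keys that should be re-saved unindexed."""
    entries = indexed.setdefault(ancestor, [])
    entries.append((pub_date, key))
    entries.sort(reverse=True)   # newest first, like Order("-PubDate")
    evicted = entries[limit:]    # everything past the newest `limit` items
    del entries[limit:]
    return [k for _, k in evicted]


indexed = {}      # ancestor -> list of (pub_date, key) still in the index
to_unindex = []   # keys to re-save with PubDate unindexed
for day in range(1, 13):  # twelve items under one ancestor
    to_unindex += save_item(indexed, "feed-1", "item-%d" % day, "2012-11-%02d" % day)
```

Each eviction here corresponds to one extra entity write, which is the additional cost the answer warns about.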

view simple indexes on AppEngine Datastore

How can I view the simple index definitions on Google's AppEngine Datastore? Is it possible at all?
There is a "Datastore Indexes" view, but it seems to display only the composite indexes (the ones you define in datastore_indexes.xml).
What do you mean by "does not work"? For non-custom indexes, you have to put the old objects again to include them in the index.
From the doc: https://developers.google.com/appengine/docs/python/datastore/indexes
"Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:"
It's not possible (yet) to view the simple index definitions on your datastore model.
The actual index in the datastore can vary between entity instances (if the definition was changed at a time when data was already stored). Changing simple indexes thus requires a manual migration (read and put all data so it is stored and indexed again with the new definition). Thanks @marcadian for the pointer.

Appengine's Indexing order, cursors, and aggregation

I need to do some continuous aggregation on a data set. I am using App Engine's High Replication Datastore.
Let's say we have a simple object with a property that holds a string of the date when it was created. There are other fields associated with the object, but they are not important in this example.
Let's say I create and store some objects. Below is the date associated with each object. The objects are stored in the order below and are created in separate transactions.
Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14
The idea is to use a cursor to continually check for new indexed objects. Aggregation on the new indexed entities will be performed.
Here are the questions I have:
1) Are objects indexed in order? That is, is it possible for Obj4 to be indexed before Obj1, 2, and 3? This will be an issue if I use an ORDER BY query and a cursor to continue searching. Some entities will not be found if there is a delay in indexing.
2) If no ORDER BY is specified, what order are entities returned in a query?
3) How would I go about checking for new indexed entities? As in, grab all entities, storing the cursor, then later on checking if any new entities were indexed since the last query?
A little less important, but food for thought:
4) Are all fields indexed together? That is, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?
5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?
6) If all entities belong to the same entity group, are all entities indexed at the same time?
Thanks for the responses.
1. All entities have default indexes for every property. If you use ORDER BY someProperty then you will get entities ordered by the values of that property. You are correct about index building: queries use indexes, and indexes are built asynchronously, so it is possible that a query will not find an entity immediately after it was added.
2. ORDER BY defaults to ASC, i.e. ascending order.
3. Add a created timestamp to your entity, then order by it and repeat the query with the cursor. See Cursors and Data Updates.
4. Indexes are built after the put() operation returns. They are also built in parallel, so when you query, some indexes may be built and some not. See Life of a Datastore Write. Note that if you want to force "apply" on an entity you can issue a get() after put(), which will force the changes to be applied (= indexes written).
5. and 6. All entities touched in the same transaction must be in the same entity group (= have a common parent). The transaction isolation docs state that transactions can be unapplied, meaning that a query after put() will not find the new entities. Again, you can force an entity to be applied via a read or an ancestor query.
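The "order by a created timestamp and repeat the cursor" approach can be modelled in a few lines of plain Python. This is a simplified in-memory sketch, not the Datastore cursor API: `fetch_since` is a hypothetical helper, and the cursor is simply the last (created, key) pair seen.

```python
def fetch_since(index, cursor):
    """Return the entries that sort after `cursor`, plus the new cursor."""
    batch = sorted(e for e in index if cursor is None or e > cursor)
    new_cursor = batch[-1] if batch else cursor
    return batch, new_cursor


# (created, key) pairs, as an index ordered by the created property would hold them.
index = [("2012-11-11", "Obj1"), ("2012-11-11", "Obj2"), ("2012-11-12", "Obj3")]

first, cursor = fetch_since(index, None)    # initial pass
index += [("2012-11-13", "Obj4"), ("2012-11-14", "Obj5")]
second, cursor = fetch_since(index, cursor)  # later pass resumes from the cursor
```

The second pass picks up only Obj4 and Obj5. The sketch also shows the limitation raised in question 1: an entry that reaches the index late but sorts before the cursor would never be returned.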

Avoid default index but keep explicitly defined index in AppEngine?

I have some properties that are only referenced in queries that require composite indices. AppEngine writes all indexed properties to their own special indexes, which requires 2 extra write operations per property.
Is there any way to specify that a property NOT be indexed in its own index, but still be used for my composite index?
For example, my entity might be a Person with properties name and group. The only query in my code is select * from Person where group = <group> and name > <name>, so the only index I really need is with group ascending and name ascending. But right now AppEngine is also creating an index on name and an index on group, which triples the number of write operations required to write each entity!
I can see from the documentation how to prevent a property from being used for indexing at all, but I want to turn off indexing only for a few indexes (the default ones).
From what I currently understand, you can either disable indexing on a property altogether (which also excludes it from composite indexes) or you are stuck with having all the indexes (the automatic indexes plus your composite indexes from index.yaml).
There was some discussion about this in the GAE google group and a feature request to do exactly what you are suggesting, but I wasn't able to find it. Will update the answer when I get home and search more.
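To make the cost argument from the question concrete, here is a small arithmetic sketch. The 2 writes per indexed property comes from the question itself; the 1 write for the entity and 1 write per composite index are assumed placeholder values chosen for illustration, not published billing figures, and `write_ops` is a hypothetical helper.

```python
def write_ops(indexed_props, composite_indexes,
              base=1, per_builtin=2, per_composite=1):
    """Estimate write operations for one entity put.

    per_builtin=2 is the figure quoted in the question; base and
    per_composite are assumptions made for this example only.
    """
    return base + per_builtin * indexed_props + per_composite * composite_indexes


# Person with `name` and `group` indexed, plus one composite index.
current = write_ops(indexed_props=2, composite_indexes=1)

# What the asker wants: the composite index only, no built-in indexes.
wished = write_ops(indexed_props=0, composite_indexes=1)
```

Under these assumed values the two cases differ by a factor of three, which is in line with the "triples" estimate in the question.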
