Create a limited datastore index - google-app-engine

I have a query in my GAE app that looks like this:
datastore.NewQuery("item").Ancestor(fk).Order("-PubDate").Limit(10).Run(c)
In order for this to work I need an index of items ordered by PubDate; the autogenerated one looks like:
- kind: item
  ancestor: yes
  properties:
  - name: PubDate
    direction: desc
This index is pretty big (about 4 GB) but most of it will never be touched because of that Limit() call. Is it possible to have the index only remember 10 results for each ancestor?

Two entities of the same kind can have the same property, with the property indexed on one entity and unindexed on the other.
The low-level Datastore API in the Java runtime lets an app decide, for each individual entity, whether or not to index each property. I don't know if an equivalent exists in other runtimes. If not, you can use two different property names, one for an indexed date and one for an unindexed date.
So technically, yes, you can keep only a small number of entities in the index. Note, however, that to remove an entity from that index you have to re-save it with the property unindexed. Re-saving all entities will incur additional costs, so this solution probably only makes sense if you are re-saving the entity anyway for some other reason.
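A minimal sketch of that idea, written as a toy in-memory model rather than real Datastore code. Each write of a new item is followed by a re-save, with indexing turned off, of whatever item just fell out of the newest ten. The names `ToyStore`, `put`, `newest`, and `add_item` are hypothetical; the `indexed` argument stands in for whatever mechanism your runtime offers to mark a property unindexed on a given entity.

```python
# Toy in-memory model; not the App Engine API. All names are hypothetical.
KEEP = 10  # how many items per ancestor stay in the index


class ToyStore:
    def __init__(self):
        self.entities = {}  # key -> (ancestor, pub_date)
        self.index = {}     # ancestor -> {key: pub_date}, indexed rows only

    def put(self, key, ancestor, pub_date, indexed):
        """Write an entity; an index row exists only if indexed is True."""
        self.entities[key] = (ancestor, pub_date)
        rows = self.index.setdefault(ancestor, {})
        if indexed:
            rows[key] = pub_date
        else:
            rows.pop(key, None)  # re-saving unindexed removes the index row

    def newest(self, ancestor, limit=KEEP):
        """Mimics Ancestor(fk).Order("-PubDate").Limit(n): reads the index only."""
        rows = self.index.get(ancestor, {})
        return sorted(rows, key=rows.get, reverse=True)[:limit]


def add_item(store, key, ancestor, pub_date):
    """Write a new item indexed, then unindex whatever fell out of the top KEEP."""
    store.put(key, ancestor, pub_date, indexed=True)
    rows = store.index[ancestor]
    ordered = sorted(rows, key=rows.get, reverse=True)
    for old_key in ordered[KEEP:]:
        # One extra write per evicted item: this is the re-save cost.
        store.put(old_key, ancestor, rows[old_key], indexed=False)


store = ToyStore()
for day in range(1, 26):
    add_item(store, "item%02d" % day, "feed-a", day)
```

After the loop all 25 items are still stored, but only the ten newest have index rows, so the query keeps working while the index stays small. The price is the extra write in `add_item`, which is the re-save cost described above.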

Related

Updating an objectify entity without changing indexed properties

Say I have an objectify entity with 1 unindexed and 5 indexed fields. If I update the entity by modifying only the unindexed property, will the indexes for the five indexed fields be rewritten as well? Essentially I am worried about the write cost here.
Google charges per-entity write, irrespective of the number of indexes.
See https://cloud.google.com/appengine/pricing#costs-for-datastore-calls
Yes, every update of an entity causes updates of all indexed properties. In other words, the write costs are the same whether only one property is updated or all of them.
This is not specific to Objectify - it's how the Datastore works.

Missing index on specific entities in app engine

I have entities in app engine which I query as:
foo = Foo.all().filter('bar =', baz).get()
#baz is unicode, bar is a StringProperty
#Foo inherits from db.Model
This works for most entities, but for some values of baz no entity is returned, even though the entity certainly exists, as can be verified at https://console.cloud.google.com/datastore/entities/. The cause is that for that specific entity there is no index on its value of bar, as evidenced by the lack of a checkmark in the 'Indexed' column on that web page.
The docs state that
Indexes for simple queries, such as queries over a single property, are created automatically
So I would have expected that all entities of that type would have an index on that property, but evidently that is incorrect. Questions:
Q1: when the index is created, is it added to entities that were put prior to the first time a query is run using that index? (or is the index created the first time any entity of that type is put?)
Q2: if not, what changes to the entity (if any) will cause the index to be added to that property? (I tried changing a property other than bar and putting the entity, and that did not cause the entity to be added)
Q3: would explicitly listing the index in index.yaml change this behavior?
Q4: is there a way to programmatically determine whether an entity has an index on a specific property?
Q5: (bonus) is there any google documentation on the above?
thanks
Q1) The index for an individual property is created automatically when you write the first entity that has that property (with indexed=true). However, whether or not a property is added to the index is an entity/property-level attribute that is set when you write it.
Q2) Every property has a flag that tells the back-end whether it should index the property. If you read the entity and write it back with the flag set to true on bar, it will be inserted into the index.
Q3) index.yaml is only for composite indexes (multi-property indexes). Individual properties are controlled by a property-level flag when you write/update the entity and do not need to be pre-configured.
Q4) Only by reading back every entity and checking the index flag for the property in question.
Q5) For composite indexes you can read the Datastore Indexes. For property indexes, read the Entities, Properties, and Keys page down at the "Property and Value Types" section - you'll see lots about indexes there.
What's the length of the data you're storing? Documentation says:
Short strings (up to 1500 bytes) are indexed and can be used in query filter conditions and sort orders.
Long strings (up to 1 megabyte) are not indexed and cannot be used in query filters and sort orders.
More information on index creation in general here + its "related articles".
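The length limit quoted above can be checked before an entity is written. This is a small sketch, assuming the 1500-byte limit is measured on the UTF-8 encoding of the string (an assumption, not something the quoted documentation states). The helper name `fits_in_index` is hypothetical.

```python
MAX_INDEXED_BYTES = 1500  # limit quoted above for short (indexed) strings


def fits_in_index(value):
    """Return True if value's UTF-8 encoding is within the indexed-string limit."""
    return len(value.encode("utf-8")) <= MAX_INDEXED_BYTES


ascii_ok = fits_in_index(u"a" * 1500)        # 1500 characters, 1500 bytes
ascii_too_long = fits_in_index(u"a" * 1501)  # 1501 characters, 1501 bytes
accented = fits_in_index(u"\u00e9" * 800)    # 800 characters, 1600 bytes
```

The last case is the one that tends to surprise: a unicode value can be well under 1500 characters and still exceed 1500 bytes once encoded.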

view simple indexes on AppEngine Datastore

How can I view the simple index definitions on Google's App Engine Datastore? Is it possible at all?
There is a "Datastore Indexes" view, but it seems to display only the composite indexes (the ones you define in datastore_indexes.xml).
What do you mean by "does not work"? For a non-custom index, you have to put the old objects again to include them in the index.
From the doc https://developers.google.com/appengine/docs/python/datastore/indexes
"Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:"
It's not possible (yet) to view the simple index definitions on your datastore model.
The actual index in the datastore can vary between entity instances (if the definition was changed when data was already stored). Changing simple indexes thus requires a manual migration (read and put all data so it is stored and indexed again with the new definition). Thanks @marcadian for the pointer.
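The read-and-put migration can be illustrated with a toy model. This is a sketch, not the App Engine API: `ToyDatastore`, `put`, `query_bar`, and the `index_bar` flag are hypothetical names that stand in for the per-entity index flag applied at write time.

```python
# Toy model of per-entity property indexing; not the App Engine API.
class ToyDatastore:
    def __init__(self):
        self.entities = {}   # key -> dict of property values
        self.bar_index = {}  # value of "bar" -> set of keys

    def put(self, key, props, index_bar):
        """Store the entity; write an index row for bar only if index_bar."""
        self._drop_index_rows(key)
        self.entities[key] = dict(props)
        if index_bar:
            self.bar_index.setdefault(props["bar"], set()).add(key)

    def _drop_index_rows(self, key):
        for keys in self.bar_index.values():
            keys.discard(key)

    def query_bar(self, value):
        """Equality filter on bar: answered from the index alone."""
        return sorted(self.bar_index.get(value, set()))


ds = ToyDatastore()
ds.put("old", {"bar": "baz"}, index_bar=False)  # written before the change
ds.put("new", {"bar": "baz"}, index_bar=True)   # written after the change
before = ds.query_bar("baz")

# Migration: read every entity and put it again with the new definition.
for key, props in list(ds.entities.items()):
    ds.put(key, props, index_bar=True)
after = ds.query_bar("baz")
```

Here `before` contains only the entity written after the change, while `after` contains both. The old entity existed the whole time; it was simply invisible to the query until it was rewritten.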

Appengine's Indexing order, cursors, and aggregation

I need to do some continuous aggregation on a data set. I am using App Engine's High Replication Datastore.
Let's say we have a simple object with a property that holds a string of the date when it was created. There are other fields associated with the object, but they are not important in this example.
Let's say I create and store some objects. Below is the date associated with each object. The objects are stored in the order below, and each is created in a separate transaction.
Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14
The idea is to use a cursor to continually check for newly indexed objects. Aggregation will then be performed on the newly indexed entities.
Here are the questions I have:
1) Are objects indexed in order? As in, is it possible for Obj4 to be indexed before Obj 1, 2, and 3? This will be an issue if I use an ORDER BY query and a cursor to continue searching. Some entities will not be found if there is a delay in indexing.
2) If no ORDER BY is specified, what order are entities returned in a query?
3) How would I go about checking for newly indexed entities? As in, grab all entities, store the cursor, then later on check whether any new entities were indexed since the last query?
A little less important, but food for thought:
4) Are all fields indexed together? As in, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?
5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?
6) If all entities belong to the same entity group, are all entities indexed at the same time?
Thanks for the responses.
1. All entities have default indexes for every property. If you use ORDER BY someProperty then you will get entities ordered by the values of that property. You are correct on index building: queries use indexes and indexes are built asynchronously, meaning that a query may not find an entity immediately after it was added.
2. ORDER BY defaults to ASC, i.e. ascending order.
3. Add a created timestamp to your entity, order by it, and repeat the query from the stored cursor. See Cursors and Data Updates.
4. Indexes are built after the put() operation returns. They are also built in parallel, meaning that when you query, some indexes may be built and some not. See Life of a Datastore Write. Note that if you want to force "apply" on an entity you can issue a get() after put(), which will force the changes to be applied (= indexes written).
5 and 6. All entities touched in the same transaction must be in the same entity group (= have a common parent). The transaction isolation docs state that transactions can be unapplied, meaning that a query after put() will not find new entities. Again, you can force an entity to be applied via a read or an ancestor query.
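The cursor approach, and the indexing-delay concern raised in question 1, can be sketched with a toy model. This is not the Datastore cursor API: real cursors are opaque, whereas here the cursor is simply the last `(created, key)` row seen. The function name `poll` is hypothetical.

```python
import bisect


def poll(index_rows, cursor):
    """Return rows after cursor in (created, key) order, plus the new cursor.

    index_rows must be sorted; cursor is the last row seen, or None to start.
    """
    start = 0 if cursor is None else bisect.bisect_right(index_rows, cursor)
    fresh = index_rows[start:]
    return fresh, (fresh[-1] if fresh else cursor)


rows = [("2012-11-11", "Obj1"), ("2012-11-11", "Obj2"), ("2012-11-12", "Obj3")]
first, cursor = poll(rows, None)

# Obj5 becomes visible in the index before Obj4.
bisect.insort(rows, ("2012-11-14", "Obj5"))
second, cursor = poll(rows, cursor)

# Obj4 shows up late, at a position before the cursor, so it is skipped.
bisect.insort(rows, ("2012-11-13", "Obj4"))
third, cursor = poll(rows, cursor)
```

In this model `third` is empty: Obj4 landed behind the cursor and is never returned. That is the failure mode the question worries about, and it is why the advice above about forcing writes to be applied matters when a cursor is used this way.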

Avoid default index but keep explicitly defined index in AppEngine?

I have some properties that are only referenced in queries that require composite indices. AppEngine writes all indexed properties to their own special indexes, which requires 2 extra write operations per property.
Is there any way to specify that a property NOT be indexed in its own index, but still be used for my composite index?
For example, my entity might be a Person with properties name and group. The only query in my code is select * from Person where group = <group> and name > <name>, so the only index I really need is with group ascending and name ascending. But right now AppEngine is also creating an index on name and an index on group, which triples the number of write operations required to write each entity!
I can see from the documentation how to prevent a property from being used for indexing at all, but I want to turn off indexing only for a few indexes (the default ones).
From what I currently understand, you can either disable indexing on a property altogether (which includes composite indexes) or you are stuck with having all indexes (the automatic indexes plus your composite indexes from index.yaml).
There was some discussion about this in the GAE google group and a feature request to do exactly what you are suggesting, but I wasn't able to find it. Will update the answer when I get home and search more.

Resources