NDB Query StringProperty + StructuredProperty [duplicate] - google-app-engine

There are many properties in my model that I currently don't need indexed but can imagine I might want indexed at some unknown point in the future. If I explicitly set indexed=False for a property now but change my mind down the road, will Datastore rebuild the entire indices automatically at that point, including for previously written data? Are there any other repercussions for taking this approach?

No, changing indexed=True to indexed=False (and vice-versa) will only affect entities written after that point to the datastore. Here is the documentation that talks about it and the relevant paragraph:
Similarly, changing a property from indexed to unindexed only affects entities subsequently written to the Datastore. The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted. To avoid unwanted results, you must purge your code of all queries that filter or sort by the (now unindexed) property.
If you decide later that you want to start indexing properties, you'll have to go through your entities and re-put them into the datastore.
Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:
Retrieve (get) the entity from the Datastore.
Write (put) the entity back to the Datastore.
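The behaviour described above can be illustrated with a toy model. This is a minimal sketch, not the real Datastore API: ToyDatastore and its put/get/query methods are hypothetical names, and the only thing it models is that index rows are written at put time, so an entity stored while a property was unindexed stays invisible to queries until it is put again.

```python
from collections import defaultdict


class ToyDatastore:
    """In-memory stand-in (not the real Datastore API) with per-put indexing."""

    def __init__(self):
        self.entities = {}             # key -> property dict
        self.index = defaultdict(set)  # (property, value) -> keys

    def put(self, key, props, indexed):
        """Store props; write index rows only for names listed in `indexed`."""
        # Drop index rows left over from the previous version of this entity.
        for keys in self.index.values():
            keys.discard(key)
        self.entities[key] = dict(props)
        for name in indexed:
            if name in props:
                self.index[(name, props[name])].add(key)

    def get(self, key):
        return dict(self.entities[key])

    def query(self, name, value):
        """Answer purely from the index rows, never by scanning entities."""
        return sorted(self.index.get((name, value), set()))


store = ToyDatastore()
store.put("old", {"name": "ann"}, indexed=[])        # written while name was unindexed
store.put("new", {"name": "ann"}, indexed=["name"])  # written after the model change
before = store.query("name", "ann")                  # only "new" has an index row

store.put("old", store.get("old"), indexed=["name"])  # get, then put again
after = store.query("name", "ann")                    # now both keys are found
```

In the toy model, both entities exist the whole time; only the second query sees the older one, because the re-put is what wrote its index row.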

To index properties of existing entities (as per the documentation):
Retrieve (get) the entity from the Datastore.
Write (put) the entity back to the Datastore.
This didn't work for me. I used the appengine-mapreduce library and wrote a MapOnlyMapper<Entity, Void> that uses a DatastoreMutationPool to index all the existing entities in Datastore.
Let's assume a property called "name" was unindexed and I want to index it in all the existing entities. What I had to do is:
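@Override
public void map(Entity value) {
    String property = "name";
    Object existingValue = value.getProperty(property);
    // Store the same value again, this time explicitly marked as indexed.
    value.setIndexedProperty(property, existingValue);
    // datastoreMutationPool is the mapper's DatastoreMutationPool; it batches the puts.
    datastoreMutationPool.put(value);
}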
Essentially, you will have to set the property as indexed property using setIndexedProperty(prop, value) and then save (put) the entity.
I know I am very late in posting an answer. I thought I could help someone who might be struggling with this problem.
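For readers on the Python google-cloud-datastore client, roughly the same idea might look like the sketch below. It assumes that fetched entities expose an exclude_from_indexes set and that the client offers query(kind=...).fetch() and put_multi(); the function name reindex_property and the batch size of 500 are illustrative choices, not something taken from the answer above.

```python
def reindex_property(client, kind, prop, batch_size=500):
    """Re-put every entity of `kind` with `prop` no longer excluded from indexes.

    `client` is assumed to behave like google.cloud.datastore.Client.
    Returns the number of entities rewritten.
    """
    batch, rewritten = [], 0
    for entity in client.query(kind=kind).fetch():
        # exclude_from_indexes names the properties stored as unindexed;
        # removing `prop` makes the next put write its index entries.
        entity.exclude_from_indexes.discard(prop)
        batch.append(entity)
        if len(batch) >= batch_size:
            client.put_multi(batch)
            rewritten += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        client.put_multi(batch)
        rewritten += len(batch)
    return rewritten
```

A call such as reindex_property(datastore.Client(), "Person", "name") would then rewrite every Person entity, so it is billed as one write per entity.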

Related

Updating an objectify entity without changing indexed properties

Say I have an objectify entity with 1 unindexed and 5 indexed fields. If I were to update the entity by modifying the unindexed property alone, would it cause to rewrite the indices for the five indexed fields as well? Essentially I am worried about the write cost here.
Google charges per-entity write, irrespective of the number of indexes.
See https://cloud.google.com/appengine/pricing#costs-for-datastore-calls
Yes, every update of an entity causes updates of all indexed properties. In other words, the write costs are the same whether only one property is updated or all of them.
This is not specific to Objectify - it's how the Datastore works.

How to delete a property/value from an entity using google cloud datastore

I am learning google.cloud.datastore and would like to know how to delete a property, along with its value, from an entity. Also, is it possible to delete a specific property, or a list of properties, from all entities of a certain kind?
Is my understanding correct that Datastore stores and manipulates data row-wise (as entities)?
cheers
Your understanding is correct: all datastore write operations happen at the entity level. So in order to modify one property or a subset of properties, you'd retrieve the entity, modify the property set (deleting the property, if that is what you want) and save the entity.
The exact details depend on the language and library used. From Updating an entity:
To update an existing entity, modify the properties of the entity
and store it using the key:
PYTHON
with client.transaction():
    key = client.key('Task', 'sample_task')
    task = client.get(key)
    task['done'] = True
    client.put(task)
The object data overwrites the existing entity. The entire object is
sent to Cloud Datastore. If the entity does not exist, the update will
fail. If you want to update-or-create an entity, use upsert as
described previously.
Note: To delete a property, remove the property from the entity, then save the entity.
In the above snippet, for example, deleting the done property of the task entity, if existing, would be done like this:
with client.transaction():
    key = client.key('Task', 'sample_task')
    task = client.get(key)
    if 'done' in task:
        del task['done']
        client.put(task)
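For the second part of the question (removing properties from all entities of a kind), there is no bulk "drop column" operation implied by the answer above, so one plausible approach is to loop over the kind and rewrite each affected entity. The sketch below assumes a client that behaves like google.cloud.datastore.Client (query(kind=...).fetch() and put_multi()); drop_properties and the batch size of 500 are hypothetical choices made for illustration.

```python
def drop_properties(client, kind, names, batch_size=500):
    """Remove the given property names from every entity of `kind`.

    `client` is assumed to behave like google.cloud.datastore.Client.
    Returns the number of entities that were changed and rewritten.
    """
    batch, changed = [], 0
    for entity in client.query(kind=kind).fetch():
        present = [name for name in names if name in entity]
        if not present:
            continue  # nothing to delete here, so skip the write
        for name in present:
            del entity[name]
        batch.append(entity)
        if len(batch) >= batch_size:
            client.put_multi(batch)
            changed += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        client.put_multi(batch)
        changed += len(batch)
    return changed
```

Entities that never had any of the listed properties are skipped, so only the entities that actually change are written back.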

Indexes in Google Datastore

By default indexing is enabled for all the fields in the ndb based model class.
What if I change the indexing definition for a field and redeploy the app; will it drop the indexing or recreate it, for that field, based on the changes in the model class?
Or is it like entity relationships, which can't be changed once defined? I am asking because I am not sure at this point how many fields I will need indexed in the final application.
You can change the definition of an entity object at any time; the important thing is whether the property is set to be indexed when you put(). Say I have inserted a bunch of objects with a "name" property, unindexed. Later I mark the property as indexed for future put()s of those entities. All my entities will still be in the datastore, but only the ones that were written with the index are queryable by that property. The same logic applies when I remove indexing from a language-local model property (a Java @Entity class with Objectify, for example) and then do put().
This is what it means to have a schemaless datastore. They can have all different combinations of properties and indexing on/off for each of them. The only thing that truly binds these entities together is their "kind", which is set to the classname by the framework you're using, or set by hand if you're using the truly low-level API.
Read more here to understand better how indexing works in the schemaless datastore. This answers your question completely if you read the section linked.

view simple indexes on AppEngine Datastore

How can I view the simple index definitions on Google's App Engine Datastore? Is it possible at all?
There is a "Datastore Indexes" view which only displays the composite indexes as it seems (the ones you define in datastore_indexes.xml).
What do you mean by "does not work"? For non-custom (built-in) indexes, you have to put the old objects again to include them in the index.
From the doc https://developers.google.com/appengine/docs/python/datastore/indexes
"Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:"
It's not possible (yet) to view the simple index definitions on your datastore model.
The actual index in the datastore can vary between entity instances (if the definition was changed at a time when data was already stored). Changing simple indexes thus requires a manual migration (read and put all data so it is stored and indexed again with the new definition). Thanks @marcadian for the pointer.

Appengine's Indexing order, cursors, and aggregation

I need to do some continuous aggregation on a data set. I am using App Engine's High Replication Datastore.
Let's say we have a simple object with a property that holds a string of the date when it was created. There are other fields associated with the object, but they are not important in this example.
Let's say I create and store some objects. Below is the date associated with each object. The objects are stored in the order below, and each is created in a separate transaction.
Obj1: 2012-11-11
Obj2: 2012-11-11
Obj3: 2012-11-12
Obj4: 2012-11-13
Obj5: 2012-11-14
The idea is to use a cursor to continually check for new indexed objects. Aggregation on the new indexed entities will be performed.
Here are the questions I have:
1) Are objects indexed in order? That is, is it possible for Obj4 to be indexed before Obj1, Obj2 and Obj3? This will be an issue if I use an ORDER BY query and a cursor to continue searching. Some entities will not be found if there is a delay in indexing.
2) If no ORDER BY is specified, what order are entities returned in a query?
3) How would I go about checking for new indexed entities? As in, grab all entities, storing the cursor, then later on checking if any new entities were indexed since the last query?
Little less important, but food for thought
4) Are all fields indexed together? That is, if I have a date property and, let's say, a name property, will both properties appear to be indexed at the same time for a given object?
5) If multiple entities are written in the same transaction, are all entities in the transaction indexed at the same time?
6) If all entities belong to the same entity group, are all entities indexed at the same time?
Thanks for the responses.
1) All entities have default indexes for every property. If you use ORDER BY someProperty, you will get entities ordered by the values of that property. You are correct about index building: queries use indexes, and indexes are built asynchronously, so a query may not find an entity immediately after it was added.
2) ORDER BY defaults to ASC, i.e. ascending order.
3) Add a created timestamp to your entity, order by it, and resume the query from the stored cursor. See Cursors and Data Updates.
4) Indexes are built after the put() operation returns. They are also built in parallel, so when you query, some indexes may already be built and others not. See Life of a Datastore Write. Note that if you want to force "apply" on an entity, you can issue a get() after put(), which forces the changes to be applied (= indexes written).
5) and 6) All entities touched in the same transaction must be in the same entity group (= have a common parent). The transaction isolation docs state that a committed transaction may not yet be applied, meaning that a query after put() may not find the new entities. Again, you can force an entity to be applied via a read or an ancestor query.
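The polling idea from point 3 can be sketched without any Datastore calls. In the minimal example below, poll_new and fetch_newer are hypothetical names: fetch_newer stands in for a query ordered by a created property that returns only entities newer than the checkpoint, and the integer created values are placeholders for real timestamps.

```python
def poll_new(fetch_newer, checkpoint):
    """Return (new_entities, new_checkpoint).

    fetch_newer(checkpoint) is a hypothetical callable that returns the
    entities whose 'created' value is greater than `checkpoint`, in
    ascending 'created' order (all entities when checkpoint is None).
    """
    entities = list(fetch_newer(checkpoint))
    if entities:
        # Remember the newest value seen so the next poll starts after it.
        checkpoint = entities[-1]["created"]
    return entities, checkpoint


# In-memory stand-in for the indexed entities visible to queries.
stored = [{"id": 1, "created": 10}, {"id": 2, "created": 20}]


def fetch_newer(checkpoint):
    rows = [e for e in stored if checkpoint is None or e["created"] > checkpoint]
    return sorted(rows, key=lambda e: e["created"])


first, checkpoint = poll_new(fetch_newer, None)         # sees ids 1 and 2
stored.append({"id": 3, "created": 30})                 # a new entity becomes visible
second, checkpoint = poll_new(fetch_newer, checkpoint)  # sees only id 3
```

The concern from question 1 still applies to this sketch: an entity whose created value is older than the checkpoint but which only becomes visible to queries later would be skipped.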
