Cloud Datastore - Exclude indexes from index.yaml file - google-app-engine

I would like to have indexes on only a few fields in a kind. Rather than excluding all the fields in the Java code during the creation of each Entity as described here, I was wondering whether there is a way to define this in the index.yaml file and not worry about it when creating entities.

App Engine applications written in Java do not have an index.yaml file; they have a datastore-indexes.xml file instead. The concept, however, is the same.
Most properties are indexed automatically. Any composite indexes must be defined in your index configuration file (YAML or XML, depending on the language). When defining your models, you can tell App Engine not to auto-index a property, which saves write ops and speeds up your app.
To answer your question more specifically: you cannot use the index configuration file to prevent index creation; it is only used to tell App Engine which indexes to create.
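For illustration, here is how that per-property opt-out looks in the Python runtime with ndb; the model and property names are made up, and the Java low-level API offers Entity.setUnindexedProperty for the same purpose:
from google.appengine.ext import ndb

class Product(ndb.Model):
    name = ndb.StringProperty()              # indexed by default
    sku = ndb.StringProperty(indexed=False)  # not indexed: cheaper, faster writes
    description = ndb.TextProperty()         # TextProperty is never indexed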
Also, indexes are only built as entities are saved, so if you add indexes after entities have been created, you will need to run a script that re-saves the existing entities to update them.
Similarly, to remove indexes after they have been created, you need to do this from the command line using the SDK. See here.

Does Vespa support dynamic fields?

I am looking for any option like dynamic fields (Solr) in Vespa. I need to add new fields to the existing schema without redeployment of the whole application.
The only related thing I found in the Vespa documentation is http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions,
where it is mentioned that we can add a new field and then run vespa-deploy prepare and vespa-deploy activate. But isn't that a resubmission of the whole application? What is the option with the least overhead?
http://docs.vespa.ai/documentation/search-definitions.html#modify-search-definitions is the right document, yes.
Altering the search definition is safe: the deploy prepare step will output any steps needed (e.g. none, restart, or re-feed). Most changes require neither restart nor re-feed, as Vespa is built to be production friendly, and adding fields in particular requires no extra actions.
Note that there is no support for default values for fields.
As the Vespa configuration is declarative, the full application package is submitted, but the config servers will calculate changes and deploy the delta to the nodes. This makes it easy to keep the application package config in a code repo like git - what you see in the repo is what is deployed.
It depends on what you mean by "dynamic".
If the fields number in the hundreds, are controlled by the application owners, and change daily, then changing the schema and redeploying works perfectly fine: fields can be updated individually (even when indexed), they incur no overhead if not used, and deploying an application with new fields is cheap and does not require restarts or anything like that.
If you need tens of thousands of fields, or the fields are added and removed by users, then Vespa does not have an out-of-the-box solution. You'd need to mangle the fields into the same index by adding e.g. a "myfield_" prefix to each token; this is what engines that support this feature do internally to make it efficient.
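To make that workaround concrete, here is a minimal Python sketch of the prefix-mangling idea, assuming all user-defined fields are folded into one catch-all indexed field (the function and field names are hypothetical):
def flatten_dynamic_fields(doc_fields):
    # Fold arbitrary user-defined fields into one indexed text field
    # by prefixing every token with its field name.
    tokens = []
    for field_name in sorted(doc_fields):
        for token in str(doc_fields[field_name]).split():
            tokens.append('%s_%s' % (field_name, token))
    return ' '.join(tokens)

# A query for color:red then becomes a search for the token "color_red"
# in the single catch-all field.
print(flatten_dynamic_fields({'color': 'red', 'size': 'extra large'}))
# -> color_red size_extra size_large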

How can I programmatically determine which Datastore indexes are in error?

When I run update_indexes on Google Datastore I get the message below. It is telling me to determine which indexes are in error by looking at the GUI, then delete these indexes.
I have 51 erroneous indexes out of 200, and copying them out of the GUI is not feasible.
(Edit: By laboriously removing and adding indexes from the datastore-indexes.xml, we identified the one problematic index.)
Good devops procedure demands that we do this sort of thing automatically.
How does one determine which indexes are in error programmatically? (Python, bash, or even Java are OK.)
Cannot build indexes that are in state ERROR. To vacuum and rebuild your indexes:
1. Create a backup of your index.yaml specification.
2. Determine the indexes in state ERROR from your admin console: https://appengine.google.com/datastore/indexes?&app_id=s~myproject
3. Remove the definitions of the indexes in ERROR from your index.yaml file.
4. Run "appcfg.py vacuum_indexes your_app_dir/"
5. Wait until the ERROR indexes no longer appear in your admin console.
6. Replace the modified version of your index.yaml file with the original.
7. Run "appcfg.py update_indexes your_app_dir/"
Unfortunately Cloud Datastore doesn't have a public API for managing indexes, and the current command-line tools use an internal API that doesn't have access to that information.
We're aiming to have an index management API sometime next year (we're already working on designs), and I'll make sure this key use case is something we cover.
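In the meantime, the manual bisection described in the question's edit can at least be mechanized on the file side. Here is a rough Python sketch that splits an index.yaml into two halves so each half can be deployed in turn to see which one triggers the ERROR; the file names and the PyYAML dependency are assumptions:
import yaml  # PyYAML

def split_index_yaml(path):
    # Write two half-sized index files for bisection: deploy each half
    # with update_indexes and watch which one reports the ERROR state.
    with open(path) as f:
        indexes = yaml.safe_load(f).get('indexes', [])
    mid = len(indexes) // 2
    for name, subset in (('first', indexes[:mid]), ('second', indexes[mid:])):
        with open('index_%s_half.yaml' % name, 'w') as out:
            yaml.safe_dump({'indexes': subset}, out, default_flow_style=False)

split_index_yaml('index.yaml')
Repeating this on the failing half narrows 51 suspects down in a handful of deploys.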

Memcache flush_all not working as expected with namespaces on Google App Engine Python

Going over the documentation for multitenancy + memcache it seems that memcache entries are separated for each namespace. See documentation here.
The problem is that when we call:
memcache.flush_all()
Everything in memcache is flushed, not just the entries for the current namespace.
Before calling flush_all() we are explicitly setting the namespace using the following code:
namespace_manager.set_namespace(foo)
How can I flush entries in memcache only for the current namespace?
Yes, it is somewhat unexpected that flush_all deletes everything in your app's memcache, not just the entries for one namespace. The App Engine development team has a feature request open to allow per-namespace flushing; see Issue 5190.
One workaround is for you to maintain a persistent integer "generation count" that you include as part of the namespace name. When you want to flush a namespace, you instead increment the generation count and use the new namespace, which will be brand new and empty. You can ignore the items in your old namespace, as they will slowly get evicted over time.
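Here is a minimal sketch of that generation-count workaround, assuming the counter is kept in the datastore so it survives memcache eviction; the model and helper names are made up:
from google.appengine.api import namespace_manager
from google.appengine.ext import ndb

class NamespaceGeneration(ndb.Model):
    # Persistent per-tenant generation counter, keyed by tenant name.
    # Read and write these from the default namespace.
    generation = ndb.IntegerProperty(default=0)

def effective_namespace(tenant):
    rec = NamespaceGeneration.get_by_id(tenant) or NamespaceGeneration(id=tenant)
    return '%s_g%d' % (tenant, rec.generation)

def flush_tenant(tenant):
    # "Flush" by bumping the generation; the old namespace's entries
    # are orphaned and evicted over time.
    rec = NamespaceGeneration.get_by_id(tenant) or NamespaceGeneration(id=tenant)
    rec.generation += 1
    rec.put()

# Instead of namespace_manager.set_namespace(foo), use:
# namespace_manager.set_namespace(effective_namespace(foo))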

Best strategy to initially populate a Grails database backend

I'd like to know your approach/experiences when it's time to initially populate the Grails DB that will hold your app data. Assuming you have CSVs with the data, is it "safer" to create a script (with whatever tool fits you) that:
1. Generates the Bootstrap commands with the domain classes, runs them in a test or dev environment, and then uses the native DB commands to export the data to prod?
2. Creates the DB insert script directly, assuming GORM's version = 0 and manually incrementing the soon-to-be auto-generated IDs?
My fear is that the second approach may lead to inconsistencies, since Hibernate will have the responsibility for ID generation, and there may be something else I'm missing.
Thanks in advance.
Take a look at this link. This allows you to run Groovy scripts in the normal Grails context, giving you access to all Grails features including GORM. I'm currently importing data from a legacy database and have found that writing a Groovy script that uses the Groovy SQL interface to pull out the data, then puts that data into domain objects, is the easiest approach. Once you have the data imported, you just use the commands specific to your database system to move that data to the production database.
Update:
Apparently the updated entry referenced from the blog entry I link to no longer exists. I was able to get this working using code at the following link which is also referenced in the comments.
http://pastie.org/180868
Finally, it seems that the simplest solution is to note that GORM, as of the current release (1.2), uses a single sequence for all auto-generated IDs. Taking this into account when creating whatever scripts you need (in the language of your preference) should suffice. I understand it's planned that in the 1.3 release every table will have its own sequence.
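As an illustration of option 2 under that single-sequence assumption, here is a rough Python sketch that turns a CSV into insert statements; the table names, column handling, and starting ID are all placeholders:
import csv

def csv_to_inserts(csv_path, table, next_id):
    # Emit INSERT statements with hand-assigned IDs drawn from one shared
    # counter (mirroring GORM's single sequence) and version = 0, which is
    # what Hibernate expects for fresh rows.
    statements = []
    with open(csv_path) as f:
        for row in csv.DictReader(f):
            cols = ['id', 'version'] + list(row)
            vals = [str(next_id), '0'] + \
                   ["'%s'" % v.replace("'", "''") for v in row.values()]
            statements.append('INSERT INTO %s (%s) VALUES (%s);'
                              % (table, ', '.join(cols), ', '.join(vals)))
            next_id += 1
    return statements, next_id

# Thread next_id through every table so all of them share the one counter:
stmts, next_id = csv_to_inserts('authors.csv', 'author', 1)
more, next_id = csv_to_inserts('books.csv', 'book', next_id)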

How can I associate images with entities in Google App Engine

I'm working on a Google App Engine application, and have come to the point where I want to associate images on the filesystem to entities in the database.
I'm using the bulkupload_client.py script to upload entities to the database, but am stuck trying to figure out how to associate filesystem files with the entities. If an entity has the following images: main, detail, front, back, I think I might want a naming scheme like this: <entity_key>_main.jpg
I suppose I could create a GUID for each entity and use that, but I'd rather not have to do that.
Any ideas?
I think I can't use the entity key since it might be different between local and production datastores, so I would have to rename all my images after a production bulkupload.
There is a GAE tutorial on how to Serve Dynamic Images with Google App Engine. Includes explanations & downloadable source code.
I see two options here based on my very limited knowledge of GAE.
First, you can't actually write anything to the file system in GAE, right? That would mean that any images you want to include would have to be uploaded as part of your webapp, and would therefore have a static name and directory structure that is known and unchangeable. In this case, your idea of <entity_key>_main.jpg, or /entity_key/main.jpg, would work fine.
The second option is to store the images as blobs in the database. This may allow for uploading images dynamically, rather than having to upload a new version of the webapp every time you need to update images, but it would quickly eat into your free database space. Here's some information on serving pictures from the database: http://code.google.com/appengine/articles/images.html
If you're uploading the images statically, you can use the key based scheme if you want: Simply assign a key name to the entities, and use that to find the associated images. You specify a key name (in the Python version) as the key_name constructor argument to an entity class:
myentity = EntityClass(key_name="bleh", foo="bar", bar=123)
myentity.put()
and you can get the key name for an entity like so:
myentity.key().name()
It sounds like the datastore entities in question are basically static content, though, so perhaps you'd be better off simply encoding them as literals in the source and thus having them in memory, or loading them at runtime from local data files, bypassing the need to query the datastore for them.
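Putting those two snippets together, here is a minimal sketch of the key-name-based naming scheme; EntityClass and the image names come from the question, while the model definition and helper are hypothetical:
from google.appengine.ext import db

class EntityClass(db.Model):
    foo = db.StringProperty()
    bar = db.IntegerProperty()

IMAGE_KINDS = ['main', 'detail', 'front', 'back']

def image_paths(entity):
    # Derive static image filenames from the entity's key name, which,
    # unlike the full key, is stable across local and production datastores.
    name = entity.key().name()
    return ['%s_%s.jpg' % (name, kind) for kind in IMAGE_KINDS]

myentity = EntityClass(key_name="bleh", foo="bar", bar=123)
myentity.put()
print(image_paths(myentity))  # ['bleh_main.jpg', 'bleh_detail.jpg', ...]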
