Creating indexes on existing entity properties - google-app-engine

When I started my project, I thought there was no need to index certain fields of my entities, but to generate some daily reports and statistics we now need indexes on some fields of existing entities.
As explained in the post Retroactive indexing in GAE Datastore, the only way is to first change these properties from unindexed to indexed, then retrieve and write all the entities again.
My question is: if I take a backup from Datastore Admin and restore it after changing the properties to indexed, will my project have all the required properties indexed? Or do I need to retrieve and write the entities through a program?
PS: My project is a Java project on GAE.

Edit: The workaround I mentioned earlier does not work. The only way to change the field is to re-upload the entities. Sorry.

Related

How can I download all documents from Retrieve and Rank (Solr)?

We have a Cloudant database on Bluemix that contains a large number of documents that are answer units built by the Document Conversion service. These answer units are used to populate a Solr Retrieve and Rank collection for our application. The Cloudant database serves as our system of record for the answer units.
For reasons that are unimportant, our Cloudant database is no longer valid. What we need is a way to download everything from the Solr collection and re-create the Cloudant database. Can anyone tell me a way to do that?
I'm not aware of any automated way to do this.
You'll need to fetch all your documents from Solr (and assuming you have a lot of them, do this in a paginated way - there are some examples of how to do this in the Solr doc) and add them into Cloudant.
Note that you'll only be able to do this for the fields that you have set to be stored in your schema. If there are important fields that you need in Cloudant that you haven't got stored in Solr, then you might be stuck. :(
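A minimal sketch of the paginated export, assuming Solr's cursorMark deep paging: start with cursorMark=*, sort on a field list that includes the unique key, and stop when the returned nextCursorMark equals the one you sent. The fetchPage function is a hypothetical stand-in for the HTTP call to the collection's /select handler, fakeFetch is an in-memory test double, and writing the documents into Cloudant is left out. As noted above, only stored fields would come back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.BiFunction;

// One page of results: the documents plus the nextCursorMark returned with them.
record SolrPage(List<Map<String, Object>> docs, String nextCursorMark) {}

final class SolrExport {
    /**
     * Pages through a whole collection with cursorMark deep paging.
     * fetchPage(cursorMark, rows) is a hypothetical stand-in for a request such as
     * /select?q=*:*&sort=id asc&rows=...&cursorMark=... (the sort must include the unique key).
     */
    static List<Map<String, Object>> exportAll(BiFunction<String, Integer, SolrPage> fetchPage, int rows) {
        List<Map<String, Object>> all = new ArrayList<>();
        String cursor = "*"; // initial cursorMark
        while (true) {
            SolrPage page = fetchPage.apply(cursor, rows);
            all.addAll(page.docs());
            // The end is signalled by getting back the same cursorMark that was sent.
            if (page.nextCursorMark().equals(cursor)) {
                return all;
            }
            cursor = page.nextCursorMark();
        }
    }

    // In-memory test double for a collection already sorted by its unique key.
    static SolrPage fakeFetch(List<Map<String, Object>> sortedDocs, String cursor, int rows) {
        int start = cursor.equals("*") ? 0 : Integer.parseInt(cursor);
        int end = Math.min(start + rows, sortedDocs.size());
        // Like Solr, return the unchanged cursor once there is nothing left.
        String next = (end == start) ? cursor : String.valueOf(end);
        return new SolrPage(new ArrayList<>(sortedDocs.subList(start, end)), next);
    }
}
```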
You can replicate one Cloudant database to another, which will create an exact replica.
Another technique is to use a tool such as couchbackup, which takes a copy of your database's documents (ignoring any deletions) and allows you to save the data in a text file. You can then use the couchrestore tool to upload the data file to a new database.
See this blog for more details.

Updating schema for a IBM Watson Retrieve and Rank Config

Are there ways to update the schema of the Solr config in IBM Watson's Retrieve and Rank service other than deleting the config and then uploading it again?
I used the following example to create a new cluster, config and collection.
https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/doc/retrieve-rank/get_start.shtml
I started from the blank example config and updated the schema.
I now need to update the schema and add/modify some schema elements. Is there a way to do it without deleting and uploading the config again? How can this be done so that there is minimum downtime when making the change?
You can do this but you have to configure Solr to use managed schemas: https://cwiki.apache.org/confluence/display/solr/Managed+Schema+Definition+in+SolrConfig and then the schema APIs: https://cwiki.apache.org/confluence/display/solr/Schema+API.
Do note, however, the big caveat on the schema API page:
Re-index after schema modifications!
If you modify your schema, you will likely need to re-index all documents. If you do not, you may lose access to documents, or not be able to interpret them properly, e.g. after replacing a field type.
Modifying your schema will never modify any documents that are already indexed. Again, you must re-index documents in order to apply schema changes to them.
So whether or not you need to re-index depends on the specific schema changes you make. If you're adding a new field, there's no problem. If you're modifying an existing field, the change only affects data you haven't indexed yet, so you may need to re-index (depending on your changes), and so on.
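Once the managed schema is enabled, each change is a JSON command POSTed to the collection's /schema endpoint. The sketch below only builds an add-field or replace-field command and prepares (without sending) the request using java.net.http; the base URL, field name, and field type are hypothetical, and any authentication that Retrieve and Rank requires is not shown. In line with the caveat above, add-field is the low-risk case, while replace-field redefines an existing field and is the case that typically calls for re-indexing.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Set;

final class SchemaApiSketch {
    private static final Set<String> COMMANDS = Set.of("add-field", "replace-field");

    /**
     * Builds one Schema API field command, e.g.
     * {"add-field":{"name":"answer_title","type":"string","stored":true}}.
     * add-field introduces a new field; replace-field redefines an existing one (re-index afterwards).
     */
    static String fieldCommand(String command, String name, String type, boolean stored) {
        if (!COMMANDS.contains(command)) {
            throw new IllegalArgumentException("unsupported command: " + command);
        }
        // No JSON escaping is done here, so only plain identifiers are accepted.
        if (!name.matches("[A-Za-z0-9_]+") || !type.matches("[A-Za-z0-9_]+")) {
            throw new IllegalArgumentException("sketch only handles simple identifiers");
        }
        return "{\"" + command + "\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":" + stored + "}}";
    }

    // Prepares, but does not send, a POST to <collection base URL>/schema.
    static HttpRequest schemaRequest(String collectionBaseUrl, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(collectionBaseUrl + "/schema"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```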

How to add a composite index in Google Datastore?

I'm now using Google Datastore for my company's database.
Today I created an index, and it was successfully listed in 'Index'.
But the size and entity count of the index I created are empty.
The Google Datastore documentation says that the index is auto-generated, but it wasn't.
Is there any command or something to do to generate the index?
In the screenshot, the upper index is the new one and the lower one is already in use.
As a matter of fact, existing entities will not be indexed automatically. You have to load and save all your old (unindexed) entities in order to have the necessary index entries created for them.
Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:
Retrieve (get) the entity from the Datastore.
Write (put) the entity back to the Datastore.
Similarly, changing a property from indexed to unindexed only affects entities subsequently written to the Datastore. The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted. To avoid unwanted results, you must purge your code of all queries that filter or sort by the (now unindexed) property. (source)
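The following toy model (not the App Engine API) illustrates the quoted rule: index entries are written only when an entity is put, using the indexed/unindexed setting in effect at that moment, so flipping a property to indexed changes nothing until each entity is fetched and put again. ToyDatastore and its methods are hypothetical names for illustration only.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy model only (NOT the App Engine API): index entries are written at put() time,
// using whichever properties are marked as indexed at that moment.
final class ToyDatastore {
    private final Map<String, Map<String, Object>> entities = new HashMap<>();
    // property name -> (property value -> keys of entities having that value)
    private final Map<String, Map<Object, Set<String>>> index = new HashMap<>();
    private final Set<String> indexedProperties = new HashSet<>();

    // Changing the setting deliberately leaves existing index entries untouched.
    void setIndexed(String property, boolean indexed) {
        if (indexed) {
            indexedProperties.add(property);
        } else {
            indexedProperties.remove(property);
        }
    }

    void put(String key, Map<String, Object> properties) {
        // Drop this entity's old index entries, then write entries for the current settings.
        for (Map<Object, Set<String>> byValue : index.values()) {
            for (Set<String> keys : byValue.values()) {
                keys.remove(key);
            }
        }
        entities.put(key, new HashMap<>(properties));
        for (Map.Entry<String, Object> p : properties.entrySet()) {
            if (indexedProperties.contains(p.getKey())) {
                index.computeIfAbsent(p.getKey(), k -> new HashMap<>())
                     .computeIfAbsent(p.getValue(), v -> new TreeSet<>())
                     .add(key);
            }
        }
    }

    Map<String, Object> get(String key) {
        Map<String, Object> e = entities.get(key);
        return e == null ? null : new HashMap<>(e);
    }

    // An equality query answered purely from the index, like a property filter.
    Set<String> queryEquals(String property, Object value) {
        return new TreeSet<>(index.getOrDefault(property, Map.of()).getOrDefault(value, Set.of()));
    }
}
```

The same model also shows the reverse case from the quote: after switching a property back to unindexed, its old entries keep answering queries until the entity is put again.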
Note that the documentation doesn't explicitly say the same for composite indexes. When you deploy a new composite index, it appears in the developers console as "building" until it reaches the "serving" state. I'm not sure what exactly it's building there; I usually re-saved all my entities and everything worked as it should.
"Auto-generated" tells you whether you created this index manually or whether the dev server created it when you ran a query that required this index. It is in no way linked to how and when index entries are created for the entities.
The <datastore-indexes> element has an autoGenerate attribute that controls whether this file should be considered along with automatically generated index configuration. See Using Automatic Index Configuration below. (source)
When you create a new index and want it to cover all your existing entities, I recommend writing a cursor query to handle this. I usually expose this query in an admin backend and have it run until there are no more results. Why expose it? If you have lots of entities, the job may run longer than the allowed 60 seconds in the frontend or 10 minutes in the backend. By exposing it, I can use front-end instance time and don't have to worry about the time restrictions.
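A sketch of that batched re-save, assuming entities are processed in key order and the last key of each batch serves as the cursor. In the real Java API you would use a datastore Cursor together with FetchOptions (startCursor) and QueryResultList.getCursor(); here a TreeMap stands in for the datastore, and runBatch plays the role of one call to the hypothetical admin handler, so each request stays short and the caller repeats it until no cursor comes back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.function.Consumer;

// Sketch of a cursor-driven re-save job over a key-ordered store (a TreeMap standing in for the datastore).
final class ResaveJob {
    /**
     * One request's worth of work: re-save up to batchSize entities whose keys come after
     * `cursor` (null = start from the beginning). Returns the cursor for the next request,
     * or null when there was nothing left to process.
     */
    static <V> String runBatch(NavigableMap<String, V> store, String cursor, int batchSize, Consumer<String> resave) {
        NavigableMap<String, V> remaining = (cursor == null) ? store : store.tailMap(cursor, false);
        List<String> batch = new ArrayList<>();
        for (String key : remaining.keySet()) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(key);
        }
        batch.forEach(resave); // in a real job: get + put of each entity
        return batch.isEmpty() ? null : batch.get(batch.size() - 1);
    }

    // Simulates the caller hitting the admin handler repeatedly until no cursor comes back.
    static <V> int resaveAll(NavigableMap<String, V> store, int batchSize, Consumer<String> resave) {
        int batches = 0;
        String cursor = null;
        while (true) {
            String next = runBatch(store, cursor, batchSize, resave);
            if (next == null) {
                return batches;
            }
            batches++;
            cursor = next;
        }
    }
}
```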

Google Cloud Datastore: is it possible to "visualize" data entities in UML style?

My colleagues have created a series of entities using "Google Cloud Datastore".
What I would like to achieve now is to generate a data schema from the set of entities we have got. Something like this.
It does not necessarily have to include the 1:1, 1:many, and n:n relationships, but a UML-style data structure generated for each entity would already be a good start.
The challenge is that:
when clicking on a "record", no data type is shown for columns that are empty
some column fields are "objects" (which can be complex JSON objects; I'm not sure whether I would prefer to model them as a separate entity and link to it, or to leave the word "object")
references between records are handled by the developers, and I doubt that there is a tool clever enough to understand this. Hence I do not expect the n:n relations to be shown either.
Is there a project or a tool or a methodology to create this schema starting from an existing "Google Cloud Datastore"?

Visual understanding of the GAE datastore

I am trying to understand how the Google App Engine (GAE) datastore is designed and how to use it. I am having a bit of a hard time visualising the structure from the description on the getting started page.
Can somebody explain the datastore using figures for us visually oriented people? Or point to a good tutorial again with visual learning in mind?
I am specifically looking for answers with diagrams/figures that explains how GAE is used.
The 2008 IO session "Under the Covers of the Google App Engine Datastore" has a good visual overview of the datastore.
https://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore
http://snarfed.org/datastore_talk.html
For more IO talks go to:
https://developers.google.com/appengine/docs/videoresources
Very simplified, I've understood that GAE can be viewed as a hashmap of hashmaps.
That said, you could view it like this:
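A minimal sketch of the "hashmap of hashmaps" view, assuming string keys such as "User:alice" (a hypothetical encoding of kind and name): the outer map goes from entity key to entity, and each entity is itself a map from property name to value. In this toy, a get by key is a single lookup, but a property filter has to scan every entity, which is the gap the datastore's indexes fill.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

// Toy "hashmap of hashmaps": entity key -> (property name -> value).
final class MapOfMaps {
    private final Map<String, Map<String, Object>> entities = new HashMap<>();

    void put(String key, Map<String, Object> properties) {
        entities.put(key, new HashMap<>(properties)); // copy so callers can't mutate stored state
    }

    // Lookup by key is a single map access.
    Map<String, Object> get(String key) {
        Map<String, Object> e = entities.get(key);
        return e == null ? null : Collections.unmodifiableMap(e);
    }

    // Filtering by a property value has to visit every entity in this model.
    List<String> scanWhere(String property, Object value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, Map<String, Object>> e : entities.entrySet()) {
            if (Objects.equals(e.getValue().get(property), value)) {
                keys.add(e.getKey());
            }
        }
        Collections.sort(keys); // HashMap order is unspecified, so sort for stable output
        return keys;
    }
}
```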
I guess there's no correct answer here, just different mental models. Depending on your programming background, you may find mine enlightening, disturbing, or both. I picture the datastore as a single huge distributed key-value collection of buckets that comprises all entity data of any kind in any namespace and all GAE apps of all users. A single bucket is called an entity group. It has a root key which (under the hood) consists of your appID, a namespace, a kind, and an entity ID or name. An entity group holds one or more entities whose keys extend the root key. The entity belonging to the root key itself may or may not exist. Operations within a single entity group are atomic (transactional). An entity is a simple map-like data structure. The 2 built-in indexes (ascending and descending) are, again, 2 giant sorted collections of index entries. Each index entry is a data structure of appID, namespace, kind, property name, property type, property value, entity key, in that order.
Each (auto-)indexed value of each property of each entity creates 2 such index entries. There's another index with just entity keys in it. Custom indexes, however, go to yet another sorted collection with entries containing appID, namespace, index type, combined index value, entity key. That's the only part of the whole datastore that uses metadata: it stores an index definition which tells the store how the combined index value is formed from the entity. This is the picture that's burnt into my mind and from which I know how to make the datastore happy.
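The index part of that picture can be sketched as a sorted set of entries compared field by field in the order listed above (appID, namespace, kind, property name, property type, property value, entity key). Under that ordering, an equality filter on one property becomes a single contiguous range scan. This is a simplified model, not Google's implementation: IndexEntry and ToyPropertyIndex are hypothetical names, values are stored as strings with a fixed "string" type tag, and only the ascending index is modelled.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

// One entry of the built-in ascending property index, in the mental model above.
record IndexEntry(String appId, String namespace, String kind, String property,
                  String type, String value, String entityKey) implements Comparable<IndexEntry> {
    // Compare field by field, in the order the entry's components are listed.
    private static final Comparator<IndexEntry> ORDER =
            Comparator.comparing(IndexEntry::appId)
                    .thenComparing(IndexEntry::namespace)
                    .thenComparing(IndexEntry::kind)
                    .thenComparing(IndexEntry::property)
                    .thenComparing(IndexEntry::type)
                    .thenComparing(IndexEntry::value)
                    .thenComparing(IndexEntry::entityKey);

    @Override
    public int compareTo(IndexEntry other) {
        return ORDER.compare(this, other);
    }
}

final class ToyPropertyIndex {
    private final TreeSet<IndexEntry> ascending = new TreeSet<>();

    // Writes one ascending entry per property of the entity (the descending index is omitted).
    void write(String appId, String ns, String kind, String key, Map<String, String> props) {
        for (Map.Entry<String, String> p : props.entrySet()) {
            ascending.add(new IndexEntry(appId, ns, kind, p.getKey(), "string", p.getValue(), key));
        }
    }

    // property == value is one contiguous range: [value, smallest string greater than value).
    List<String> keysWhere(String appId, String ns, String kind, String prop, String value) {
        IndexEntry from = new IndexEntry(appId, ns, kind, prop, "string", value, "");
        IndexEntry to = new IndexEntry(appId, ns, kind, prop, "string", value + "\0", "");
        List<String> keys = new ArrayList<>();
        for (IndexEntry e : ascending.subSet(from, true, to, false)) {
            keys.add(e.entityKey());
        }
        return keys;
    }
}
```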
