Visual understanding of the GAE datastore


I am trying to understand how the Google App Engine (GAE) datastore is designed and how to use it. I am having a hard time visualising the structure from the description on the getting started page.
Can somebody explain the datastore using figures for us visually oriented people? Or point to a good tutorial, again with visual learning in mind?
I am specifically looking for answers with diagrams/figures that explain how GAE is used.

The 2008 Google I/O session "Under the Covers of the Google App Engine Datastore" has a good visual overview of the datastore.
https://sites.google.com/site/io/under-the-covers-of-the-google-app-engine-datastore
http://snarfed.org/datastore_talk.html
For more I/O talks, go to:
https://developers.google.com/appengine/docs/videoresources

Put very simply, my understanding is that the GAE datastore can be viewed as a hashmap of hashmaps.
With that in mind, you could view it like this:
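As a rough illustration of the hashmap-of-hashmaps picture, here is a minimal sketch in plain JavaScript. It is only a mental model, not the Datastore API: the key strings, property names, and the `getProperty` helper are made up for this example.

```javascript
// Outer map: entity key -> entity. Inner map: property name -> value.
// Keys and properties below are invented for illustration.
const datastore = new Map();

datastore.set('Person:alice', new Map([
  ['name', 'Alice'],
  ['age', 30],
]));
datastore.set('Person:bob', new Map([
  ['name', 'Bob'],
  ['city', 'Oslo'], // entities of the same kind need not share properties
]));

// Fetching an entity is a lookup in the outer map;
// reading a property is a lookup in the inner map.
function getProperty(store, key, property) {
  const entity = store.get(key);
  return entity ? entity.get(property) : undefined;
}

console.log(getProperty(datastore, 'Person:alice', 'age'));
console.log(getProperty(datastore, 'Person:bob', 'age'));
```

The second lookup yields `undefined` because that entity simply has no `age` entry, which is how the schemaless nature of entities shows up in this model.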

I guess there's no correct answer here, just different mental models. Depending on your programming background you may find mine enlightening, disturbing or both.
I picture the datastore as a single huge distributed key-value collection of buckets that comprises all entity data of any kind, in any namespace, for all GAE apps of all users. A single bucket is called an entity group. It has a root key which (under the hood) consists of your appID, a namespace, a kind, and an entity ID or name. An entity group contains one or more entities whose keys extend the root key. The entity belonging to the root key itself may or may not exist. Operations within a single entity group are atomic (transactional). An entity is a simple map-like data structure.
The 2 built-in indexes (ascending and descending) are in turn 2 giant sorted collections of index entries. Each index entry is a data structure of appID, namespace, kind, property name, property type, property value, entity key, in that order. Each (auto-)indexed value of each property of each entity creates 2 such index entries. There's another index with just entity keys in it.
Custom indexes, however, go to yet another sorted collection with entries containing appID, namespace, index type, combined index value, entity key. That's the only part of the whole datastore that uses metadata: it stores an index definition which tells the store how the combined index value is formed from the entity. This is the picture that's burnt into my mind and from which I know how to make the datastore happy.
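To make the index-entry picture more concrete, here is a small sketch that builds entries in the field order described above and sorts them. It models only the ascending index, and everything in it (`indexEntriesFor`, `compareEntries`, `scan`, the `myapp` app ID, the sample entities) is an assumption made for illustration, not how the datastore is actually implemented.

```javascript
// Build index entries for one entity in the order:
// appID, namespace, kind, property name, property type, property value, entity key.
function indexEntriesFor(appId, namespace, kind, entityKey, properties) {
  const entries = [];
  for (const [name, value] of Object.entries(properties)) {
    // A multi-valued property contributes one entry per value.
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      entries.push([appId, namespace, kind, name, typeof v, v, entityKey]);
    }
  }
  return entries;
}

// Compare two entries field by field, left to right.
function compareEntries(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Return the entries whose leading fields match the given prefix.
// In a sorted collection these entries sit next to each other.
function scan(index, prefix) {
  return index.filter((entry) => prefix.every((field, i) => entry[i] === field));
}

const ascending = [
  ...indexEntriesFor('myapp', '', 'Person', 'Person:1', { name: 'Alice', age: 30 }),
  ...indexEntriesFor('myapp', '', 'Person', 'Person:2', { name: 'Bob', age: 25 }),
].sort(compareEntries);

// "All Person entities ordered by age" reads one run of entries in index order.
const byAge = scan(ascending, ['myapp', '', 'Person', 'age']).map((entry) => entry[6]);
console.log(byAge);
```

Because the property type comes before the property value in each entry, values of different types never have to be compared with each other in this model.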

Related

Google Cloud Datastore unique autogenerated ids

I'm using Google Cloud Datastore and using namespaces to partition data. Some kinds are using autogenerated IDs from Cloud Datastore by creating keys like this:
// Incomplete key: only the kind is given, so Cloud Datastore assigns the numeric ID on save.
var key = Datastore.key(['example']);
This code generates a key with kind 'example', and Cloud Datastore automatically assigns the entity an integer numeric ID. (Source: https://cloud.google.com/datastore/docs/concepts/entities#kinds_and_identifiers)
But this "unique" ID is only unique for its namespace. I have seen the same ID in different namespaces.
So my question is, is it possible to tell Cloud Datastore that autogenerated IDs must be unique for all the namespaces?
Maybe this question doesn't make sense, but I would prefer to have IDs that are unique across the whole datastore (if possible).
I have seen the "allocateIds" function in the Cloud Datastore documentation, but I would like to know whether this function takes namespaces into account, because I've seen that I can include them in the request and I'm afraid the IDs will be the same as the ones autogenerated by Cloud Datastore.
Thank you in advance!

No: you cannot tell Datastore to allocate unique IDs across all entity groups and namespaces.
However, there is an easy fix: if you believe in statistics and correctly seeded random number generators, you will generally be better off generating your own GUIDs for keys.
If you don't believe in statistics and random numbers, you can still generate a GUID and transactionally verify that it doesn't exist in your Datastore before writing the entity in question.
If you are truly desperate to have Datastore do ID allocation for you, it is possible to call AllocateIds manually and ask it to allocate an ID for a constant key. (For example, ask it to allocate for an arbitrary but unchanging key in the default namespace, and it will return an integer that is unique and can be used somewhere else.)
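The GUID approach can be sketched as follows. This is a minimal sketch under stated assumptions: the `Map` named `store` is an in-memory stand-in for Datastore, and `saveWithGuid` is a hypothetical helper. In a real application the existence check and the write would have to happen inside a single Datastore transaction, which this sketch does not attempt to show. `randomUUID` comes from Node's built-in `crypto` module.

```javascript
const { randomUUID } = require('crypto');

// In-memory stand-in for the datastore (assumption for this sketch).
const store = new Map();

// Generate a random GUID as the key name, check that it is unused, then write.
function saveWithGuid(kind, data) {
  let name;
  do {
    name = randomUUID(); // random (version 4) UUID
  } while (store.has(`${kind}:${name}`)); // collision check; retry in the unlikely case of a clash
  store.set(`${kind}:${name}`, data);
  return name;
}

const firstName = saveWithGuid('example', { title: 'first' });
console.log(firstName);
```

Since the GUID does not depend on the namespace, the same generator can be used for keys in every namespace.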

How to add a composite index in Google Datastore?

I'm now using Google Datastore for my company's database.
Today, I created an index and it is successfully listed under 'Index'.
But the size and entity count of the index I created are empty.
The Google Datastore documentation says that the index is auto-generated, but it wasn't.
Is there a command or something else I need to run to generate the index?
The image below is a screenshot.
The upper one is the new index. The lower one is already in use.

As a matter of fact, existing entities will not be indexed automatically. You have to load and save all your old entities (the ones without index entries) in order to have the necessary index entries created for them.
Note, however, that changing a property from unindexed to indexed does not affect any existing entities that may have been created before the change. Queries filtering on the property will not return such existing entities, because the entities weren't written to the query's index when they were created. To make the entities accessible by future queries, you must rewrite them to the Datastore so that they will be entered in the appropriate indexes. That is, you must do the following for each such existing entity:
1. Retrieve (get) the entity from the Datastore.
2. Write (put) the entity back to the Datastore.
Similarly, changing a property from indexed to unindexed only affects entities subsequently written to the Datastore. The index entries for any existing entities with that property will continue to exist until the entities are updated or deleted. To avoid unwanted results, you must purge your code of all queries that filter or sort by the (now unindexed) property. (source)
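The behaviour described in the quoted documentation can be illustrated with a toy model in which index entries are only written at put time. `ToyStore` and its `status` property are invented for this sketch and have nothing to do with the real Datastore client.

```javascript
// Toy model: a store plus one property index that is only touched on put().
class ToyStore {
  constructor() {
    this.entities = new Map(); // key -> entity
    this.index = new Map();    // key -> indexed value of the `status` property
    this.indexed = false;      // whether `status` is currently configured as indexed
  }

  put(key, entity) {
    this.entities.set(key, entity);
    // Index entries are written (or removed) at write time only.
    if (this.indexed && 'status' in entity) this.index.set(key, entity.status);
    else this.index.delete(key);
  }

  get(key) {
    return this.entities.get(key);
  }

  // Queries consult the index, never the entities themselves.
  queryByStatus(value) {
    return [...this.index].filter(([, v]) => v === value).map(([k]) => k);
  }
}

const toy = new ToyStore();
toy.put('task1', { status: 'open' }); // written while the property is unindexed
toy.indexed = true;                    // property changed to indexed
toy.put('task2', { status: 'open' }); // written after the change

const before = toy.queryByStatus('open'); // task1 is missing from the result
toy.put('task1', toy.get('task1'));       // get + put rewrites the entity
const after = toy.queryByStatus('open');  // now both tasks are found
console.log(before, after);
```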
Note that the documentation doesn't explicitly say the same for composite indexes. When you deploy a new composite index, it will appear in the developers console as "building" until it reaches the "serving" state. I'm not sure what exactly it's building there; I usually re-saved all my entities and everything worked as it should.
"Auto-generated" is a label that tells you whether you created the index manually or whether it was created by the dev server when you made a query that required it. It is in no way linked to how and when the index entries are created for the entities.
The <datastore-indexes> element has an autoGenerate attribute that
controls whether this file should be considered along with
automatically generated index configuration. See Using Automatic Index
Configuration below. (source)
Once you have created a new index and want it to cover all your existing entities, I recommend writing a cursor query to handle this. I usually expose this query in an admin backend and have it run until there are no more results. Why expose it? If you have lots of entities, this job may run longer than the allowed 60 seconds in the frontend or 10 minutes in the backend. By exposing it I can use the frontend instance time and don't have to worry about the time restrictions.
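The cursor-driven re-save can be sketched roughly like this. It is an in-memory simulation only: `fetchPage`, `putEntity`, and `resaveBatch` are hypothetical helpers (not functions of any Datastore client library), and the cursor is modeled as a plain array offset.

```javascript
// In-memory stand-ins for the stored entities and for the log of rewrites.
const entities = Array.from({ length: 7 }, (_, i) => ({ id: i + 1 }));
const rewritten = [];

// Return up to `limit` entities starting at `cursor`, plus the next cursor
// (null once the result set is exhausted).
function fetchPage(cursor, limit) {
  const start = cursor === null ? 0 : cursor;
  const page = entities.slice(start, start + limit);
  const next = start + limit < entities.length ? start + limit : null;
  return { page, next };
}

// A real put would write the entity back and thereby create its index entries.
function putEntity(entity) {
  rewritten.push(entity.id);
}

// Handle one batch per call so that each request stays within its time limit;
// the caller (for example an admin page) repeats until the cursor is null.
function resaveBatch(cursor, limit = 3) {
  const { page, next } = fetchPage(cursor, limit);
  page.forEach(putEntity);
  return next;
}

let cursor = null;
let batches = 0;
do {
  cursor = resaveBatch(cursor);
  batches += 1;
} while (cursor !== null);

console.log(batches, rewritten.length);
```

Keeping each batch small and passing the cursor from one request to the next is what lets the job as a whole run longer than any single request is allowed to.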

Creating indexes on existing entity properties

When I started my project, I thought there was no need to create indexes on certain fields of my entities. Now, to generate certain daily reports and statistics, we need to create indexes on some fields of existing entities.
As explained in the post Retroactive indexing in GAE Datastore, the only way is to first change these properties from unindexed to indexed and then retrieve and write all the entities again.
My question is: if I take a backup from Datastore Admin and restore it after changing the properties to indexed, will my project have all the required properties indexed? Or do I need to retrieve and write the entities through a program?
PS: My project is a Java project on GAE.

Edit: The workaround I mentioned earlier does not work. The only way to change the field is to re-upload the entities. Sorry.

Search API, create documents and indexes

I need help with the Search API.
I am Brazilian and I'm using the google translator to communicate.
My question is:
For each item in the datastore persisted I create a document and an index?
And for those objects that are already persisted in the datastore, I go all the bank to create a document and an index for each, if I want to search for Search API?
I am using Java.

It's reasonable to use the Search API to search for objects that are also stored in the Datastore. You can create a Search document for each Datastore entity (so that there's a one-to-one correspondence between them). But you don't need to use a separate Search Index for each one: all the Search documents can be added to one index. Or, if you have a huge number of documents, and if there is some natural partitioning between them, you could distribute them over some modest number of indexes. Assuming you can know via some external means which (single) index to choose for searching, preventing them from getting too big can help performance.
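The layout described above (one document per entity, all in one index, or spread over a few indexes along a natural partition) might be sketched like this. None of this is the Search API itself: `toDocument`, `indexNameFor`, `addToIndex`, the `products` index name, and the `region` field used as the partition are all assumptions for illustration.

```javascript
// Build one search document per entity, reusing the entity key as document id.
function toDocument(entity) {
  return { docId: entity.key, fields: { title: entity.title } };
}

// Pick the index: a single shared index, or one per natural partition.
function indexNameFor(entity, partitioned) {
  return partitioned ? `products-${entity.region}` : 'products';
}

// In-memory stand-in: index name -> array of documents.
const indexes = new Map();

function addToIndex(entity, partitioned = false) {
  const name = indexNameFor(entity, partitioned);
  if (!indexes.has(name)) indexes.set(name, []);
  indexes.get(name).push(toDocument(entity));
  return name;
}

const sample = { key: 'Product:1', title: 'Kettle', region: 'eu' };
console.log(addToIndex(sample));       // goes to the shared index
console.log(addToIndex(sample, true)); // goes to the index for its partition
```

With the partitioned layout, a search only makes sense if the caller already knows which single index to query, which is the "external means" mentioned above.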
I've tried to answer the question that I think you're asking. It's difficult for me to understand the English that the Google translator has produced. In particular, what does "I go all the bank ..." mean?

Is there a nosql store that also allows for relationships between stored entities?

I am looking for NoSQL key-value stores that also provide for storing/maintaining relationships between stored entities. I know Google App Engine's datastore allows for owned and unowned relationships between entities. Do any of the popular NoSQL stores provide something similar?
Even though most of them are schemaless, are there methods to layer relationships onto a key-value store?

Support for relationships between entities is one of the core features of graph databases. Typically, you model your entities as nodes and the relationships as relationships/edges in the graph. Unlike with an RDBMS, you don't have to define relationships in advance: just add them to the graph as needed (schema-free). I created a domain modeling gallery giving a few examples of how this can look in practice. The examples use the Neo4j graphdb, a project I'm involved in. The mailing list of this project usually proves very helpful for graph modeling questions.
The document-oriented database Riak has support for links between documents.
You can add support for relationships on top of any database engine (like key/value), but it doesn't come without work. It all comes down to your use case. If you provide more details, it's easier to come up with a useful answer.
Oops, I now see that the title says "nosql store" and then your actual question narrows this down to "nosql key value store". As key/value stores have no semantics for defining relationships between entities, I'll still post my answer.
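As one possible reading of "adding relationships on top of a key/value engine", here is a minimal sketch that stores edges under derived keys. The `Map` named `kv` stands in for the key/value store, and the `rel:<from>:<type>` key layout and the `relate`/`related` helpers are assumptions of this example, not features of any particular product.

```javascript
// Plain key/value store stand-in.
const kv = new Map();

// Store an edge under a derived key so it can be found from the source node.
function relate(from, type, to) {
  const edgeKey = `rel:${from}:${type}`;
  const targets = kv.get(edgeKey) || [];
  if (!targets.includes(to)) targets.push(to);
  kv.set(edgeKey, targets);
}

// Follow a relationship: read the edge list, then fetch each target value.
function related(from, type) {
  return (kv.get(`rel:${from}:${type}`) || []).map((key) => kv.get(key));
}

kv.set('user:1', { name: 'Ann' });
kv.set('user:2', { name: 'Ben' });
relate('user:1', 'follows', 'user:2');

console.log(related('user:1', 'follows'));
```

The extra work shows up quickly: deleting `user:2` would leave a dangling entry in the edge list, and finding everyone who follows `user:2` needs a second, reverse edge key that the application has to keep in sync itself.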

MongoDB is a document database, not a key/value store. It does provide, however, a simple form of inter-document references. These work more-or-less like SQL foreign keys that are automatically nulled when the referenced object is deleted.
This is adequate for the same sorts of things for which you'd use foreign keys, but it isn't optimized for serious graph traversal.

The relationships in Google App Engine are only keys to entities that are automatically dereferenced when accessed in code, and they are treated as plain values when used in a filter. It's a function of the DB API rather than anything explicit, so accessing a ReferenceProperty will simply perform a query against the referenced model to fetch the object.
If you look at something like MongoDB, the relationships are stored in-object (from what I remember), but they can also be stored however you want, in the sense that you would create an API that searches the joined table for the item in the relationship, in a similar manner to how App Engine works.
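The "key that is dereferenced on access" idea can be imitated in a few lines. This is only an analogy in plain JavaScript, not the App Engine DB API: `models`, `makePost`, and the lookup counter are invented for the sketch.

```javascript
// Stand-in for the referenced model's storage.
const models = new Map([['author:7', { name: 'Dana' }]]);
let lookups = 0; // counts how often the reference is actually resolved

function makePost(title, authorKey) {
  return {
    title,
    authorKey, // what is actually stored on the entity: just a key
    get author() {
      // Resolved lazily, each time the property is read.
      lookups += 1;
      return models.get(this.authorKey);
    },
  };
}

const post = makePost('Hello', 'author:7');
const lookupsBeforeAccess = lookups; // nothing has been fetched yet
const authorName = post.author.name; // reading the property triggers the lookup
console.log(lookupsBeforeAccess, authorName, lookups);
```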
Paul.
