Google Cloud Datastore indexes - google-app-engine

We are using Google App Engine for our new app. We want to use Google’s Datastore, and we are trying to understand how Datastore indexes work.
We understand that there are several limits on indexes. We are especially focusing on the per-entity index limits.
We have an embedded property in one of our models:
Main class
class Contact
{
    @Indexed
    private String name;
    @Embedded
    @Indexed
    private CStatus cstatus;
}
Embedded class
class CStatus
{
    private Long start_time = 0L;
    public enum Status
    {
        ACTIVE, PAUSE, DELETED
    };
    private String status = null;
}
Assume that I saved an instance of Contact:
1. How many predefined indexes will be created for the Contact kind in total?
2. How many index entries will be created in total?
3. Is there any developers’ playground available for Datastore? We have checked the Datastore statistics, but they take 24-48 hours to update the index entries list.

According to your code, two simple indexes will be created: one for name and another for status.
You should note that indexes will also be created if you run a query elsewhere in the code that requires other indexes.
Another thing to take note of is that the limit of 200 indexes does not apply to indexes on a single property. It applies to composite indexes that use multiple properties.
As of yet there is no playground that I know of, unless you want to create a dummy project and test your code on it. Otherwise you just have to play in your development environment until Google addresses that issue.
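To get a feel for which property names the Datastore ends up seeing for the Contact above, here is a minimal sketch in plain Java. It assumes the mapping library flattens an embedded object into dotted property names such as cstatus.status (older Objectify versions stored @Embedded fields this way). The class EmbeddedFlattenSketch and its methods are hypothetical helpers, not part of any Datastore API, and which of the listed names actually gets an index depends on the annotations your library honors.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lists the flattened property names of one entity.
final class EmbeddedFlattenSketch {

    // Recursively turns nested maps into dotted property names.
    @SuppressWarnings("unchecked")
    static List<String> flatten(String prefix, Map<String, Object> props) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<String, Object> e : props.entrySet()) {
            String name = prefix.isEmpty() ? e.getKey() : prefix + "." + e.getKey();
            if (e.getValue() instanceof Map) {
                // Embedded object: descend and prefix its fields.
                names.addAll(flatten(name, (Map<String, Object>) e.getValue()));
            } else {
                names.add(name);
            }
        }
        return names;
    }

    // Stand-in for one saved Contact instance (values are made up).
    static Map<String, Object> sampleContact() {
        Map<String, Object> cstatus = new LinkedHashMap<>();
        cstatus.put("start_time", 0L);
        cstatus.put("status", "ACTIVE");
        Map<String, Object> contact = new LinkedHashMap<>();
        contact.put("name", "Alice");
        contact.put("cstatus", cstatus);
        return contact;
    }

    public static void main(String[] args) {
        System.out.println(flatten("", sampleContact()));
    }
}
```

Each name in the resulting list is a candidate for a single-property index; comparing that list with what the Datastore viewer shows for a test entity is a quick way to check the assumption.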

Related

Selecting Objectify data from the Google DataStore using GQL in the Developer Console

I have an Objectify entity called UserEntity which contains an object called user. I want to dump all the last sync times and some other data from my user objects to do a bit of analysis. I'm trying to do this in the Developers Console using GQL but can't work out how to get the results I want.
The query below works to get everything
SELECT * FROM UserEntity
This query gets all the keys
SELECT __key__ FROM UserEntity
This returns nothing, saying No data was found.
SELECT user FROM UserEntity
But I can't work out how to (or if I can) select individual properties from objects. Is it possible to achieve this in the Developer Console, or shall I just write some code to do it?
Ideally I'd like to be able to do something like
SELECT user.synctime, user.currentLevel FROM UserEntity
Stripped UserEntity class below
@Entity
@Cache
public class UserEntity extends WordBuzzEntity {
    @Id
    private String facebookId;
    public User user = new User(null);
    private HashMap<String, Date> accessTokens = new HashMap<String, Date>();
}
This is not how the datastore fundamentally works. The datastore is a key/value store with some extra indexing. The values are serialized protobufs. Generally speaking, you load entities whole and cannot pick/choose the parts you want.
There is some extra cleverness that the datastore can perform, selecting data directly out of an index rather than loading the protobuf value. The most obvious is a keys-only query (the key is always part of every index). More sophisticated is a "projection" query which looks like SQL select at first glance, but really is quite a different animal and requires you to maintain special indexes. However, that is an advanced performance optimization that you should not pursue unless you really know what you are doing. Start with the simple model of loading whole entities.
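The difference between loading whole entities and reading straight out of an index can be pictured with a toy model. This is only a sketch in plain Java under stated assumptions: ToyDatastore, its single synctime index and the entityLoads counter are all hypothetical, and the real Datastore stores serialized protobufs and maintains its indexes very differently. It also ignores index cleanup when a key is overwritten.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model: a key/value store plus one single-property index.
final class ToyDatastore {
    // Primary storage: key -> whole entity (stands in for the serialized value).
    private final Map<String, Map<String, Object>> entities = new HashMap<>();
    // Index on "synctime": property value -> keys that have that value.
    private final TreeMap<Long, List<String>> syncTimeIndex = new TreeMap<>();
    // Counts how many times a whole entity was fetched.
    int entityLoads = 0;

    void put(String key, Map<String, Object> entity) {
        entities.put(key, entity);
        Object syncTime = entity.get("synctime");
        if (syncTime instanceof Long) {
            syncTimeIndex.computeIfAbsent((Long) syncTime, v -> new ArrayList<>()).add(key);
        }
    }

    // Normal read: the entity comes back whole, never in parts.
    Map<String, Object> get(String key) {
        entityLoads++;
        return entities.get(key);
    }

    // Keys-only query: answered from the index, no entity is loaded.
    List<String> keysOnly() {
        List<String> keys = new ArrayList<>();
        for (List<String> bucket : syncTimeIndex.values()) {
            keys.addAll(bucket);
        }
        return keys;
    }

    // Projection-style query: key and indexed value, again from the index only.
    List<String> projectSyncTime() {
        List<String> rows = new ArrayList<>();
        for (Map.Entry<Long, List<String>> e : syncTimeIndex.entrySet()) {
            for (String key : e.getValue()) {
                rows.add(key + "=" + e.getKey());
            }
        }
        return rows;
    }
}
```

In this model a projection can only return properties that sit in an index, which is the reason the real feature requires you to maintain matching indexes.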

Google App Engine index entries for ints and longs

We have an entity
@Entity
public class Cow
{
    @Id private Long cowID;
    @Index private int age;
    @Index private long geoLoc;
    private cowStuff cowData;
    // getters, setters, etc
}
Using Objectify, we filter for a range of ages and a single geoLoc (since we can't have multiple inequality filters). How many index entries are generated for each entity, given that the two indexed fields are an int and a long?
Single property indexes and multiple property indexes are a little different.
Objectify uses @Index to create a single property index for each field you annotated. Each field will result in one index entry (under the assumption that the type never changes, which in the case of Objectify is a safe assumption).
For some queries, App Engine can leverage a combination of different single property indexes.
However, certain queries require a multiple property index - you can read more about that here.
For multiple property indexes, you have to add them yourself manually in datastore-indexes.xml.
The dev server will prompt you when you need a multiple property index, and make a suggestion in the form of an xml snippet.
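To see why an equality filter on geoLoc combined with a range filter on age is served well by a multiple property index, here is a minimal in-memory sketch. It assumes an index ordered by (geoLoc ascending, age ascending); the class CompositeIndexSketch and its TreeSet are hypothetical stand-ins for illustration, not how App Engine implements its indexes.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical model of a (geoLoc ASC, age ASC) multiple property index.
final class CompositeIndexSketch {

    // One index entry per Cow entity: (geoLoc, age, cowID).
    static final class Entry {
        final long geoLoc;
        final int age;
        final long cowId;

        Entry(long geoLoc, int age, long cowId) {
            this.geoLoc = geoLoc;
            this.age = age;
            this.cowId = cowId;
        }
    }

    // Entries are kept sorted by geoLoc, then age, then key.
    static final Comparator<Entry> ORDER =
            Comparator.<Entry>comparingLong(e -> e.geoLoc)
                    .thenComparingInt(e -> e.age)
                    .thenComparingLong(e -> e.cowId);

    private final TreeSet<Entry> index = new TreeSet<>(ORDER);

    void put(long cowId, int age, long geoLoc) {
        index.add(new Entry(geoLoc, age, cowId));
    }

    // geoLoc == x AND minAge <= age <= maxAge is one contiguous slice of the index.
    List<Long> query(long geoLoc, int minAge, int maxAge) {
        List<Long> ids = new ArrayList<>();
        if (minAge > maxAge) {
            return ids; // empty range, nothing to scan
        }
        Entry from = new Entry(geoLoc, minAge, Long.MIN_VALUE);
        Entry to = new Entry(geoLoc, maxAge, Long.MAX_VALUE);
        for (Entry e : index.subSet(from, true, to, true)) {
            ids.add(e.cowId);
        }
        return ids;
    }
}
```

Because the equality property comes first in the sort order, all matching entries sit next to each other and the query becomes a single range scan. In this sketch each Cow contributes exactly one entry to that index.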
It depends on the queries you run on your dev server. If you don't run any queries in your dev server and just push this as is, it will create 3 different indexes (the default one for the ID, one for age, one for geoLoc), though I am honestly not sure how App Engine handles custom properties as columns.
If you run queries on your entity, it will create more indexes to be able to serve those queries.
You can look inside your "index.yaml" if you're in Python, or inside "WEB-INF/datastore-indexes.xml" if you're in Java, to see the indexes your dev server thinks you can use.
More info... for Python or for Java

IdGeneratorStrategy unique for each kind

Is there any way to create a primary key that is only unique inside one specific kind (assuming I am asking the right question here! - apologies if not) I notice there is an "IdentityType.APPLICATION" option but "Application" seems to be the "smallest" available option!!
I have the following:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class AuditTrail
{
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long ID;
    @Persistent
    private Date createDate;
    @Persistent
    private Long AdminID;
    public AuditTrail()
    {
        this.createDate = new Date();
    }
    public AuditTrail(Long AdminID)
    {
        this();
        this.setAdminID(AdminID);
    }
}
But when I create a new entry, the ID is unique across all the items in my application. A Contact, an Admin, an Appointment, a Service etc. are all separate "tables" (or kinds?), so it's OK that they are all unique against each other, but the Audit Trail could just have its own counting space, so that it doesn't interfere with the count of my "actual data".
Am I asking this in the right way? I have really tried to figure out this Entity/Kind/Property/Key thing, but I'm not sure I fully understand how it all actually works under the hood!
App Engine is designed for high scalability, and the lack of unique identifiers per Kind is one of the consequences. People often ask about similar capabilities, but they are just not efficient to provide. The Datastore is a NoSQL design built on Bigtable, which is described as a huge key-value store. It can retrieve the value for a key rapidly, but considering that your many records are not necessarily on the same server, it is too much overhead to maintain an accurate count of a set of them (the Kind).
If you try to add the functionality robustly in your own code, you cannot avoid time consuming operations. Therefore your code will cause a high workload and delay or "latency" as some like to call it. Probably the AppEngine developers saw the same problems and opted for speed rather than developer friendliness.
There is nothing stopping you from maintaining your own counts in your application code, and even saving them in the Datastore. In some cases it is worth the delay. Always bear Brewer's CAP theorem (explanation) in mind.
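As a rough picture of "maintaining your own counts", here is a minimal sketch with one counter per kind. It is an in-memory stand-in only: the class PerKindIdAllocator and the method nextId are hypothetical names, and in a real application the counter would have to live in a Datastore entity updated inside a transaction, which is exactly where the extra latency mentioned above comes from.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical allocator: each kind gets its own counting space.
final class PerKindIdAllocator {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Returns the next id for the given kind, starting at 1.
    long nextId(String kind) {
        return counters.computeIfAbsent(kind, k -> new AtomicLong()).incrementAndGet();
    }

    public static void main(String[] args) {
        PerKindIdAllocator ids = new PerKindIdAllocator();
        System.out.println(ids.nextId("AuditTrail"));
        System.out.println(ids.nextId("AuditTrail"));
        System.out.println(ids.nextId("Contact"));
    }
}
```

The single shared counter per kind is also the bottleneck: every writer of that kind has to go through it, so it trades write throughput for the nicer numbering.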

Storing Relationships as Objectify Keys vs. Long IDs

I am developing a RESTful web service with GAE. My technology stack is focused around Jersey, Spring, and Objectify.
If you don't know, Objectify is ...
“Objectify is a Java data access API specifically designed for the Google App Engine datastore. It occupies a "middle ground"; easier to use and more transparent than JDO or JPA, but significantly more convenient than the Low-Level API. Objectify is designed to make novices immediately productive yet also expose the full power of the GAE datastore.”
https://code.google.com/p/objectify-appengine/
As of now I have used Objectify Keys to store relationships in my models. Like this ...
public class MyModel {
    @Id private Long id;
    private Key<MyOtherModel> myOtherModel;
    ...
Objectify keys provide additional power as compared to Long IDs, but they can be created from a Long ID and a MyOtherModel.class with a static method Key.create(...),
Key.create(MyOtherModel.class, id)
so I don't strictly have to store relationships as Objectify keys at the model level; I just thought it would be more consistent.
The problem is I need to write a lot of additional code to create XML adapters to convert the Objectify keys to Long IDs when I serialize my model objects to JSON, and deserialize them from JSON to a Java object.
I was thinking about using Long IDs instead and creating an Objectify Key in the DAO, when I need it. Also, this would remove any Objectify specific code from anything that wasn't a DAO.
I would like some perspective from a more experienced programmer. I have never created software of this size (several thousand lines of code, that is).
Thanks a lot everyone.
I am an inexperienced datastore/Objectify developer too, so I'm just musing here.
I see your point, that replacing the Key<> type in MyModel with a Long id would simplify things for you. I would note though, that the Key<> object can contain a path (as well as a kind and an id). So, if your data model becomes more complex and MyOtherModel is no longer a root kind then your ability to generate a Key<> from a Long id breaks down.
If you know that won't happen, or don't mind changing MyModel later then I guess that isn't a problem.
For your serializing format I would suggest you use a String to hold your key or id. Your Long id can be converted to a string, and would have to be anyway for JSON (so there is no loss in efficiency), but that same string could later be used to hold the full Key too.
You can also store them as long (or Long or String) and have a method of getMyOtherModelKey() and that can return a key after calling the static method. You can also have getMyOtherModelID() to just return the ID. This really works both ways since you can have both methods if you store a key or just the ID.
The trick comes in if you use parents in any of your models. If you do, the ID alone is not enough to get the other model: you need the ID and the IDs of all the parents (and grandparents if needed). This is where Keys are nice.
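The two points above (storing a plain id with a key-building getter, and ids no longer being enough once parents are involved) can be sketched in plain Java. SimpleKey below is a hypothetical stand-in for Objectify's Key<T>, written only so the example runs without the App Engine SDK; the kind name "Owner" is likewise made up.

```java
import java.util.Objects;

// Hypothetical stand-in types; not the Objectify API.
final class KeyPathSketch {

    // A key is a path: optional parent key, then kind and id.
    static final class SimpleKey {
        final SimpleKey parent;
        final String kind;
        final long id;

        private SimpleKey(SimpleKey parent, String kind, long id) {
            this.parent = parent;
            this.kind = kind;
            this.id = id;
        }

        static SimpleKey create(String kind, long id) {
            return new SimpleKey(null, kind, id);
        }

        static SimpleKey create(SimpleKey parent, String kind, long id) {
            return new SimpleKey(parent, kind, id);
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof SimpleKey)) {
                return false;
            }
            SimpleKey other = (SimpleKey) o;
            return id == other.id && kind.equals(other.kind)
                    && Objects.equals(parent, other.parent);
        }

        @Override
        public int hashCode() {
            return Objects.hash(parent, kind, id);
        }

        @Override
        public String toString() {
            return (parent == null ? "" : parent + "/") + kind + "(" + id + ")";
        }
    }

    // Model that stores only the id and builds the key on demand.
    static final class MyModel {
        Long id;
        long myOtherModelId;

        long getMyOtherModelID() {
            return myOtherModelId;
        }

        // Only valid while MyOtherModel is a root kind (no parent).
        SimpleKey getMyOtherModelKey() {
            return SimpleKey.create("MyOtherModel", myOtherModelId);
        }
    }
}
```

Two keys with the same kind and id but different parents are different keys here, so once MyOtherModel gets a parent the getter above can no longer reconstruct the right key from the id alone.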

Clarify the usage of AppEngine Search API

I have started to try out the new Search API. The demo is running smoothly; however, there are some points I am still confused about, being an outsider to the search world.
First of all is how to build a document. Obviously you can't hard-code each line into a document, but what else can I do? Say I have a user class (I'm using Java, but I guess Python makes no difference here), and I would like to add the user's information to a document and be able to do a full-text search against the address field.
class User {
String username;
String password;
String address;
}
In my datastore I have 10000 instances of this entity. If I need to build documents for them, do I have to:
Step 1: Retrieve the 10000 instances from the datastore
Step 2: Iterate through the user entities and create 10000 documents
Step 3: Add all 10000 docs to an index, after which I will be able to search
Please correct me if the three steps above are wrong.
If that is the case, does that mean that each time a new User registers, we need to create a new document and add it to the index?
Unfortunately I haven't played around with it that much, but I learned a few things.
When first implementing it, I had to create a lot of documents as well (as you describe), but kept running into deadline exceptions. So I ended up using the task queue to build documents for all my old records.
Remember to create a cross-reference between the search Document and your datastore entity, so you can easily update your document record and, from a search result, get the matching entity.
For the cross-reference, add a new property on your datastore model called something like search_document_id where you store the doc_id (I prefixed all my doc_ids with the datastore model name), and add a text field on your Document containing the entity key as a string.
But I would say in a nutshell you are correct.
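The backfill described above can be sketched roughly as follows, in plain Java without the Search API itself. Everything here is an assumption made for illustration: the class SearchBackfillSketch, the "-" separator in the doc_id, and the batch size of 200 used in main (check the current Search API limits before relying on any number). Each batch would be handed to its own task queue task, which builds the documents and adds them to the index.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helpers for planning a document backfill.
final class SearchBackfillSketch {

    // doc_id convention from the answer: datastore model name as prefix.
    static String docId(String kind, long entityId) {
        return kind + "-" + entityId;
    }

    // Splits the work into batches small enough for one task each.
    static <T> List<List<T>> batches(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < items.size(); i += batchSize) {
            int end = Math.min(i + batchSize, items.size());
            out.add(new ArrayList<>(items.subList(i, end)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Long> userIds = new ArrayList<>();
        for (long id = 1; id <= 10000; id++) {
            userIds.add(id);
        }
        // Assumed batch size; one task per batch avoids one long request.
        List<List<Long>> work = batches(userIds, 200);
        System.out.println(work.size() + " tasks, first doc_id " + docId("User", work.get(0).get(0)));
    }
}
```

New registrations then follow the same path with a batch of one: create the document with the prefixed doc_id and store that id on the entity as search_document_id.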
