Google App Engine Index entries for ints and Longs - google-app-engine

We have an entity:
@Entity
public class Cow
{
    @Id private Long cowID;
    @Index private int age;
    @Index private long geoLoc;
    private cowStuff cowData;
    // getters, setters, etc
}
Using Objectify, we filter for a range of ages and a single geoLoc (since we can't have multiple inequality filters). How many index entries are generated for each entity, given that the two indexed fields are an int and a long?

Single-property indexes and multiple-property indexes are a little different.
Objectify uses @Index to create a single-property index for each field you annotate. Each field results in one index entry (assuming the type never changes, which is a safe assumption with Objectify).
For some queries, App Engine can leverage a combination of different single-property indexes.
However, certain queries require a multiple-property index - you can read more about that here.
Multiple-property indexes have to be added manually in datastore-indexes.xml.
The dev server will prompt you when you need a multiple-property index and suggest one in the form of an XML snippet.
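For the Cow query in the question (an equality filter on geoLoc plus a range on age), such a snippet would probably look something like the sketch below. It assumes the kind is named Cow and that the properties keep their field names; the suggestion your dev server actually prints may differ.

```xml
<datastore-indexes autoGenerate="true">
    <!-- Equality-filtered property first, then the property with the range filter -->
    <datastore-index kind="Cow" ancestor="false">
        <property name="geoLoc" direction="asc"/>
        <property name="age" direction="asc"/>
    </datastore-index>
</datastore-indexes>
```

This fragment belongs in WEB-INF/datastore-indexes.xml.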

It depends on the queries you run on your dev server. If you don't run any queries on your dev server and just push this as is, it will create 3 different indexes (the default one on the ID, one for age, one for geoLoc). At least I think it's 3; honestly, I'm not sure how App Engine handles custom properties as columns.
If you run queries on your entity, it will create more indexes to be able to serve those queries.
You can look inside your "index.yaml" if you're in Python, or inside "WEB-INF/datastore-indexes.xml" if you're in Java, to see the indexes your dev server thinks you need.
More info... for Python or for Java

Related

Google Cloud Datastore indexes

We are using Google App Engine for our new app. We want to use Google's Datastore, and we are trying to understand how Datastore indexes work.
We understand that there are a couple of limits on indexes. We are especially focusing on the entity index limitations.
We have an embedded property in one of our models.
Main class
public class Contact
{
    @Indexed
    private String name;
    @Embedded
    @Indexed
    private CStatus cstatus;
}
Embedded class
public class CStatus
{
    private Long start_time = 0L;
    public enum Status
    {
        ACTIVE, PAUSE, DELETED
    };
    private String status = null;
}
Assume that I saved an instance of Contact,
1. How many predefined indexes will be created for the Contact kind in total?
2. How many index entries will be created in total?
3. Is there any developer playground available for Datastore? We have checked the Datastore statistics, but they take 24-48 hours to update the index entries list.
According to your code, two simple indexes will be created: one for name and another for status.
Note that further indexes will also be created if, somewhere else in the code, you run a query that requires them.
Another thing to take note of is that the 200-limit on indexes does not apply to indexes using one single attribute. It applies to composite indexes using multiple attributes.
As of yet there is no playground that I know of, unless you want to create a dummy project and test your code on it. Otherwise you just have to play in your development environment until Google addresses that issue.

Selecting Objectify data from the Google DataStore using GQL in the Developer Console

I have an Objectify entity called UserEntity which contains an object called user. I want to dump all the last sync times and some other data from my user objects to do a bit of analysis. I'm trying to do this in the Developers Console using GQL but can't work out how to get the results I want.
The query below works to get everything
SELECT * FROM UserEntity
This query gets all the keys
SELECT __key__ FROM UserEntity
This returns nothing, saying No data was found.
SELECT user FROM UserEntity
But I can't work out how to (or if I can) select individual properties from objects. Is it possible to achieve this in the Developer Console, or shall I just write some code to do it?
Ideally I'd like to be able to do something like
SELECT user.synctime, user.currentLevel FROM UserEntity
Stripped UserEntity class below
@Entity
@Cache
public class UserEntity extends WordBuzzEntity {
    @Id
    private String facebookId;
    public User user = new User(null);
    private HashMap<String, Date> accessTokens = new HashMap<String, Date>();
}
This is not how the datastore fundamentally works. The datastore is a key/value store with some extra indexing. The values are serialized protobufs. Generally speaking, you load entities whole and cannot pick/choose the parts you want.
There is some extra cleverness that the datastore can perform, selecting data directly out of an index rather than loading the protobuf value. The most obvious is a keys-only query (the key is always part of every index). More sophisticated is a "projection" query which looks like SQL select at first glance, but really is quite a different animal and requires you to maintain special indexes. However, that is an advanced performance optimization that you should not pursue unless you really know what you are doing. Start with the simple model of loading whole entities.

Storing Relationships as Objectify Keys vs. Long IDs

I am developing a RESTful web service with GAE. My technology stack is focused around Jersey, Spring, and Objectify.
If you don't know, Objectify is ...
“Objectify is a Java data access API specifically designed for the Google App Engine datastore. It occupies a "middle ground"; easier to use and more transparent than JDO or JPA, but significantly more convenient than the Low-Level API. Objectify is designed to make novices immediately productive yet also expose the full power of the GAE datastore.”
https://code.google.com/p/objectify-appengine/
As of now I have used Objectify Keys to store relationships in my models. Like this ...
public class MyModel {
    @Id private Long id;
    private Key<MyOtherModel> myOtherModel;
    ...
Objectify keys provide additional power as compared to Long IDs, but they can be created from a Long ID and a MyOtherModel.class with a static method Key.create(...),
Key.create(MyOtherModel.class, id)
so I don't strictly have to store relationships as Objectify keys at the model level; I just thought it would be more consistent.
The problem is I need to write a lot of additional code to create XML adapters to convert the Objectify keys to Long IDs when I serialize my model objects to JSON, and deserialize them from JSON to a Java object.
I was thinking about using Long IDs instead and creating an Objectify Key in the DAO, when I need it. Also, this would remove any Objectify specific code from anything that wasn't a DAO.
I would like some perspective from a more experienced programmer. I have never written software of this size (several thousand lines of code, that is).
Thanks a lot everyone.
I am an inexperienced datastore/Objectify developer too, so I'm just musing here.
I see your point, that replacing the Key<> type in MyModel with a Long id would simplify things for you. I would note though, that the Key<> object can contain a path (as well as a kind and an id). So, if your data model becomes more complex and MyOtherModel is no longer a root kind then your ability to generate a Key<> from a Long id breaks down.
If you know that won't happen, or don't mind changing MyModel later then I guess that isn't a problem.
For your serializing format I would suggest you use a String to hold your key or id. Your Long id can be converted to a string, and would have to be anyway for JSON (so there is no loss in efficiency), but that same string could later be used to hold the full Key too.
You can also store them as long (or Long or String) and have a method of getMyOtherModelKey() and that can return a key after calling the static method. You can also have getMyOtherModelID() to just return the ID. This really works both ways since you can have both methods if you store a key or just the ID.
The trick comes in if you use parents in any of your models. If you do, the ID alone is not enough to get the other model: you need the ID and the IDs of all the parents (and grandparents if needed). This is where Keys are nice.
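To make the parent issue concrete, here is a minimal sketch of what a key path carries beyond a bare ID. PathKey is a hypothetical stand-in written for illustration, not Objectify's Key<T>, and the "Owner" parent kind is invented for the example.

```java
import java.util.Objects;

// Hypothetical stand-in for a datastore key: a kind, a numeric id, and an optional parent.
// This is NOT Objectify's Key<T>; it only models the idea of a key path.
final class PathKey {
    final PathKey parent; // null for a root entity
    final String kind;
    final long id;

    PathKey(PathKey parent, String kind, long id) {
        this.parent = parent;
        this.kind = kind;
        this.id = id;
    }

    // Renders the full path from the root down, e.g. "Owner:1/MyOtherModel:42".
    String path() {
        String self = kind + ":" + id;
        return parent == null ? self : parent.path() + "/" + self;
    }

    // Two keys are equal only if kind, id, and the whole parent chain match.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof PathKey)) return false;
        PathKey other = (PathKey) o;
        return id == other.id
                && kind.equals(other.kind)
                && Objects.equals(parent, other.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, id);
    }
}
```

With this model, a MyOtherModel with ID 42 under Owner 1 and another with ID 42 under Owner 2 are different keys, so a Long on its own cannot tell them apart. The path() string is also the kind of value that could be sent over JSON in place of a bare ID.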

Can't get generated keys to increase by 1

I'm using Java and JPA for ORM.
Initially I was defining entity keys like this:
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
but that resulted in ids that were growing pretty fast and in unpredictable ways (...19,20,22,1003...1007,1014,1015,2004...)
which seems to contradict the docs, which state that "The simplest key field is a long integer value that is automatically populated by JPA with a value unique across all other instances of the class when the object is saved to the datastore for the first time. Long integer keys use a @Id annotation, and a @GeneratedValue(strategy = GenerationType.IDENTITY) annotation"
So I found this unit test and I switched to the way it was done there:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;
which migrated fine after updating some GQL statements, but I'm still seeing keys increasing by 1000 every time.
Should I be using GenerationType.TABLE? Or should I have been using IDENTITY on a Long rather than a Key field?
I'm hoping to get some definitive answers before I keep changing this in my live (beta) app. Unfortunately all schemes I've used so far in the dev env result in contiguous keys so really no way to test out new approaches except by deploying.
Thanks in advance.
It's really hard to do contiguous keys on App Engine. The docs never stated that the auto-generated keys are contiguous - only that they will be unique.
The simplest solution on app engine is to design your keys so that you don't need them to be contiguous. Given the way BigTable is designed, if you did have contiguously incrementing keys, you'd likely have some perf bottlenecks whenever a tablet needs to be split under the hood.
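As an illustration of "unique but not contiguous", here is a small simulation. It rests on an assumption that this thread does not confirm: that IDs are reserved in blocks and handed out from whichever block a request happens to hit. BlockIdAllocator and its block size of 1000 are hypothetical and only chosen to resemble the jumps described in the question.

```java
// Hypothetical allocator: blocks of ids are reserved from a shared counter,
// then each block hands out its ids independently of the others.
final class BlockIdAllocator {
    private final long blockSize;
    private long nextBlockStart = 1; // first id of the next unreserved block

    BlockIdAllocator(long blockSize) {
        this.blockSize = blockSize;
    }

    // Reserve the next unused block of ids.
    synchronized Block reserveBlock() {
        Block block = new Block(nextBlockStart, nextBlockStart + blockSize);
        nextBlockStart += blockSize;
        return block;
    }

    static final class Block {
        private long next;
        private final long end; // exclusive upper bound

        Block(long start, long end) {
            this.next = start;
            this.end = end;
        }

        // Hand out the next id from this block only.
        long nextId() {
            if (next >= end) throw new IllegalStateException("block exhausted");
            return next++;
        }
    }
}
```

If two blocks serve writes alternately, the IDs come out as 1, 2, 1001, 3, 1002: every value is unique, but the sequence has gaps and is not ordered by creation time. Code that only relies on uniqueness is unaffected by this.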

How can I discover if a property of a stored Entity is indexed or unindexed?

I have several entities in datastore, but I don't know if some of their properties are indexed or unindexed.
How can I discover (with the admin console or programmatically) whether a property of a stored Entity is indexed or unindexed?
By default each property is indexed (unless it's a TextProperty or BlobProperty). You need to (and should) set the property's indexed option to False if you don't want it to be indexed (to improve performance and reduce entity writing costs).
There is no indication in the admin console of whether a property is indexed or not. You can try to execute "select * from EntityType order by Property" in the GQL box of the datastore viewer and see if it fails.
If you've been flipping between indexed=True and indexed=False on some properties over time, and have a set of entities written under both regimes, then you'll have some properties that are indexed and some that aren't. Is this the situation you're in?
If you don't have reliable history on your code, trying to determine if you're in this situation is a bit tricky, depending on how many entities you have. You can determine if you're in an inconsistent state by noting if a keys-only query on an Entity returns a different number of keys than a query that filters on the suspect property. A filter won't find unindexed properties. If you've got a lot of entities, you'll have to shard the counting somehow (to avoid timing out on a long query that returns lots of entities).
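The count comparison can be sketched with an in-memory model. This is not datastore API code: Stored is a hypothetical record of whether an entity was written with the suspect property indexed, and the two counting methods stand in for the keys-only query and the query on the property.

```java
import java.util.List;

final class IndexConsistencyCheck {
    // Hypothetical view of a stored entity: its id and whether the suspect
    // property had an index entry when the entity was last written.
    static final class Stored {
        final long id;
        final boolean propertyIndexed;

        Stored(long id, boolean propertyIndexed) {
            this.id = id;
            this.propertyIndexed = propertyIndexed;
        }
    }

    // Stand-in for a keys-only query: every entity of the kind is returned.
    static int countKeysOnly(List<Stored> kind) {
        return kind.size();
    }

    // Stand-in for a query on the property: entities without an index
    // entry for that property are invisible to it.
    static int countViaProperty(List<Stored> kind) {
        int visible = 0;
        for (Stored s : kind) {
            if (s.propertyIndexed) visible++;
        }
        return visible;
    }

    // The check described above: differing counts mean some entities are unindexed.
    static boolean looksInconsistent(List<Stored> kind) {
        return countKeysOnly(kind) != countViaProperty(kind);
    }
}
```

In this model a property count of zero would also register as a difference, so it is worth looking at the two numbers themselves: zero suggests the property was never indexed, while a value in between suggests a mix.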
If you determine that you do have inconsistent indexing and want to repair your entities to be consistent, the usual approach is to write a mapreduce that touches all of your unstable entities and issues puts on the necessary properties.
Take a look at the "Datastore Indexes" interface, the link for which is located in the left navigation menu of the App Engine dashboard.
There you'll see a list of indexes and the specific properties to which each index has been applied.
For composite indexes (i.e. the ones defined in datastore-indexes.xml or index.yaml), you could use the low-level API to get the list of indexes that are present in your app's datastore.
In GAE/J, you would need to invoke DatastoreServiceFactory.getDatastoreService().getIndexes(), while in Python, the same function is provided by db.get_indexes().
