I'm using Java and JPA for ORM.
Initially I was defining entity keys like this:
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Key key;
but that resulted in ids that were growing pretty fast and in unpredictable ways (...19,20,22,1003...1007,1014,1015,2004...)
which seems to contradict the docs, which state that "The simplest key field is a long integer value that is automatically populated by JPA with a value unique across all other instances of the class when the object is saved to the datastore for the first time. Long integer keys use a @Id annotation, and a @GeneratedValue(strategy = GenerationType.IDENTITY) annotation"
So I found this unit test and I switched to the way it was done there:
@Id
@GeneratedValue(strategy = GenerationType.SEQUENCE)
private Long id;
which migrated fine after updating some GQL statements, but I'm still seeing keys increasing by 1000 every time.
Should I be using GenerationType.TABLE? Or should I have been using IDENTITY on a Long rather than a Key field?
I'm hoping to get some definitive answers before I keep changing this in my live (beta) app. Unfortunately, all the schemes I've used so far produce contiguous keys in the dev environment, so there is really no way to test new approaches except by deploying.
Thanks in advance.
It's really hard to do contiguous keys on App Engine. The docs never stated that the auto-generated keys are contiguous - only that they will be unique.
The simplest solution on app engine is to design your keys so that you don't need them to be contiguous. Given the way BigTable is designed, if you did have contiguously incrementing keys, you'd likely have some perf bottlenecks whenever a tablet needs to be split under the hood.
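One way to picture ids that are unique but not contiguous is a set of allocators that each reserve a whole block of ids from a shared counter. The sketch below is a hypothetical model only and is not how the Datastore actually allocates ids; the class name BlockIdAllocator and the block size of 1000 are assumptions chosen to mirror the jumps seen in the question.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical model: NOT the Datastore's real allocation algorithm.
final class BlockIdAllocator {
    static final long BLOCK_SIZE = 1000; // assumed, to mirror the question

    private final AtomicLong nextBlockStart; // shared by all allocators
    private long next = 0;   // next id to hand out from the current block
    private long limit = 0;  // exclusive end of the current block (0 forces a reservation)

    BlockIdAllocator(AtomicLong sharedCounter) {
        this.nextBlockStart = sharedCounter;
    }

    // Hands out the next id, reserving a fresh block when the current one is used up.
    synchronized long nextId() {
        if (next >= limit) {
            next = nextBlockStart.getAndAdd(BLOCK_SIZE);
            limit = next + BLOCK_SIZE;
        }
        return next++;
    }

    public static void main(String[] args) {
        AtomicLong shared = new AtomicLong(1);
        BlockIdAllocator a = new BlockIdAllocator(shared);
        BlockIdAllocator b = new BlockIdAllocator(shared);
        // prints: 1, 2, 1001, 3
        System.out.println(a.nextId() + ", " + a.nextId() + ", " + b.nextId() + ", " + a.nextId());
    }
}
```

Every id is unique, but which allocator serves a request decides whether the next id is adjacent or a block away, so code should rely only on uniqueness.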
Related
We are using Google App Engine for our new app. We want to use Google's Datastore, and we are trying to understand how Datastore indexes work.
We understood that there are a couple of limits on indexes. We are especially focusing on entity index limitations.
We have an embedded property in one of our models.
Main class
public class Contact
{
    @Indexed
    private String name;
    @Embedded
    @Indexed
    private CStatus cstatus;
}
Embedded class
public class CStatus
{
    private Long start_time = 0L;
    public enum Status
    {
        ACTIVE, PAUSE, DELETED
    };
    private String status = null;
}
Assume that I saved an instance of Contact,
1. How many predefined indexes will be created for the Contact kind in total?
2. How many index entries will be created in total?
3. Is there any developer playground available for Datastore? We have checked the Datastore statistics, but they take 24-48 hours to update the list of index entries.
According to your code, two simple indexes will be created: one for name and another for status.
Note that further indexes will also be created if a query elsewhere in your code requires them.
Another thing to take note of is that the 200-limit on indexes does not apply to indexes using one single attribute. It applies to composite indexes using multiple attributes.
As of yet there is no playground that I know of, unless you want to create a dummy project and test your code on it. Otherwise you just have to experiment in your development environment until Google addresses that issue.
We have an entity
@Entity
public class Cow
{
    @Id private Long cowID;
    @Index private int age;
    @Index private long geoLoc;
    private cowStuff cowData;
    // getters, setters, etc
}
Using Objectify, we filter for a range of ages and a single geoLoc (since we can't have multiple inequality filters). How many index entries are generated for each entity, given that the 2 indexed properties are an int and a long?
Single property indexes and multiple property indexes are a little different.
Objectify uses @Index to create a single property index for those fields you annotated. Each field will result in one index entry (under the assumption that the type never changes, which in the case of Objectify is a safe assumption).
For some queries, App Engine can leverage a combination of different single property indexes.
However, certain queries require a multiple property index - you can read more about that here.
For multiple property indexes, you have to add them yourself manually in datastore-indexes.xml.
The dev server will prompt you when you need a multiple property index, and make a suggestion in the form of an xml snippet.
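To see why an equality filter on geoLoc combined with a range on age is served by a multiple property index, it can help to model such an index as rows sorted by (geoLoc, age, cowID): the matching rows then form one contiguous run. The sketch below is a plain-Java illustration, not Objectify or Datastore code; CompositeIndexSketch and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.TreeSet;

// Plain-Java illustration only; not Objectify or Datastore code.
final class CompositeIndexSketch {
    // One index row: (geoLoc, age, cowID), kept sorted in that order.
    static final class Row {
        final long geoLoc;
        final int age;
        final long cowID;

        Row(long geoLoc, int age, long cowID) {
            this.geoLoc = geoLoc;
            this.age = age;
            this.cowID = cowID;
        }
    }

    private final TreeSet<Row> rows = new TreeSet<>(
            Comparator.<Row>comparingLong(r -> r.geoLoc)
                    .thenComparingInt(r -> r.age)
                    .thenComparingLong(r -> r.cowID));

    void put(long geoLoc, int age, long cowID) {
        rows.add(new Row(geoLoc, age, cowID));
    }

    // Equality on geoLoc plus an inclusive age range is a single contiguous scan.
    // Requires minAge <= maxAge.
    List<Long> query(long geoLoc, int minAge, int maxAge) {
        Row from = new Row(geoLoc, minAge, Long.MIN_VALUE);
        Row to = new Row(geoLoc, maxAge, Long.MAX_VALUE);
        List<Long> ids = new ArrayList<>();
        for (Row r : rows.subSet(from, true, to, true)) {
            ids.add(r.cowID);
        }
        return ids;
    }

    public static void main(String[] args) {
        CompositeIndexSketch idx = new CompositeIndexSketch();
        idx.put(7L, 3, 1L);
        idx.put(7L, 5, 2L);
        idx.put(7L, 9, 3L);
        idx.put(8L, 4, 4L);
        System.out.println(idx.query(7L, 3, 5)); // prints [1, 2]
    }
}
```

Putting the equality property first in the sort order is what keeps the age range contiguous; with age first, rows for other geoLoc values would be interleaved in the scan.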
Depends on the queries you run on your devserver. If you don't run any queries on your dev server and just push this as is, it will create 3 different indexes (the default one (ID), one for age, one for geoLoc)... maybe 4 (honestly, I'm not sure how App Engine handles custom properties as columns).
If you run queries on your entity, it will create more indexes to be able to serve those queries.
You can look inside your "index.yaml" if you're in python, or inside "WEB-INF/datastore-indexes.xml" if you're in java, to see the index your devserver thinks you can use.
More info... for Python or for Java
Is there any way to create a primary key that is only unique inside one specific kind (assuming I am asking the right question here! - apologies if not) I notice there is an "IdentityType.APPLICATION" option but "Application" seems to be the "smallest" available option!!
I have the following:
@PersistenceCapable(identityType = IdentityType.APPLICATION)
public class AuditTrail
{
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Long ID;
    @Persistent
    private Date createDate;
    @Persistent
    private Long AdminID;
    public AuditTrail()
    {
        this.createDate = new Date();
    }
    public AuditTrail(Long AdminID)
    {
        this();
        this.setAdminID(AdminID);
    }
}
But when I create a new entry, the ID is unique across all the items in my application. A Contact, an Admin, an Appointment, a Service, etc. are all separate "tables" (or kinds?), so it's OK that they are all unique against each other, but the Audit Trail could just have its own counting space so that it doesn't interfere with the count of my "actual data".
Am I asking this in the right way? I have really tried to figure out this Entity/Kind/Property/Key thing, but I'm not sure I fully understand how it all actually works under the hood!
App Engine is designed for high scalability, and the lack of unique identifiers per Kind is one of the consequences. People often ask for similar capabilities, but they are just not efficient to provide. The Datastore is a NoSQL design built on BigTable, which is described as a huge key-value store. It can retrieve the value for a key rapidly, but considering that your many records are not necessarily on the same server, it is too much overhead to maintain an accurate count of a set of them (the Kind).
If you try to add the functionality robustly in your own code, you cannot avoid time-consuming operations. Your code will therefore cause a high workload and delay, or "latency" as some like to call it. Probably the App Engine developers saw the same problems and opted for speed rather than developer friendliness.
There is nothing stopping you from maintaining your own counts in your application code, and even saving them in the Datastore. In some cases it is worth the delay. Always bear Brewer's CAP theorem (explanation) in mind.
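A minimal sketch of maintaining your own per-kind count in application code might look like the following. It keeps the counters in memory purely for illustration, and KindCounter is a hypothetical name; in a real App Engine app the counter would presumably have to live in the Datastore and be updated transactionally, which is where the extra delay comes from.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// In-memory stand-in for a per-kind sequence; illustration only, nothing is persisted.
final class KindCounter {
    private final ConcurrentHashMap<String, AtomicLong> counters = new ConcurrentHashMap<>();

    // Returns the next number in the given kind's own counting space, starting at 1.
    long next(String kind) {
        return counters.computeIfAbsent(kind, k -> new AtomicLong()).incrementAndGet();
    }

    public static void main(String[] args) {
        KindCounter counter = new KindCounter();
        System.out.println("AuditTrail #" + counter.next("AuditTrail"));
        System.out.println("AuditTrail #" + counter.next("AuditTrail"));
        System.out.println("Contact #" + counter.next("Contact"));
    }
}
```

Each kind gets an independent sequence, so audit entries no longer consume numbers from the space used by the "actual data".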
Since upgrading to GAE 1.8, I'm getting scattered ids when annotating with @Id in Objectify:
@Id
private Long id;
Even though I understand the need for scattered ids in terms of avoiding hotspots on the cloud platform, is there a way in Objectify to get the old incremental ids back? Having to display a hexatrigesimal (base-36) value (like 1DZENH6BSOW) in the UI to avoid showing that massive generated 64-bit id just doesn't cut it.
I'd be happy with a secondary annotation, @IdLegacy, working in conjunction with @Id: @Id would still generate the long id, and I could use the legacy id for display purposes.
SOLUTION:
Inside my constructor, I have a simple piece of code that allocates an id if one doesn't exist:
if (getId() == null) {
    ObjectifyFactory f = new ObjectifyFactory();
    Key<MyEntity> key = f.allocateId(MyEntity.class);
    setId(key.getId());
}
As far as I know, Objectify passes along the App Engine Datastore's scattered id behavior.
A quick check of the Objectify issue tracker doesn't show that anyone has yet made a request for incremental ids. Submit a request to the Objectify devs. http://code.google.com/p/objectify-appengine/issues/list
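For reference, the base-36 display workaround mentioned in the question can be sketched with the JDK alone. DisplayId is a hypothetical helper name, and this only shortens how a scattered id is shown; it does not bring back incremental ids.

```java
import java.util.Locale;

// Hypothetical helper: renders a long id in base 36 for display and parses it back.
final class DisplayId {
    // Upper-case base-36 form of the id, e.g. 35 becomes "Z".
    static String encode(long id) {
        return Long.toString(id, 36).toUpperCase(Locale.ROOT);
    }

    // Inverse of encode; Long.parseLong accepts both upper- and lower-case digits.
    static long decode(String text) {
        return Long.parseLong(text, 36);
    }

    public static void main(String[] args) {
        long id = 5629499534213120L; // arbitrary example value
        String shown = encode(id);
        System.out.println(shown + " -> " + decode(shown));
    }
}
```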
I am developing a RESTful web service with GAE. My technology stack is focused around Jersey, Spring, and Objectify.
If you don't know what Objectify is ...
“Objectify is a Java data access API specifically designed for the Google App Engine datastore. It occupies a "middle ground"; easier to use and more transparent than JDO or JPA, but significantly more convenient than the Low-Level API. Objectify is designed to make novices immediately productive yet also expose the full power of the GAE datastore.”
https://code.google.com/p/objectify-appengine/
As of now I have used Objectify Keys to store relationships in my models. Like this ...
public class MyModel {
    @Id private Long id;
    private Key<MyOtherModel> myOtherModel;
...
Objectify keys provide additional power as compared to Long IDs, but they can be created from a Long ID and a MyOtherModel.class with a static method Key.create(...),
Key.create(MyOtherModel.class, id)
so I don't strictly have to store relationships as Objectify keys at the model level; I just thought it would be more consistent.
The problem is I need to write a lot of additional code to create XML adapters to convert the Objectify keys to Long IDs when I serialize my model objects to JSON, and deserialize them from JSON to a Java object.
I was thinking about using Long IDs instead and creating an Objectify Key in the DAO, when I need it. Also, this would remove any Objectify specific code from anything that wasn't a DAO.
I would like some perspective from a more experienced programmer. I have never built software of this size (several thousand lines of code, that is).
Thanks a lot everyone.
I am an inexperienced datastore/Objectify developer too, so I'm just musing here.
I see your point, that replacing the Key<> type in MyModel with a Long id would simplify things for you. I would note though, that the Key<> object can contain a path (as well as a kind and an id). So, if your data model becomes more complex and MyOtherModel is no longer a root kind then your ability to generate a Key<> from a Long id breaks down.
If you know that won't happen, or don't mind changing MyModel later then I guess that isn't a problem.
For your serializing format I would suggest you use a String to hold your key or id. Your Long id can be converted to a string, and would have to be anyway for JSON (so there is no loss in efficiency), but that same string could later be used to hold the full Key too.
You can also store them as long (or Long or String) and have a method of getMyOtherModelKey() and that can return a key after calling the static method. You can also have getMyOtherModelID() to just return the ID. This really works both ways since you can have both methods if you store a key or just the ID.
The trick comes in if you use parents in any of your models. If you do, the ID alone is not enough to get the other model; you need the ID and the IDs of all the parents (and grandparents if needed). This is where Keys are nice.
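The two-getter idea and the parent caveat can be sketched as follows. SimpleKey is a hypothetical stand-in written for this example and is not Objectify's Key<T>; the "Owner" parent kind is also invented for illustration. With real Objectify the getter would call Key.create(MyOtherModel.class, id) as shown in the question.

```java
import java.util.Objects;

// Hypothetical stand-in for a datastore key; NOT Objectify's Key<T>.
final class SimpleKey {
    final SimpleKey parent; // null for a root entity
    final String kind;
    final long id;

    SimpleKey(SimpleKey parent, String kind, long id) {
        this.parent = parent;
        this.kind = kind;
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SimpleKey)) {
            return false;
        }
        SimpleKey k = (SimpleKey) o;
        return id == k.id && kind.equals(k.kind) && Objects.equals(parent, k.parent);
    }

    @Override
    public int hashCode() {
        return Objects.hash(parent, kind, id);
    }

    // Path form such as "Owner:1/MyOtherModel:42"; a string like this could go into JSON.
    @Override
    public String toString() {
        return (parent == null ? "" : parent + "/") + kind + ":" + id;
    }
}

final class MyModel {
    private final long myOtherModelId; // relationship stored as a plain id

    MyModel(long myOtherModelId) {
        this.myOtherModelId = myOtherModelId;
    }

    long getMyOtherModelID() {
        return myOtherModelId;
    }

    // Only valid while MyOtherModel is a root kind (no parent).
    SimpleKey getMyOtherModelKey() {
        return new SimpleKey(null, "MyOtherModel", myOtherModelId);
    }

    public static void main(String[] args) {
        MyModel m = new MyModel(42L);
        SimpleKey child = new SimpleKey(new SimpleKey(null, "Owner", 1L), "MyOtherModel", 42L);
        System.out.println(m.getMyOtherModelKey()); // prints MyOtherModel:42
        System.out.println(child);                  // prints Owner:1/MyOtherModel:42
    }
}
```

The same id 42 yields two different keys depending on the parent path, which is why a bare Long stops being sufficient once MyOtherModel gains a parent.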