Google AppEngine Sharding Question - google-app-engine

My background is in relational DBs and I am experimenting with Google AppEngine, primarily for learning. I want to build an "election" app where a user belongs to a state (CA, NY, TX, etc), picks a party (Republican, Democratic, etc) and casts a vote for a particular year (2012 for now, but the app could be reused in 2016).
I want a user to be able to see their voting history and maybe change it once for the current election. Also, I am going to require that users specify their zip code and think it would be nice to run some reports by state and/or zip code.
Using a relational DB, it seems you would create some tables like this:
Users(userid, username, city, state, zip)
UserVote(userid, year, vote)
And then use SQL to run reports. With the AppEngine datastore it seems that running aggregate reports is somewhat of a challenge.
My initial take would be to shard by User where each user can contain a list of Votes and then maybe double-save the aggregates elsewhere.
Any suggestions?
P.S. I have seen the AppEngine-MapReduce project, but am not sure if that would be overkill.

I don't remember exactly where I read this, but list properties in GAE become slow after they reach about 200 items. I would recommend against this in favor of the foreign key approach for Users and Votes.
Aggregates are a challenge since there are none of the common helper functions such as MAX, SUM, COUNT and so on. The best approach would be to store aggregates and counts in a separate entity kind that you can query easily, and to update it every time a user casts a vote.
It's easier in AppEngine to spend the time when you do the write so you can have faster queries later.
Here's an example of the objects in Java:
@PersistenceCapable
public class User {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    ...
}
@PersistenceCapable
public class Vote {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    @Persistent
    private Key userKey; // References a User
    ...
}
@PersistenceCapable
public class UserStats {
    @PrimaryKey
    @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY)
    private Key key;
    @Persistent
    private Key userKey; // References a User
    ...
}
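The "pay at write time" idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the in-memory maps stand in for datastore entities, and the names VoteTally, castVote and count are hypothetical, not part of any App Engine API. Each vote updates a running tally keyed by (year, state, party), and a changed vote moves one count from the old tally to the new one.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the datastore; a real app would persist these as entities.
class VoteTally {
    // Hypothetical aggregate: one running count per (year, state, party).
    private final Map<String, Long> counts = new HashMap<>();
    // Which tally each (user, year) currently contributes to, so a changed vote can be un-counted.
    private final Map<String, String> tallyKeyByVoter = new HashMap<>();

    static String tallyKey(int year, String state, String party) {
        return year + "|" + state + "|" + party;
    }

    // Records a vote and updates the aggregate in the same step.
    void castVote(String userId, int year, String state, String party) {
        String voterKey = userId + "|" + year;
        String newKey = tallyKey(year, state, party);
        String oldKey = tallyKeyByVoter.put(voterKey, newKey);
        if (oldKey != null) {
            counts.merge(oldKey, -1L, Long::sum); // undo the previous vote for this year
        }
        counts.merge(newKey, 1L, Long::sum);
    }

    // Report query: a single lookup instead of a scan over all votes.
    long count(int year, String state, String party) {
        return counts.getOrDefault(tallyKey(year, state, party), 0L);
    }
}
```

In a real datastore the vote and the tally update would need to be kept consistent (for example with a transaction or a task), and a tally that many users hit at once is the case where counter sharding becomes relevant.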
Also, traditional sharding doesn't make much sense in AppEngine since the underlying datastore is designed to handle queries on massive data sets with ease. The exception is if you have a specific counter that can be changed frequently and has a potential for multiple users changing it at the same time. This is a different type of sharding than you're used to in MySQL. Here is Google's article on sharding counters: http://code.google.com/appengine/articles/sharding_counters.html
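As a rough sketch of the sharded-counter idea (not the code from Google's article), the counter is split into several shards, each write picks one shard at random, and a read sums all of them. Here the slots of an in-memory array stand in for separate datastore entities, and the class name ShardedCounter is an assumption made for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLongArray;

// Each array slot stands in for one shard entity in the datastore.
class ShardedCounter {
    private final AtomicLongArray shards;

    ShardedCounter(int numShards) {
        shards = new AtomicLongArray(numShards);
    }

    // Write path: pick a random shard so concurrent writers rarely touch the same one.
    void increment() {
        int i = ThreadLocalRandom.current().nextInt(shards.length());
        shards.incrementAndGet(i);
    }

    // Read path: the counter's value is the sum over all shards.
    long total() {
        long sum = 0;
        for (int i = 0; i < shards.length(); i++) {
            sum += shards.get(i);
        }
        return sum;
    }
}
```

The trade-off is that writes get cheaper under contention while reads get slightly more expensive, since a read has to fetch every shard.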

Related

When do you store ids and when do you store keys in gae datastore?

Suppose my datastore model looks like this:
@Entity
public class User {
    @Id
    private Long id;
    @Index
    private String email;
    private Long dateOfBirth;
    // More fields...
}
@Entity
public class Topic {
    @Id
    private Long id;
    private String topicTitle;
    private Long date;
}
@Entity
public class Comment {
    @Id
    private Long id;
    @Parent
    private Key<Topic> topicKey;
    private Long commenterId;
    private String text;
    private Long date;
}
Here the entity Comment has a parent entity Topic. I know one should store a key when specifying the @Parent, as I did in the Comment entity, but should one also store a key for the commenter? Or is storing the Long id of that User enough?
Just wondering what the best practice is for storing references to other entities when they are not parents: should you store the id and generate the key later, or just store the key to the entity? Is there a good reason why you might do one over the other?
EDIT: Since I am using Cloud Endpoints, the responses I get from my AppEngine project are JSON. A parameterized Key type is not allowed in the client libs. So for me, an id can work and a Key<?> can also work. Just note that you should return a websafe version of the key to your client using:
myKey.getString();
Typically there is no reason to store a key as a reference. Keys take much more space - both in the datastore, and in objects that you transfer to and from the client.
Using a key may be necessary only if the same entity kind can exist either on its own (as a root entity) or as a child of another entity. This is technically possible, and some data models use this approach, although it is probably a very rare use case.
NB: I only use ID of a parent in objects - for the same reason (less space). In datastore entities parent ID can always be extracted from a child entity key. I use low-level Datastore API, however - you need to check how to correctly annotate child-parent relationship in the library that you use.
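To make the size argument and the "parent ID can be extracted from the child key" point concrete, here is a toy model in plain Java. It is an assumption-laden sketch: ToyKey and CommentRef are invented for illustration and are not the App Engine or Objectify Key classes. A key carries its kind, its id and its whole ancestor path, whereas a stored id is just a number whose kind is implied by the field that holds it.

```java
import java.util.Objects;

// Toy model of a datastore key: kind + numeric id + optional parent key.
// NOT the real App Engine Key class, only an illustration of its shape.
final class ToyKey {
    final String kind;
    final long id;
    final ToyKey parent; // null for a root entity

    ToyKey(String kind, long id, ToyKey parent) {
        this.kind = Objects.requireNonNull(kind);
        this.id = id;
        this.parent = parent;
    }

    // Full ancestor path, e.g. "Topic:7/Comment:42".
    String path() {
        String self = kind + ":" + id;
        return parent == null ? self : parent.path() + "/" + self;
    }
}

// Hypothetical comment reference: a full key for itself, a bare id for the commenter.
final class CommentRef {
    final ToyKey key;       // the Comment key already carries its Topic parent
    final long commenterId; // plain id; the kind ("User") is implied by the field

    CommentRef(long topicId, long commentId, long commenterId) {
        this.key = new ToyKey("Comment", commentId, new ToyKey("Topic", topicId, null));
        this.commenterId = commenterId;
    }

    // The parent id never needs to be stored separately.
    long topicId() {
        return key.parent.id;
    }

    // The commenter's key is rebuilt on demand from the stored id.
    ToyKey commenterKey() {
        return new ToyKey("User", commenterId, null);
    }
}
```

Rebuilding the key from the id only works this simply because User is assumed to be a root kind; if users could also be children of other entities, the id alone would not identify one, which is the rare case described above.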

Different Tables or different Databases for Business Data and Identity

I read through Apress - Pro Asp.Net MVC 5 and the free chapters of the Identity Framework and now, I want to create a small sample application with some data and Identity.
Later I want to make a test deployment to Windows Azure.
Now, should I create one single database for this application, containing all data (products and so on, plus the Identity data such as user accounts and OAuth linkings), or would it be better to create two databases?
I know that if I created two, I would be able to use the same Identity data for other MVC applications, but is there some kind of best practice for MVC?
There's no "best practice", per se, in this area. It depends on the needs of your individual application. What I can tell you is that if you choose to use multiple databases, you'll end up with a somewhat fractured application. That sounds like a bad thing, but remember this is a valid choice in some scenarios. What I mean by that is simply that if you were to separate Identity from the rest of your application, requiring two databases and two contexts, there's no way, then, to relate your ApplicationUser with any other object in your application.
For example, let's say you're creating a reviews site. Review would be a class in your application context, and ApplicationUser would of course be a class in your Identity context. You could never do something like:
public class Review
{
    ...
    public virtual ApplicationUser ReviewedBy { get; set; }
}
That would typically result in a foreign key being created on the reviews table, pointing to a row in your users table. However, since these two tables are in separate databases, that's not possible. In fact, if you were to do something like this, Entity Framework would realize this problem, and actually attach ApplicationUser to your application context and attempt to generate a table for it in your application's database.
What you could do, though, is simply store the id of the user:
public string ReviewedById { get; set; }
But, again, this wouldn't be a foreign key. If you needed the user instance, you'd have to perform a two step process:
var review = appContext.Reviews.Find(reviewId);
var user = identityContext.Users.Find(review.ReviewedById);
Generally speaking, it's better to keep all your application data together, including things like Identity. However, if you can't, or have a business case that precludes that, you can still do pretty much anything you need to do, it just becomes a bit more arduous and results in more queries.

hibernate relationship mapping

Suppose I have users and address tables. Because each user can buy or sell things, the users table should have return_address, shipping_address, and billing_address. How should I define the relationship between these two tables?
Many-to-Many: because multiple users have multiple addresses (different types of address).
One-to-One: because each user has only one address.
One-to-Many: many users can share one address.
After reading some tutorials about Hibernate, I find myself confused, because there is also Many-to-One (I understand that it is the reverse of One-to-Many, but it is still confusing).
Would anyone mind giving me some advice on designing the database, as well as on optimizing database/query performance? It would be wonderful to help a new learner like me reduce the headache.
Thank you in advance.
User table will hold the address key.
@Entity
public class Customer {
    @ManyToOne
    @JoinColumn(name = "return_address_fk")
    private Address returnAddress;
    @ManyToOne
    @JoinColumn(name = "shipping_address_fk")
    private Address shippingAddress;
    @ManyToOne
    @JoinColumn(name = "billing_address_fk")
    private Address billingAddress;
}
@Entity
public class Address {
    // address properties
}
** Do not get confused.
Address should be a referenced entity, meaning its lifecycle does not depend on User.
1.) Many users can have the same returnAddress.
2.) Many users can have the same shippingAddress.
3.) Many users can have the same billingAddress.
If instead one user can have many return, shipping, or billing addresses, use the @ManyToMany annotation; a separate join table will be created.

JDO app engine: composite key

Is this the right way to define a composite key for
a class:
@PersistenceCapable
class Item {
    @PrimaryKey
    long id;
    @PrimaryKey
    String sellerID;
    // ... other fields follow
}
because I want the pair (id, sellerID) to be unique, not just id on its own.
Thus in the app engine datastore I need an entity which incorporates both
fields somehow into a key (for instance separating them with a dash and
concatenating them) but I am not sure about how to go about instructing
app engine to do so via JDO or even via the low-level API.
The easiest way here is to use KeyFactory and to use a single Key that you generate each time:
http://code.google.com/appengine/docs/java/javadoc/com/google/appengine/api/datastore/KeyFactory.Builder.html
Create a String key by concatenating the two fields. Using two @PrimaryKey annotations will not work; treat App Engine as close to a key-value store as possible. I really like Jeff Schnitzer's explanation here about how to think of the datastore as a HashMap/Dictionary:
http://code.google.com/p/objectify-appengine/wiki/Concepts
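A minimal sketch of the concatenation approach, treating the datastore as a HashMap as suggested above. The class ItemStore and its methods are hypothetical, and the map is only a stand-in for real entities; with the actual API, the composed string would be used as the key name, for example via KeyFactory.createKey("Item", keyName). The numeric id is placed first and assumed non-negative, so the first dash always separates the two parts even if sellerID itself contains a dash.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for the datastore: key name -> item title.
class ItemStore {
    private final Map<String, String> itemsByKeyName = new HashMap<>();

    // Composes one key name from the pair (id, sellerID).
    // Assumption: ids are non-negative, so the first "-" is always the separator.
    static String keyName(long id, String sellerID) {
        if (id < 0) {
            throw new IllegalArgumentException("id must be non-negative");
        }
        return id + "-" + sellerID;
    }

    // Returns false if the pair (id, sellerID) is already taken.
    boolean insert(long id, String sellerID, String title) {
        return itemsByKeyName.putIfAbsent(keyName(id, sellerID), title) == null;
    }

    String get(long id, String sellerID) {
        return itemsByKeyName.get(keyName(id, sellerID));
    }
}
```

Because the pair is folded into a single key, uniqueness of (id, sellerID) follows from key uniqueness. In a real datastore the check-then-insert step would have to run inside a transaction to be safe against concurrent writers.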

de-normalizing data model: django/sql -> app engine

I'm just starting to get my head around non-relational databases, so I'd like to ask some help with converting these traditional SQL/django models into Google App Engine model(s).
The example is for event listings, where each event has a category, belongs to a venue, and a venue has a number of photos attached to it.
In django, I would model the data like this:
class Event(models.Model):
    title = models.CharField()
    start = models.DateTimeField()
    category = models.ForeignKey(Category)
    venue = models.ForeignKey(Venue)
class Category(models.Model):
    name = models.CharField()
class Venue(models.Model):
    name = models.CharField()
    address = models.CharField()
class Photo(models.Model):
    venue = models.ForeignKey(Venue)
    source = models.CharField()
How would I accomplish the equivalent with App Engine models?
There's nothing here that must be de-normalized to work with App Engine. You can change ForeignKey to ReferenceProperty, CharField to StringProperty and DateTimeField to DateTimeProperty and be done. It might be more efficient to store category as a string rather than a reference, but this depends on usage context.
Denormalization becomes important when you start designing queries. Unlike traditional SQL, you can't write ad-hoc queries that have access to every row of every table. Anything you want to query for must be satisfied by an index. If you're running queries today that depend on table scans and complex joins, you'll have to make sure that the query parameters are indexed at write-time instead of calculating them on the fly.
As an example, if you wanted to do a case-insensitive search by event title, you'd have to store a lower-case copy of the title on every entity at write time. Without guessing your query requirements, I can't really offer more specific advice.
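The lower-case-copy technique can be sketched as follows. This is plain Java with an in-memory map standing in for an indexed datastore property; EventIndex, save and findByTitleIgnoreCase are hypothetical names chosen for illustration. The normalized copy is computed once at write time, and the query normalizes its input the same way and then does an exact match.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Stand-in for an indexed "title_lower" property on each Event entity.
class EventIndex {
    // lower-cased title -> original titles stored under it
    private final Map<String, List<String>> byLowerTitle = new HashMap<>();

    // Write path: compute the lower-case copy once, when the entity is saved.
    void save(String title) {
        String lower = title.toLowerCase(Locale.ROOT);
        byLowerTitle.computeIfAbsent(lower, k -> new ArrayList<>()).add(title);
    }

    // Read path: normalize the query the same way, then do an exact lookup.
    List<String> findByTitleIgnoreCase(String query) {
        List<String> hits = byLowerTitle.get(query.toLowerCase(Locale.ROOT));
        return hits == null ? Collections.<String>emptyList() : hits;
    }
}
```

The same pattern applies to any derived value you want to filter on: compute it when the entity is written, store it as its own property, and query that property directly.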
It's possible to run Django on App Engine
You need a trio of apps from here:
http://www.allbuttonspressed.com/projects
Django-nonrel
djangoappengine
djangotoolbox
Additionally, this module makes it possible to do the joins across Foreign Key relationships which are not directly supported by datastore methods:
django-dbindexer
It denormalises the fields you want to join against, but has some limitations: it doesn't update the denormalised values automatically, so it is only really suitable for static values.
Django signals provide a useful starting point for automatic denormalisation.

Resources