When should the Expando class be used in Google App Engine apps?

What are the applications of the Google App Engine Expando class?
And what are the good practices related to it?

Two common uses of Expandos are partially-fixed schemas and deleting old properties.
I frequently use Expando when I have a kind that needs slightly different properties across entities; in other words, when I need a 'partially' dynamic schema. One use case is an application that takes orders where some products are liquid (think water), some are physical units (think DVDs), and some are 'other' (think flour). Some fields, like item code, price and quantity, are always needed. But what if the details of how quantity was computed are also needed?
Typically a fixed-schema solution would be to add a property for all of the variables we might use: weight, dimension, before and after weights of our stock, and so on. That sucks. For every entity most of the other fields are not needed.
from google.appengine.ext import db

class Order(db.Model):
    # These fields are always needed.
    item_code = db.StringProperty()
    unit_of_measure = db.StringProperty()
    unit_price = db.FloatProperty()
    quantity = db.FloatProperty()
    # These fields are used depending on the unit of measure.
    weight = db.FloatProperty()
    volume = db.FloatProperty()
    stock_start_weight = db.FloatProperty()
    stock_end_weight = db.FloatProperty()
With Expando we can do much better. We could use the unit_of_measure to tell us how we computed quantity. The functions that compute quantity can set the dynamic fields, and the functions that read that method's information know what to look for. And, the entity does not have a bunch of unneeded properties.
class Order(db.Expando):
    # Every instance has these fields.
    item_code = db.StringProperty()
    unit_of_measure = db.StringProperty()
    unit_price = db.FloatProperty()
    quantity = db.FloatProperty()

def compute_gallons(entity, kilograms, kg_per_gallon):
    # Set the fixed fields.
    entity.unit_of_measure = 'GAL'
    entity.quantity = kilograms / kg_per_gallon
    # Set the gallon-specific (dynamic) fields.
    entity.weight = kilograms
    entity.density = kg_per_gallon
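The reading side of this pattern can be sketched in plain Python. This is only an illustration: `describe_quantity` is a hypothetical helper, and `SimpleNamespace` stands in for an `Order` entity so the snippet runs without App Engine.

```python
from types import SimpleNamespace


def describe_quantity(entity):
    """Explain how quantity was computed, dispatching on unit_of_measure."""
    if entity.unit_of_measure == 'GAL':
        # compute_gallons set these two dynamic fields, so we know they exist.
        return '%.2f gal from %.1f kg at %.2f kg/gal' % (
            entity.quantity, entity.weight, entity.density)
    # Other units carry no extra details in this sketch.
    return '%.2f %s' % (entity.quantity, entity.unit_of_measure)


# SimpleNamespace is an assumption standing in for a stored Order entity.
order = SimpleNamespace(item_code='H2O', unit_price=1.5)
order.unit_of_measure = 'GAL'
order.quantity = 8.0 / 4.0
order.weight = 8.0
order.density = 4.0
print(describe_quantity(order))
```

The point is that the reader only touches the dynamic fields after checking `unit_of_measure`, so entities without those fields are never asked for them.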
You could achieve a similar result by using a text or blob property and serializing a dict of 'other' values to it. Expando basically 'automates' that for you.
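For comparison, the serialized-dict alternative might look roughly like this. It is a plain-Python sketch, not App Engine code: the `Order` class, `extras_json`, `get_extra` and `set_extra` are hypothetical names, and `extras_json` stands in for the text or blob property.

```python
import json


class Order(object):
    """Plain-Python stand-in for a model with one text field for extras."""

    def __init__(self, item_code, quantity, extras=None):
        self.item_code = item_code
        self.quantity = quantity
        # This string is what would be stored in the text/blob property.
        self.extras_json = json.dumps(extras or {}, sort_keys=True)

    def get_extra(self, name, default=None):
        return json.loads(self.extras_json).get(name, default)

    def set_extra(self, name, value):
        extras = json.loads(self.extras_json)
        extras[name] = value
        self.extras_json = json.dumps(extras, sort_keys=True)


order = Order('H2O', 2.0, {'weight': 8.0})
order.set_extra('density', 4.0)
print(order.extras_json)
```

One design difference worth keeping in mind: values buried in a serialized string cannot be used in query filters, whereas Expando's dynamic properties are stored as ordinary properties.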

Related

Tracking item order for storage to and retrieval from a DB

I'm trying to figure out how I'm going to 'CRUD' the order of items I have in a group that I'm storing in a database. (Pseudo statement of: select * items from app where group_id = 1;)
My guess is that I just use a numeric field and increase/decrease the number as more items are added to/removed from the group. I can then just update each item's number in this field as items are moved around. However, I've seen this go really badly wrong in an old legacy app where items would get out of sync and you'd have a group where the order ended up something like this:
0,1,1,3,4,5
0,1,1,1,4,5
This wasn't handled very gracefully by the application either, and broke the application necessitating manual intervention to reorder the items in the DB.
Is there a way to avoid this pitfall?
EDIT: I would also maybe want the items available in multiple groups with multiple orders.
I think in that case I would need a many to many relationship for both the group to item relationship and the item to order relationship. /EDIT
I'll be doing this in the Django framework.
I'm not really sure what you are asking, because ordering is one thing, and grouping of related objects is something else entirely.
Databases don't store the order of things, but rather the relationships (grouping) of things. The order of things is a user interface detail and not something that a database should be used for.
In Django, you can create a ManyToMany relationship. This essentially creates a "box" where you can add and remove items that are related to a particular model. Here is the example from the documentation:
from django.db import models

class Publication(models.Model):
    title = models.CharField(max_length=30)

    # On Python 3: def __str__(self):
    def __unicode__(self):
        return self.title

    class Meta:
        ordering = ('title',)

class Article(models.Model):
    headline = models.CharField(max_length=100)
    publications = models.ManyToManyField(Publication)

    # On Python 3: def __str__(self):
    def __unicode__(self):
        return self.headline

    class Meta:
        ordering = ('headline',)
Here an Article can belong to many Publications, and Publications have one or more Articles associated with them:
# Model instances are created through the manager (objects), not the class itself.
a = Article.objects.create(headline='Hello')
b = Article.objects.create(headline='World')
p = Publication.objects.create(title='My Publication')
p.article_set.add(a)
p.article_set.add(b)
p.save()
# You can also add an article to a publication from the article object:
c = Article.objects.create(headline='The Answer is 42')
c.publications.add(p)
To know how many articles belong to a publication:
Publication.objects.get(title='My Publication').article_set.count()
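For the out-of-sync positions described in the question, one possible approach (not taken from the answer above) is to treat the list order as the source of truth and renumber the whole group after every change, so duplicates such as 0,1,1,3 cannot persist. A minimal plain-Python sketch follows; `move_item` and `renumber` are hypothetical helpers, and in Django the resulting positions would presumably be saved on whatever row links an item to a group.

```python
def move_item(ordered_items, item, new_index):
    """Return a new list with item moved to new_index (clamped to range)."""
    items = [i for i in ordered_items if i != item]
    new_index = max(0, min(new_index, len(items)))
    items.insert(new_index, item)
    return items


def renumber(ordered_items):
    """Assign contiguous positions 0..n-1 based on list order."""
    return {item: position for position, item in enumerate(ordered_items)}


group = ['a', 'b', 'c', 'd']
group = move_item(group, 'd', 1)
positions = renumber(group)
print(positions)
```

Because positions are always regenerated from the list, a missed update cannot leave two items sharing a number; the cost is rewriting every position in the group on each change.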

Datastore design: 1 large class vs. 2 classes vs. polymodel?

I am interested in understanding the pros / cons of several ways to design classes for Google App Engine's Datastore.
Consider the following classes:
Option 0
class Car(db.Model):
    title = db.StringProperty()
    year = db.StringProperty()
    imgurl = db.StringProperty()
    type = db.StringProperty()
    addeddate = db.DateTimeProperty()
    external_id = db.IntegerProperty()
    # possibly 5 or 6 more properties

class Part(db.Model):
    title = db.StringProperty()
    # other stuff
Part's parent is always set to the corresponding Car on creation.
These are used in several ways:
query + list (+ sort) parts: when listing the parts, I need to display the Car's title and get its external_id and year. I don't need everything, but the whole Car entity is fetched when I access part.parent (I am already using parent prefetch).
query + list (+ sort) cars: only need the title, year and imgurl.
get car: page with all the car details, need all the properties.
Considering the ways I get and display my data, what is the best option (with pros/cons) between the above design and the following?
Option 1
class Car(db.Model):
    title = db.StringProperty()
    year = db.StringProperty()
    imgurl = db.StringProperty()

class CarEx(db.Model):
    type = db.StringProperty()
    addeddate = db.DateTimeProperty()
    external_id = db.IntegerProperty()
    # possibly 5 or 6 more properties

Pro: When fetching Parts, getting the parents (Car) is faster since there are fewer properties.
Con: When displaying a Car, we need to get the CarEx. Need to add one more entity when adding a Car. Need to delete CarEx when deleting a Car.
Option 2
from google.appengine.ext.db import polymodel  # PolyModel lives here, not on db itself

class Car(polymodel.PolyModel):
    title = db.StringProperty()
    year = db.StringProperty()
    imgurl = db.StringProperty()

class CarEx(Car):
    type = db.StringProperty()
    addeddate = db.DateTimeProperty()
    external_id = db.IntegerProperty()
    # possibly 5 or 6 more properties
When adding cars, we would only add CarEx entities.
Pro: When fetching Parts, getting the parents (Car) is faster since there are fewer properties. ??? I am actually not sure at all this is true. ???
Pro: When displaying a Car, we get the CarEx. No need to get another entity. Adding and deleting cars is as easy as having only 1 Car model with everything in it (Option 0).
Con: Extra writes when adding a CarEx. Other extra costs?
So overall, I need to be able to fetch parts (and their parents, without a huge cost), and I need to fetch a full Car on a separate page. I am not sure if my assumptions about PolyModel are correct, nor if there are any other hidden pros/cons, or even other options.
A few points. If you are starting out, you really should be using ndb.
The small number of properties you list is not going to make enough difference to justify splitting Car and CarEx, especially if you need CarEx all the time.
Your use of PolyModel doesn't make sense, given how PolyModel works. PolyModel would be more suited to something like this:
from google.appengine.ext import ndb
from google.appengine.ext.ndb import polymodel

class Vehicle(polymodel.PolyModel):
    title = ndb.StringProperty()
    year = ndb.StringProperty()
    addeddate = ndb.DateTimeProperty()
    external_id = ndb.IntegerProperty()
    # possibly 5 or 6 more properties

class Car(Vehicle):
    doors = ndb.IntegerProperty()

class Van(Vehicle):
    carrying_capacity = ndb.FloatProperty()  # (m3)

class Truck(Vehicle):
    tray_length = ndb.IntegerProperty()
Yes, the properties are contrived. But now I can search for all vehicles by any of the core Vehicle properties and get Trucks, Vans and Cars. You can't do this with normal model inheritance. Without PolyModel you would have to search the Car, Van and Truck entity types separately.
In your case you probably don't need this.
What you do with Parts depends heavily on how many there are, and how often you need them. If you are likely to have less than 1MB of Parts and you need all Parts whenever you need any, then consider storing Parts in a single container entity, and use a repeated StructuredProperty to store them. Then when you need parts you fetch them in a single entity. If you only need some parts then store them as separate entities.
If you need more than 1MB of Parts but you always need all parts then use more than one container.
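The container idea can be sketched in plain Python as a packing step. Everything here is an assumption for illustration: `pack_parts` is a hypothetical helper, parts are plain dicts, and the JSON length is only a rough proxy for how large an entity would actually be once stored.

```python
import json

MAX_ENTITY_BYTES = 1000 * 1000  # rough stand-in for the ~1MB entity limit


def pack_parts(parts, max_bytes=MAX_ENTITY_BYTES):
    """Group parts into containers whose estimated size stays under max_bytes."""
    containers, current, current_size = [], [], 0
    for part in parts:
        size = len(json.dumps(part))
        # Start a new container when this part would push us over the limit.
        if current and current_size + size > max_bytes:
            containers.append(current)
            current, current_size = [], 0
        current.append(part)
        current_size += size
    if current:
        containers.append(current)
    return containers


parts = [{'title': 'part-%d' % i} for i in range(10)]
containers = pack_parts(parts, max_bytes=60)  # tiny limit just for the demo
print([len(c) for c in containers])
```

Each inner list would become one container entity, so fetching all parts costs a handful of gets instead of one per part.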
You really need to look at the frequency of use of particular views, if you need all information vs some of it, to determine the best strategy.

Should the size of entities be as small as possible when I count them by "count()" method?

I'm wondering if I should have a kind only for counting entities.
For example
There is a model like the following.
class Message(db.Model):
    title = db.StringProperty()
    message = db.StringProperty()
    created_on = db.DateTimeProperty()
    created_by = db.ReferenceProperty(User)
    category = db.StringProperty()
And there are 100000000 entities made of this model.
I want to count the entities whose category equals 'book'.
In this case, should I create the following model for counting them?
class Category(db.Model):
    category = db.StringProperty()
    look_message = db.ReferenceProperty(Message)
Does this small model make it faster to count?
And does it use less memory?
By the way, I'm thinking of counting them like this:
q = db.Query(Message).filter('category =', 'book')
count = q.count(10000)
Counting 100000000 entities is a very expensive operation on a NoSQL database such as the App Engine datastore. You'll probably want to count as you update, or run a map-reduce operation to count after the fact.
App Engine also offers a simple way to query how many entities of each type you have:
https://developers.google.com/appengine/docs/python/datastore/stats
For example, to count all Messages:
from google.appengine.ext.db import stats

# all() is a class method, so there is no need to instantiate KindStat first.
kind_stats = stats.KindStat.all().filter("kind_name =", "Message").get()
count = kind_stats.count
Note that stats are updated asynchronously, so they'll lag the actual count.
I think that you have to create another entity like this.
This entity will just count the number of messages by category.
Just change your category to this:
class Category(db.Model):
    category = db.StringProperty()
    totalOfMessages = db.IntegerProperty(default=0)

In the Message class, change the category property so that it references the Category class:
category = db.ReferenceProperty(Category)
Whenever you create a new Message object you have to update the counter: increment it when you create a message and decrement it when you delete one.
The best way to work with counters on GAE is to use sharded counters.
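The sharded counter idea can be illustrated without the datastore. In this plain-Python sketch the `shards` dict stands in for a set of small counter entities, and `NUM_SHARDS`, `increment` and `get_count` are hypothetical names; a real implementation would update each shard inside a datastore transaction.

```python
import random

NUM_SHARDS = 20

# (counter name, shard index) -> count; stands in for one small entity per shard.
shards = {}


def increment(name, delta=1):
    """Add delta to one randomly chosen shard of the named counter."""
    key = (name, random.randint(0, NUM_SHARDS - 1))
    shards[key] = shards.get(key, 0) + delta


def get_count(name):
    """Sum all shards of the named counter."""
    return sum(shards.get((name, i), 0) for i in range(NUM_SHARDS))


for _ in range(1000):
    increment('book')       # a message in category 'book' was created
increment('book', -1)       # one message was deleted
print(get_count('book'))
```

Spreading writes over several shards is what avoids contention on a single counter entity; reads pay for it by summing the shards.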
Count is implemented as an index scan that discards all data except the number of records seen. It never looks up the entity, so the size of the entity does not matter.
That being said, counting like this does not scale and is quite costly in a system without a fixed schema. It would likely be better to use another method like a Sharded Counter, MapReduce or Materialized View/Fork Join. If you really want it to scale, this talk is pretty informative: http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html

How to avoid duplicates in GAE datastore?

Let's say here is the database structure:
class News(db.Model):
    title = db.StringProperty()

class NewsRating(db.Model):
    user = db.IntegerProperty()
    rating = db.IntegerProperty()
    news = db.ReferenceProperty(News)
Each user can leave only one rating for each News. The following code doesn't care about duplicates:
rating = NewsRating()
rating.user = 123456
rating.rating = 1
rating.news = News.get_by_key_name('news-unique-key')
rating.put()
How should I modify this so that it allows only one record for each rating.user and rating.news combination? If such a rating already exists, it should be updated with the new value.
Use key names and (possibly) parent entities to keep track. For instance, supposing you have a UserInfo kind, you could do it like this:
class NewsRating(db.Model):
    # No explicit user reference, since it's the parent entity
    rating = db.IntegerProperty(required=True)
    news = db.ReferenceProperty(News)  # We could get this from the key name, but this is more convenient

rating = NewsRating(parent=current_user, key_name=str(news.key().id()), news=news)
rating.put()
Attempting to add the same rating multiple times will simply overwrite the existing one, or you can use a datastore transaction to add it atomically.
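Why a deterministic key prevents duplicates can be shown with a plain dict standing in for the datastore. This is a sketch under that assumption; `store`, `rating_key` and `put_rating` are hypothetical names, and the tuple key mimics the (parent, key_name) pair used above.

```python
# (parent user id, key name) -> stored rating record; stands in for the datastore.
store = {}


def rating_key(user_id, news_id):
    """Build the single key that a given user's rating of a given news item can have."""
    return (user_id, str(news_id))


def put_rating(user_id, news_id, rating):
    """Insert or overwrite: the same key always maps to the same record."""
    store[rating_key(user_id, news_id)] = {'rating': rating, 'news': news_id}


put_rating(123456, 42, 1)
put_rating(123456, 42, -1)   # same user, same news: overwrites the first rating
put_rating(777, 42, 1)       # different user: a separate record
print(len(store))
```

Since the key is derived entirely from the user and the news item, a second rating from the same user can only land on the existing record.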
Note that you should almost certainly keep a total of ratings against the News entity, rather than counting up ratings on each request, which will get less efficient as the number of ratings increases.

How to order by the field stored in the separate model?

Here is simplified version of my datastore structure:
class News(db.Model):
    title = db.StringProperty()

class NewsRating(db.Model):
    user = db.IntegerProperty()
    rating = db.IntegerProperty()
    news = db.ReferenceProperty(News)
Now I need to display all news sorted by their total rating (sum of different users ratings). How can I do that in the following code:
news = News.all()
# filter by additional parms
# news.filter("city =", "1")
news.order("-added") # ?
for one_news in news:
    # title is a property, not a method, so it is read without parentheses.
    self.response.out.write(one_news.title + '<br>')
Queries only have access to the entity you're querying against. If you have a property from another entity (or some aggregate calculation based on fields from other entities) that you want to use to order results, you're going to need to store it in the entity you're querying against.
In the case of ratings, that might mean a periodic task that sums up ratings and distributes them to articles.
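The periodic summing step could look roughly like this. It is a plain-Python sketch with lists of dicts standing in for the News and NewsRating entities; `refresh_totals` and the `rating_total` field are hypothetical names, not part of the models in the question.

```python
from collections import defaultdict

news = [{'id': 1, 'title': 'A', 'rating_total': 0},
        {'id': 2, 'title': 'B', 'rating_total': 0}]
ratings = [{'news': 1, 'user': 10, 'rating': 1},
           {'news': 2, 'user': 10, 'rating': 1},
           {'news': 2, 'user': 11, 'rating': 1},
           {'news': 1, 'user': 12, 'rating': -1}]


def refresh_totals(news_items, rating_items):
    """Recompute each news item's stored total from the rating records."""
    totals = defaultdict(int)
    for r in rating_items:
        totals[r['news']] += r['rating']
    for n in news_items:
        n['rating_total'] = totals[n['id']]


refresh_totals(news, ratings)
# With the total stored on the news item itself, ordering needs no join.
by_rating = sorted(news, key=lambda n: n['rating_total'], reverse=True)
print([n['title'] for n in by_rating])
```

Once the total lives on the News entity, the query in the question would simply order by that stored property.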
To do that you would need to run a query fetching every single NewsRating referencing your News entity and sum all the ratings (as the datastore does not provide JOINs). This will be a huge task, both time-wise and cost-wise. I'd recommend taking a look at the just-overheard-it example as a reference point.
