NDB fetch model instance from Key using projection? - google-app-engine

I have a relatively large model class that I can access by key id, e.g.:
class Foo(ndb.Model):
propA = ndb.IntegerProperty(required=True)
probB = ndb.StringProperty()
probC = ndb.JsonProperty()
key = ndb.Key('Foo', 1234)
model = key.get()
If I only need 'propA' from this model at this time, is there a way to create a projection=[Foo.propA] type request without creating an unnecessary query?

Queries are not necessarily more expensive than direct reads. How about:
model = Foo.Query('__key__'=key).fetch(1, projection=[Foo.propA])

Related

SQLAlchemy: foreignKeys from multiple Tables (Many-to-Many)

I'm using flask-sqlalchemy orm in my flask app which is about smarthome sensors and actors (for the sake of simplicity let's call them Nodes.
Now I want to store an Event which is bound to Nodes in order to check their state and other or same Nodes which should be set with a given value if the state of the first ones have reached a threshold.
Additionally the states could be checked or set from/for Groups or Scenes. So I have three diffrent foreignkeys to check and another three to set. All of them could be more than one per type and multiple types per Event.
Here is an example code with the db.Models and pseudocode what I expect to get stored in an Event:
db = SQLAlchemy()
class Node(db.Model):
id = db.Column(db.Integer, primary_key=True)
value = db.Column(db.String(20))
# columns snipped out
class Group(db.Model):
id = db.Column(db.Integer, primary_key=True)
value = db.Column(db.String(20))
# columns snipped out
class Scene(db.Model):
id = db.Column(db.Integer, primary_key=True)
value = db.Column(db.String(20))
# columns snipped out
class Event(db.Model):
id = db.Column(db.Integer, primary_key=True)
# The following columns may be in a intermediate table
# but I have no clue how to design that under these conditions
constraints = # list of foreignkeys from diffrent tables (Node/Group/Scene)
# with threshold per key
target = # list of foreignkeys from diffrent tables (Node/Group/Scene)
# with target values per key
In the end I want to be able to check if any of my Events are true to set the bound Node/Group/Scene accordingly.
It may be a database design problem (and not sqlalchemy) but I want to make use of the advantages of sqla orm here.
Inspired by this and that answer I tried to dig deeper, but other questions on SO were about more specific problems or one-to-many relationships.
Any hints or design tips are much appreciated. Thanks!
I ended up with a trade-off between usage and lines of code. My first thought here was to save as much code as I can (DRY) and defining as less tables as possible.
As SQLAlchemy itself points out in one of their examples the "generic foreign key" is just supported because it was often requested, not because it is a good solution. With that less db functionallaty is used and instead the application has to take care about key constraints.
On the other hand they said, having more tables in your database does not affected db performance.
So I tried some approaches and find a good one that fits to my usecase. Instead of a "normal" intermediate table for many-to-many relationships I use another SQLAlchemy class which has two one-to-many relations on both sides to connect two tables.
class Event(db.Model):
id = db.Column(db.Integer, primary_key=True)
noodles = db.relationship('NoodleEvent', back_populates='events')
# columns snipped out
def get_as_dict(self):
return {
"id": self.id,
"nodes": [n.get_as_dict() for n in self.nodes]
}
class Node(db.Model):
id = db.Column(db.Integer, primary_key=True)
value = db.Column(db.String(20))
events = db.relationship('NodeEvent', back_populates='node')
# columns snipped out
class NodeEvent(db.Model):
ev_id = db.Column('ev_id', db.Integer, db.ForeignKey('event.id'), primary_key=True)
n_id = db.Column('n_id', db.Integer, db.ForeignKey('node.id'), primary_key=True)
value = db.Column('value', db.String(200), nullable=False)
compare = db.Column('compare', db.String(20), nullable=True)
node = db.relationship('Node', back_populates="events")
events = db.relationship('Event', back_populates="nodes")
def get_as_dict(self):
return {
"trigger_value": self.value,
"actual_value": self.node.status,
"compare": self.compare
}
The trade-off is that I have to define a new class everytime I bind a new table on that relationship. But with the "generic foreign key" approach I also would have to check from where the ForeignKey is comming from. Same work in the end of the day.
With my get_as_dict() function I have a very handy access to the related data.

How to update one property for multiple NDB entities?

I have NDB model class like below:
class Contest(ndb.Model):
end_date = ndb.DateProperty(required=True, indexed=True)
ended = ndb.BooleanProperty(required=False, indexed=True)
...
I will have a daily cron job to mark contests with passed end_date with ended equal to True. I've written the following code to do it:
contests = Contest.query()
current_datetime = datetime.datetime.utcnow()
today = datetime.date(current_datetime.year, current_datetime.month, current_datetime.day)
contests = contests.filter(Contest.end_date < today)
contests = contests.filter(Contest.ended == False)
contests = contests.fetch(limit=10)
for contest in contests:
contest.ended = True
ndb.put_multi(contests)
But I don't like it, since I have to read all entities just to update one value. Is there any way to modify it to read keys_only?
The object data overwrites the existing entity. The entire object is sent to Datastore
https://cloud.google.com/datastore/docs/concepts/entities#Datastore_Updating_an_entity
So you cannot send only one field of an entity, it will "remove" all existing fields. To be more accurate - replace an entity with all fields with new version of entity that have only one field.
You have to load all entities you want to update, with all properties, not just keys, set new value of a property, and put back into database.
I think a Python property is a good solution here:
class Contest(ndb.Model):
end_date = ndb.DateProperty(required=True, indexed=True)
#property
def ended(self):
return self.end_date < date.today()
This way you don't ever need to update your entities. The value is automatically computed whenever you need it.

How to Improve Django Tastypie web server performance

I have a django web server with Tastypie API. The performance is extremely slow, and I am not sure where to look.
The problem can be abstracted this way. It simply has 3 tables.
class Table1(models.Model):
name = models.CharField(max_length=64)
class Table2(models.Model):
name = models.CharField(max_length=64)
table1 = models.ForeignKey(Table1)
class Table3(models.Model):
name = models.CharField(max_length=64)
table2 = models.ForeignKey(Table2)
Table1 has about 50 record. Table2 has about 400 record. Table3 has about 2000 record. MySQL is used.
It has 3 model resource:
class Table1Resource(ModelResource):
class Meta(object):
"""Define options attached to model."""
queryset = models.Table1.objects.all()
resource_name = 'table1'
class Table2Resource(ModelResource):
class Meta(object):
"""Define options attached to model."""
queryset = models.Table2.objects.all()
resource_name = 'table2'
class Table3Resource(ModelResource):
class Meta(object):
"""Define options attached to model."""
queryset = models.Table3.objects.all()
resource_name = 'table3'
The front-end uses ajax to call 3 web service APIsto retrieve all data in database. My machine has very good configuration, such as 16 GB memory. But, it takes about 40 seconds to load all data. Too slow. It's obvious something is not right.
I tried some Django data model function to improve performance
1) Django queryset. I notice the API retrieve all table objects if there is foreign key. Table3Resource access is extremely slow. In my case, I just want data in 1 table, not interested in inner join result from another table. For example, it uses models.Table3.objects.all().
I tried models.LabSpace.objects.select_relate(). No help at all.
2) For such small amount of data with such low performance, I am not even thinking Tastypie API cache technique yet. I feel somewhere is obviously wrong.
Basically, I am not sure if it is Django or Tastypie issue. Where should I look?
You should specify the Resource ForeignKey field. Default is False I believe, so you just have to do this:
class Table2Resource(ModelResource):
table1 = fields.ToOneField(Table1Resource)
class Meta(object):
"""Define options attached to model."""
queryset = models.Table2.objects.all()
resource_name = 'table2'
# etc ...
If not you can try to explicitly set it like so:
table1 = fields.ToOneField(Table1Resource, 'table1', full=False)

How to flatten a 'friendship' model within User model in GAE?

I recently came across a number of articles pointing out to flatten the data for NoSQL databases. Coming from traditional SQL databases I realized I am replicating a SQL db bahaviour in GAE. So I started to refactor code where possible.
We have e.g. a social media site where users can become friends with each other.
class Friendship(ndb.Model):
from_friend = ndb.KeyProperty(kind=User)
to_friend = ndb.KeyProperty(kind=User)
Effectively the app creates a friendship instance between both users.
friendshipA = Friendship(from_friend = UserA, to_friend = userB)
friendshipB = Friendship(from_friend = UserB, to_friend = userA)
How could I now move this to the actual user model to flatten it. I thought maybe I could use a StructuredProperty. I know it is limited to 5000 entries, but that should be enough for friends.
class User(UserMixin, ndb.Model):
name = ndb.StringProperty()
friends = ndb.StructuredProperty(User, repeated=True)
So I came up with this, however User can't point to itself, so it seems. Because I get a NameError: name 'User' is not defined
Any idea how I could flatten it so that a single User instance would contain all its friends, with all their properties?
You can't create a StructuredProperty that references itself. Also, use of StructuredProperty to store a copy of User has additional problem of needing to perform a manual cascade update if a user ever modifies a property that is stored.
However, as KeyProperty accept String as kind, you can easily store the list of Users using KeyProperty as suggested by #dragonx. You can further optimise read by using ndb.get_multi to avoid multiple round-trip RPC calls when retrieving friends.
Here is a sample code:
class User(ndb.Model):
name = ndb.StringProperty()
friends = ndb.KeyProperty(kind="User", repeated=True)
userB = User(name="User B")
userB_key = userB.put()
userC = User(name="User C")
userC_key = userC.put()
userA = User(name="User A", friends=[userB_key, userC_key])
userA_key = userA.put()
# To retrieve all friends
for user in ndb.get_multi(userA.friends):
print "user: %s" % user.name
Use a KeyProperty that stores the key for the User instance.

Should the size of entities be as small as possible when I count them by "count()" method?

I'm wondering if I should have a kind only for counting entities.
For example
There is a model like the following.
class Message(db.Model):
title = db.StringProperty()
message = db.StringProperty()
created_on = db.DateTimeProperty()
created_by = db.ReferenceProperty(User)
category = db.StringProperty()
And there are 100000000 entities made of this model.
I want to count entities which category equals 'book'.
In this case, should I create the following mode for counting them?
class Category(db.Model):
category = db.StringProperty()
look_message = db.ReferenceProperty(Message)
Does this small model make it faster to count?
And does it erase smaller memory?
I'm thinking to count them like the following by the way
q = db.Query(Message).filter('category =', 'book')
count = q.count(10000)
Counting 100000000 entities is a very expensive operation on a NoSQL database as the App Engine datastore. You'll probably want to count as you update, or run a map-reduce operation to count after the fact.
App Engine also offers a simple way to query how many entities of each type you have:
https://developers.google.com/appengine/docs/python/datastore/stats
For example, to count all Messages:
from google.appengine.ext.db import stats
kind_stats = stats.KindStat().all().filter("kind_name =", "Message").get()
count = kind_stats.count
Note that stats are updated asynchronously, so they'll lag the actual count.
I think that you have to create another entity like this.
This entity will just count the number of messages by category.
Just change your category to this:
class Category(db.model):
category = db.StringProperty()
totalOfMessages = db.IntegerProperty(default=0)
In the message class you change to reference the category class, just change the category property to:
category = db.ReferenceProperty(Category)
When you create a new Message object, you have to update the counter, increment when you create a new message or decrement if you delete.
The best way to work with counters on GAE is using Sharding Counters
Count is implemented as an index scan that discards all data except the number of records seen . It never looks up the entity, so the size of the entity does not matter.
That being said, counting like this does not scale and is quite costly in a system without a fixed schema. It would likely be better to use another method like a Sharded Counter, MapReduce or Materialized View/Fork Join. If you really want it to scale, this talk is pretty informative: http://www.google.com/events/io/2010/sessions/high-throughput-data-pipelines-appengine.html

Resources