How to Improve Django Tastypie web server performance - django-models

I have a Django web server with a Tastypie API. Performance is extremely slow, and I am not sure where to look.
The problem can be abstracted this way: the schema has just 3 tables.
class Table1(models.Model):
    name = models.CharField(max_length=64)

class Table2(models.Model):
    name = models.CharField(max_length=64)
    table1 = models.ForeignKey(Table1)

class Table3(models.Model):
    name = models.CharField(max_length=64)
    table2 = models.ForeignKey(Table2)
Table1 has about 50 records, Table2 about 400, and Table3 about 2000. MySQL is used.
There are 3 model resources:
class Table1Resource(ModelResource):
    class Meta(object):
        """Define options attached to model."""
        queryset = models.Table1.objects.all()
        resource_name = 'table1'

class Table2Resource(ModelResource):
    class Meta(object):
        """Define options attached to model."""
        queryset = models.Table2.objects.all()
        resource_name = 'table2'

class Table3Resource(ModelResource):
    class Meta(object):
        """Define options attached to model."""
        queryset = models.Table3.objects.all()
        resource_name = 'table3'
The front-end uses Ajax to call the 3 web service APIs to retrieve all the data in the database. My machine has a very good configuration (16 GB of memory, for example), but it takes about 40 seconds to load all the data. Too slow; something is obviously not right.
I tried some Django data model functions to improve performance:
1) Django querysets. I noticed the API retrieves all related objects when there is a foreign key, and Table3Resource access is extremely slow. In my case I just want the data from one table; I am not interested in the inner-join result from another table. For example, it uses models.Table3.objects.all().
I tried models.LabSpace.objects.select_related(). No help at all.
2) For such a small amount of data with such low performance, I am not even thinking about Tastypie's API cache techniques yet. I feel something is obviously wrong.
Basically, I am not sure whether it is a Django or a Tastypie issue. Where should I look?

You should declare the ForeignKey field on the Resource. The default for full is False I believe, so you just have to do this:
class Table2Resource(ModelResource):
    table1 = fields.ToOneField(Table1Resource, 'table1')

    class Meta(object):
        """Define options attached to model."""
        queryset = models.Table2.objects.all()
        resource_name = 'table2'
# etc ...
If that doesn't help, you can try setting it explicitly like so:
table1 = fields.ToOneField(Table1Resource, 'table1', full=False)
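Beyond that, a hedged sketch (not from the original answer) of how the slow Table3Resource could be wired up: even with full=False, building each related resource URI touches obj.table2, which costs one extra query per row unless the queryset pre-joins it. Tastypie accepts any queryset in Meta, so a select_related() there can collapse those lookups into a single SQL join; the field names assume the models shown in the question.
class Table3Resource(ModelResource):
    # URI-only output by default (full=False): the related row is not
    # serialized inline.
    table2 = fields.ToOneField(Table2Resource, 'table2')

    class Meta(object):
        """Define options attached to model."""
        # select_related() follows the FK chain table3 -> table2 -> table1
        # in one SQL join instead of one query per Table3 row.
        queryset = models.Table3.objects.select_related('table2__table1')
        resource_name = 'table3'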

Related

SQLAlchemy: foreignKeys from multiple Tables (Many-to-Many)

I'm using the flask-sqlalchemy ORM in my Flask app, which is about smart-home sensors and actors (for the sake of simplicity let's call them Nodes).
Now I want to store an Event which is bound to Nodes in order to check their state, and to other (or the same) Nodes which should be set to a given value if the state of the first ones reaches a threshold.
Additionally, the states could be checked or set from/for Groups or Scenes. So I have three different foreign keys to check and another three to set. There can be more than one of each per type, and multiple types per Event.
Here is an example code with the db.Models and pseudocode what I expect to get stored in an Event:
db = SQLAlchemy()

class Node(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out

class Group(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out

class Scene(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    # columns snipped out

class Event(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    # The following columns may belong in an intermediate table,
    # but I have no clue how to design that under these conditions.
    constraints = ...  # list of foreign keys from different tables
                       # (Node/Group/Scene) with a threshold per key
    target = ...       # list of foreign keys from different tables
                       # (Node/Group/Scene) with a target value per key
In the end I want to be able to check whether any of my Events are true and then set the bound Node/Group/Scene accordingly.
It may be a database design problem (and not an SQLAlchemy one), but I want to make use of the advantages of the SQLA ORM here.
Inspired by this and that answer I tried to dig deeper, but other questions on SO were about more specific problems or one-to-many relationships.
Any hints or design tips are much appreciated. Thanks!
I ended up with a trade-off between usability and lines of code. My first thought was to save as much code as I could (DRY) and to define as few tables as possible.
As SQLAlchemy itself points out in one of its examples, the "generic foreign key" is supported only because it was often requested, not because it is a good solution. With it, less DB functionality is used, and the application instead has to take care of the key constraints.
On the other hand, they note that having more tables in your database does not affect DB performance.
So I tried some approaches and found one that fits my use case. Instead of a "normal" intermediate table for the many-to-many relationship, I use another SQLAlchemy class that has a one-to-many relationship to each of the two tables it connects.
class Event(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    nodes = db.relationship('NodeEvent', back_populates='events')
    # columns snipped out

    def get_as_dict(self):
        return {
            "id": self.id,
            "nodes": [n.get_as_dict() for n in self.nodes]
        }

class Node(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    value = db.Column(db.String(20))
    events = db.relationship('NodeEvent', back_populates='node')
    # columns snipped out

class NodeEvent(db.Model):
    ev_id = db.Column('ev_id', db.Integer, db.ForeignKey('event.id'), primary_key=True)
    n_id = db.Column('n_id', db.Integer, db.ForeignKey('node.id'), primary_key=True)
    value = db.Column('value', db.String(200), nullable=False)
    compare = db.Column('compare', db.String(20), nullable=True)
    node = db.relationship('Node', back_populates='events')
    events = db.relationship('Event', back_populates='nodes')

    def get_as_dict(self):
        return {
            "trigger_value": self.value,
            "actual_value": self.node.status,  # 'status' is among the snipped-out Node columns
            "compare": self.compare
        }
The trade-off is that I have to define a new class every time I bind a new table to that kind of relationship. But with the "generic foreign key" approach I would also have to check where each ForeignKey is coming from, so it is the same amount of work at the end of the day.
With my get_as_dict() function I get very handy access to the related data.
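For illustration, a hedged usage sketch of the association-object pattern above (not part of the original answer; it assumes the models as defined and an active Flask-SQLAlchemy session):
node = Node(value='21.5')
event = Event()

# The association object carries the per-link payload:
# the trigger value and the comparison operator.
link = NodeEvent(value='25.0', compare='>=')
link.node = node          # also populates node.events via back_populates
event.nodes.append(link)  # also populates link.events via back_populates

db.session.add(event)     # the save-update cascade picks up the link and node
db.session.commit()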

Tracking item order for storage to and retrieval from a DB

I'm trying to figure out how I'm going to 'CRUD' the order of items I have in a group that I'm storing in a database. (Pseudo-statement: select * from items where group_id = 1;)
My guess is that I just use a numeric field and increase/decrease the number as items are added to/removed from the group. I can then update this field as items are moved around. However, I've seen this go really badly wrong in an old legacy app, where items would get out of sync and you'd end up with a group whose order looked something like this:
0,1,1,3,4,5
0,1,1,1,4,5
This wasn't handled very gracefully by the application either; it broke the application and necessitated manual intervention to reorder the items in the DB.
Is there a way to avoid this pitfall?
EDIT: I would also maybe want the items available in multiple groups with multiple orders.
I think in that case I would need a many-to-many relationship both for the group-to-item relationship and for the item-to-order relationship. /EDIT
I'll be doing this in the Django framework.
I'm not really sure what you are asking, because ordering is one thing, and grouping of related objects is something else entirely.
Databases don't store the order of things; they store the relationships (grouping) of things. The order of things is a user-interface detail and not something a database should be used for.
In django, you can create a ManyToMany relationship. This essentially creates a "box" where you can add and remove items that are related to a particular model. Here is the example from the documentation:
from django.db import models

class Publication(models.Model):
    title = models.CharField(max_length=30)

    # On Python 3: def __str__(self):
    def __unicode__(self):
        return self.title

    class Meta:
        ordering = ('title',)

class Article(models.Model):
    headline = models.CharField(max_length=100)
    publications = models.ManyToManyField(Publication)

    # On Python 3: def __str__(self):
    def __unicode__(self):
        return self.headline

    class Meta:
        ordering = ('headline',)
Here an Article can belong to many Publications, and a Publication has one or more Articles associated with it:
a = Article.objects.create(headline='Hello')
b = Article.objects.create(headline='World')
p = Publication.objects.create(title='My Publication')
p.article_set.add(a)
p.article_set.add(b)

# You can also add an article to a publication from the article object:
c = Article.objects.create(headline='The Answer is 42')
c.publications.add(p)
To know how many articles belong to a publication:
Publication.objects.get(title='My Publication').article_set.count()
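A hedged follow-up sketch (not in the original answer): because both models declare a default ordering in Meta, reading the relation back already yields a stable, deterministic order, which sidesteps hand-maintained position numbers for simple cases; the names assume the models above.
# Articles come back ordered by headline, per Article.Meta.ordering.
pub = Publication.objects.get(title='My Publication')
for article in pub.article_set.all():
    print(article.headline)  # 'Hello', then 'The Answer is 42', then 'World'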

NDB fetch model instance from Key using projection?

I have a relatively large model class that I can access by key id, e.g.:
class Foo(ndb.Model):
    propA = ndb.IntegerProperty(required=True)
    propB = ndb.StringProperty()
    propC = ndb.JsonProperty()

key = ndb.Key('Foo', 1234)
model = key.get()
If I only need 'propA' from this model at this time, is there a way to create a projection=[Foo.propA] type request without creating an unnecessary query?
Queries are not necessarily more expensive than direct reads. How about:
model = Foo.query(Foo.key == key).get(projection=[Foo.propA])
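A hedged usage sketch comparing the two reads (assumes the Foo model above; note that projection queries can only return indexed properties, so an unindexed JsonProperty like propC could not be projected):
key = ndb.Key('Foo', 1234)

full_entity = key.get()  # reads and deserializes the whole entity

# The projection query materializes only propA; accessing any other
# property on the partial entity raises UnprojectedPropertyError.
partial = Foo.query(Foo.key == key).get(projection=[Foo.propA])
if partial is not None:
    print(partial.propA)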

Google App Engine ndb performance on repeated property

Do I pay a penalty on query performance if I choose to query a repeated property? For example:
class User(ndb.Model):
    user_name = ndb.StringProperty()
    login_providers = ndb.KeyProperty(repeated=True)

fbkey = ndb.Key("ProviderId", 1, "ProviderName", "FB")

for entry in User.query(User.login_providers == fbkey):
    pass  # do something with entry.key
vs
class User(ndb.Model):
    user_name = ndb.StringProperty()

class UserProvider(ndb.Model):
    user_key = ndb.KeyProperty(kind=User)
    login_provider = ndb.KeyProperty()

for entry in UserProvider.query(
        UserProvider.user_key == auserkey,
        UserProvider.login_provider == fbkey):
    pass  # do something with entry.user_key
Based on the GAE documentation, it seems that the Datastore takes care of the indexing, and the first, less verbose option would use that index. However, I failed to find any documentation that confirms this.
Edit
The sole purpose of UserProvider in the second example is to create a one-to-many relationship between a user and its login providers. I wanted to understand whether it is worth the trouble of creating a second entity instead of querying on the repeated property. Also, assume that all I need is the key of the User.
No. But you will raise your write costs, because each entry in the repeated property needs to be indexed, and write costs are based on the number of index entries updated.
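Since only the User key is needed here, a hedged addition (not from the original answer): a keys-only query skips fetching entity contents altogether, and keys-only results are billed as the cheaper small datastore operations; the names follow the first example above.
# keys_only=True returns ndb.Key objects instead of full entities,
# which is all that is needed here and is cheaper to read back.
user_keys = User.query(User.login_providers == fbkey).fetch(keys_only=True)
for user_key in user_keys:
    print(user_key.id())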

reduce google datastore read operation fee

I have a Kind XXX_account with 1000 entities. The Kind's file size is 3 MB. Whenever a request comes in, a query has to be run to find a certain entity in the Kind. Because of that, I think the Google fee comes to almost 4 USD in just 20 hours:
Datastore Read Operations: 5.01 million ops used, 4.96 million billable at $0.70 per million ops = $3.48
Is there any way to reduce the Datastore read operations? I plan to store the 1000 entities in a txt file so that I don't need to read the Datastore every time.
My model.py
class MyUser(DatastoreUser):
    pass

class XXXAccount(db.Model):
    user = db.ReferenceProperty(MyUser,
                                collection_name='xxx_accounts')
    id = db.StringProperty(required=True)
    created = db.DateTimeProperty(auto_now_add=True)
    updated = db.DateTimeProperty(auto_now=True)
    name = db.StringProperty(required=True)
    username = db.StringProperty(required=True)
    profile_url = db.StringProperty(required=True)
    aaa = db.StringProperty(required=True)
    bbb = db.StringProperty(required=True)
view.py
@login_required
def updateprofile(request):
    number_form = NumberForm()
    if request.method == "POST" and number_form.validate(request.form):
        acc_num_str = number_form['nb']
        acc_num = int(acc_num_str)
        current_user = request.user
        xxx_account = current_user.xxx_accounts[acc_num]  # the query runs here
        # ... do something that involves no datastore reads or writes ...
    return......
UPDATE:
Code was posted
OMG, 0.32 USD for just 1000 requests.
You should post your model definition and the code where you query the entities.
Common recommendations:
If you want to find a certain entity (or entities), there is only one right way to do it: use the entity key (an id number or a key_name string) to get it. The Datastore automatically assigns an id to an entity when it saves it, or you can manually set a nice key_name when you create the entity.
To get an entity's id or key_name, use Model.key().id() or Model.key().name() in DB, or Model.key.id() in NDB.
Then you can get an entity by id or key_name with the Model.get_by_id() or Model.get_by_key_name() methods if you're using the old DB API, or the Key.get() method if you're using the new NDB API. You can pass the id or key_name in the URL: http://example.com/getentity/[id].
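A hedged sketch of the key-name lookup with the model posted above (not from the original answer; it assumes each account is saved under a known key_name):
# Assume each entity was saved with an explicit key_name, e.g.:
#   XXXAccount(key_name='acc_%s' % some_id, user=..., ...).put()
# Fetching it back is then a single cheap get, with no query involved:
account = XXXAccount.get_by_key_name('acc_12345')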
Also, use Memcache to cache entities. Caching can dramatically reduce Datastore usage. By the way, NDB uses caching automatically.
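And a hedged memcache sketch for the old DB API (NDB does this caching for you; the key_name scheme is the same assumption as above):
from google.appengine.api import memcache

def get_account(key_name):
    # Try the cache first; fall back to a single Datastore get.
    account = memcache.get('xxx_account:' + key_name)
    if account is None:
        account = XXXAccount.get_by_key_name(key_name)
        if account is not None:
            memcache.add('xxx_account:' + key_name, account, time=3600)
    return account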
p.s. Sorry, I cannot post more than 2 links.
