How to migrate from db.Model to ndb.Model? [duplicate] - google-app-engine

This question already has an answer here: Appengine - Upgrading from standard DB to NDB - ReferenceProperties (1 answer). Closed 6 years ago.
I have an old GAE app that uses db.Model. I understand that I should migrate to ndb.Model. Is there an easy way, or must I make many changes?
My code is fairly large, and I have an old model based on db.Model that I use to build an index and search. Should I keep the old model and make a new ndb model, or try to change the old model?
Some of the properties in the model are:
cities = db.ListProperty(db.Key) #ndb.KeyProperty(repeated=True)
regions = db.ListProperty(db.Key) #ndb.KeyProperty(repeated=True)
blobs = db.ListProperty(db.BlobKey)  # ndb.BlobKeyProperty(repeated=True)
primary_image = blobstore.BlobReferenceProperty()
usr = db.ReferenceProperty()  # ndb.KeyProperty()
hasimages = db.BooleanProperty(default=False,
                               verbose_name='has_images')
userID = db.StringProperty(verbose_name='User ID')
integer_price = db.IntegerProperty()
ip = db.StringProperty(verbose_name='ip')
ipcountry = db.StringProperty(indexed=False, verbose_name='origin')
tags = db.ListProperty(db.Category)
category = db.CategoryProperty(verbose_name='Category')
title = db.StringProperty(verbose_name='title') # required
user = db.UserProperty(verbose_name='userid')
im = db.IMProperty(verbose_name='nickname') # optional, xmpp
city = db.StringProperty() # postaladdress should work instead
region = db.StringProperty() # postaladdress should work instead
url = db.StringProperty(verbose_name='url')
geopt = db.GeoPtProperty(verbose_name='geopt')
text = db.TextProperty(verbose_name='text')

It shouldn't be particularly complex, and can be done incrementally, as you can mix db and ndb code in the same binary. See DB to NDB Client Library Migration.
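To make the property-by-property part of the migration concrete, here is a rough plain-Python helper (hypothetical, not part of the App Engine SDK) that suggests an ndb declaration for a single-line db declaration like the ones in the question. The mapping table reflects my reading of the migration guide and is an assumption to check against it: db.ListProperty(...) becomes a repeated=True property, ReferenceProperty becomes KeyProperty, BlobReferenceProperty becomes BlobKeyProperty, and CategoryProperty becomes a plain StringProperty. Anything it does not recognize, such as IMProperty, returns None so you can handle it by hand.

```python
import re

# Assumed mapping, based on the DB to NDB migration guide; verify before relying on it.
# Property classes whose ndb name matches the db name; only the module changes.
SAME_NAME = {"StringProperty", "IntegerProperty", "BooleanProperty",
             "TextProperty", "GeoPtProperty", "UserProperty"}

# db/blobstore classes that map to a differently named ndb class.
RENAMED = {"ReferenceProperty": "KeyProperty",
           "BlobReferenceProperty": "BlobKeyProperty",
           "CategoryProperty": "StringProperty"}

# Item type of db.ListProperty -> ndb class used with repeated=True.
LIST_ITEMS = {"db.Key": "KeyProperty",
              "db.BlobKey": "BlobKeyProperty",
              "db.Category": "StringProperty"}

DECL = re.compile(r"^(\w+)\s*=\s*(?:db|blobstore)\.(\w+)\((.*)\)$")


def to_ndb(line):
    """Return a suggested ndb declaration, or None if it needs manual review."""
    # Strip a trailing comment (naive: assumes no '#' inside string arguments).
    match = DECL.match(line.split("#")[0].strip())
    if not match:
        return None
    name, cls, args = match.groups()
    if cls == "ListProperty":
        item, _, rest = args.partition(",")
        ndb_cls = LIST_ITEMS.get(item.strip())
        if ndb_cls is None:
            return None
        extra = ", " + rest.strip() if rest.strip() else ""
        return "%s = ndb.%s(repeated=True%s)" % (name, ndb_cls, extra)
    ndb_cls = cls if cls in SAME_NAME else RENAMED.get(cls)
    if ndb_cls is None:  # e.g. IMProperty has no direct ndb equivalent
        return None
    return "%s = ndb.%s(%s)" % (name, ndb_cls, args)
```

For example, to_ndb("cities = db.ListProperty(db.Key)") returns "cities = ndb.KeyProperty(repeated=True)". It only rewrites text: db-only arguments such as collection_name on a ReferenceProperty are copied over unchanged and still need manual edits.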

Related

How to remove value from GAE NDB Property (type BlobKeyProperty)

This might be a dumb question, and I apologize for that, but I am confused.
I have the following entity:
from google.appengine.ext import ndb
class Profile(ndb.Model):
    name = ndb.StringProperty()
    identifier = ndb.StringProperty()
    pic = ndb.BlobKeyProperty()  # stores the key to the profile picture blob
I want to clear the "pic" property value of the entity above so that it looks as if "pic" had never been assigned a value. I do not intend to delete the whole entity. Is the approach below correct?
qry = Profile.query(Profile.identifier == identifier)
result_record_list = qry.fetch()
if result_record_list:
    result_record_list[0].pic.delete()  # or: result_record_list[0].pic = None (or undefined, or null?)
I am deleting the actual blob referred to by this blob key separately.
Assign None to it and put the entity back into the datastore:
result_record_list[0].pic = None
result_record_list[0].put()
The datastore is a schemaless object database, so you can add and remove properties from a kind (ndb.Model) without needing a schema update.
If you also want to clean up existing entities, look at this answer from Guido.
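Because the datastore is schemaless, each stored entity carries only the properties it actually has. The stand-in below uses plain dicts keyed by (kind, id) rather than the App Engine API, and the helper names are hypothetical; it contrasts storing None, as the answer suggests, with removing the property from the stored record altogether.

```python
# Plain-Python stand-in for a schemaless store (NOT the App Engine API):
# each entity is a dict holding only the properties it actually has.
store = {
    ("Profile", 1): {"name": "Ann", "identifier": "ann", "pic": "blob-key-123"},
}


def set_to_none(store, key, prop):
    """Like `entity.pic = None; entity.put()`: the property remains, set to None."""
    entity = dict(store[key])  # local copy, as after a fetch
    entity[prop] = None
    store[key] = entity        # explicit write back


def drop_property(store, key, prop):
    """Remove the property from the stored record entirely."""
    entity = dict(store[key])
    entity.pop(prop, None)
    store[key] = entity
```

In both cases a model that still declares pic reads it back as None, which is why assigning None and calling put() is usually all you need.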

How to optimize one-to many queries in the datastore

I have a latency problem in my application because the datastore performs additional queries for referenced entities. I have received good advice on how to handle this for single-valued properties by using the get_value_for_datastore() function. However, my application also has one-to-many relationships, as shown in the code below, and I have not found a way to prefetch these entities. The result is unacceptable latency (>6000 ms) when showing a table of 200 documents and their associated DocumentFiles.
(There will probably never be more than 10,000 Documents or DocumentFiles.)
Is there a way to solve this?
models.py
from google.appengine.ext import blobstore, db
class DocUser(db.Model):
    user = db.UserProperty()
    name = db.StringProperty()
    hasWriteAccess = db.BooleanProperty(default=False)
    isAdmin = db.BooleanProperty(default=False)
    accessGroups = db.ListProperty(db.Key)
    ...
class Document(db.Expando):
    title = db.StringProperty()
    lastEditedBy = db.ReferenceProperty(DocUser, collection_name='documentLastEditedBy')
    ...
class DocumentFile(db.Model):
    description = db.StringProperty()
    blob = blobstore.BlobReferenceProperty()
    created = db.DateTimeProperty()  # needs to be stored here in relation to upload / download of everything
    document = db.ReferenceProperty(Document, collection_name='files')
    @property
    def link(self):
        return '%s/%s' % (self.key().id(), self.blob.filename)
    ...
main.py
out = ''
docUsers = DocUser.all()
docUsersNameDict = dict([(i.key(), i.name) for i in docUsers])
documents = Document.all()
for d in documents:
    out += '<td>%s</td>' % d.title
    docUserKey = Document.lastEditedBy.get_value_for_datastore(d)
    out += '<td>%s</td>' % docUsersNameDict.get(docUserKey)
    out += '<td>'
    # Creates a new query for each document, resulting in unacceptable latency
    for file in d.files:
        out += file.link + '<br>'
    out += '</td>'
Denormalize and store the link in your Document, so that getting the link will be fast.
You will need to be careful: whenever you update a DocumentFile, you must also update the associated Document. This assumes that you read the link from the datastore far more often than you update it.
Denormalizing is often the fix for poor performance on App Engine.
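A minimal sketch of the denormalization idea, using plain Python classes instead of db.Model so it runs anywhere. The file_links attribute and the save_file helper are hypothetical names; the point is that every write to a DocumentFile also refreshes the copy stored on its Document, so rendering the table never queries DocumentFile.

```python
class Document(object):
    def __init__(self, title):
        self.title = title
        # Denormalized: one link string per attached file, stored on the
        # Document itself so listing documents needs no per-document query.
        self.file_links = []


class DocumentFile(object):
    def __init__(self, file_id, filename, document):
        self.file_id = file_id
        self.filename = filename
        self.document = document

    @property
    def link(self):
        # Same id/filename pair that the question's DocumentFile.link uses.
        return "%s/%s" % (self.file_id, self.filename)


def save_file(f):
    """Call on every create/update of a DocumentFile to keep the copy in sync."""
    prefix = "%s/" % f.file_id
    doc = f.document
    # Drop any stale link for this file, then store the current one.
    doc.file_links = [l for l in doc.file_links if not l.startswith(prefix)]
    doc.file_links.append(f.link)
    # With the real datastore you would now put() both f and doc.
```

Reads then cost one query for all documents; the price is an extra write on every file change, which matches the read-heavy assumption above.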
Load your files asynchronously. Use get_value_for_datastore on d.files, which should return a collection of keys; you can then call db.get_async(key) on them to get future objects. You will not be able to write out your result procedurally as you have done, but it should be straightforward to assemble a partial result (for example, a dictionary) for all documents, each holding a collection of pending get() futures. Then, when you iterate to build the results, you can resolve the futures, which should already have finished, so they return without blocking (~0 ms latency).
Basically, you need two iterations. The first iteration will go through and asynchronously request the files you need, and the second iteration will go through, finalize your gets, and build your response.
https://developers.google.com/appengine/docs/python/datastore/async
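The two-pass structure can be sketched with Python's concurrent.futures, where a thread pool stands in for the datastore's async RPCs. fetch_files is a hypothetical placeholder that simulates latency with a sleep; with the real datastore you would start async gets instead, but the shape is the same: start every fetch first, then collect the results.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch_files(doc_id):
    """Hypothetical stand-in for fetching one document's files over the network."""
    time.sleep(0.05)  # simulated RPC latency
    return ["file-%d-%d" % (doc_id, n) for n in range(2)]


def render(doc_ids):
    with ThreadPoolExecutor(max_workers=16) as pool:
        # Pass 1: start every fetch without waiting for any result.
        pending = [(d, pool.submit(fetch_files, d)) for d in doc_ids]
        # Pass 2: collect results in order; most futures are already done.
        rows = []
        for d, future in pending:
            files = future.result()  # blocks only if this fetch is still running
            rows.append("<td>%d</td><td>%s</td>" % (d, "<br>".join(files)))
    return "".join(rows)
```

Because all fetches are in flight before the first result is read, the total wait is roughly that of the slowest fetch rather than the sum of all of them, as long as the pool is large enough.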

How can I better use filters in appengine to save me filtering by looping through a long list of entities?

The following code runs regularly as a cron job and is turning out to be very computationally expensive! The main problem is the for loop, and I think it could be made more efficient with better filtering, but I'm at a loss as to how.
free_membership_type = MembershipType.all().filter("membership_class =", "Free").filter("live =", True).get()
all_free_users = UserMembershipType.all().filter("membership_active =", True)
all_free_users = all_free_users.filter("membership_type =", free_membership_type).fetch(limit=999999)
if all_free_users:
    for free_user in all_free_users:
        activation_status = ActivationStatus.all().filter("user =", free_user.user).get()
        if activation_status and activation_status.activated:
            weekly_limits = WeeklyLimits.all().filter("user =", free_user.user).get()
            if weekly_limits and weekly_limits.documents_left > 0:
                pass  # do something...
The models which the code uses are:
class MembershipType(db.Model):
    membership_class = db.StringProperty()
    membership_code = db.StringProperty()
    live = db.BooleanProperty(default=False)
class UserMembershipType(db.Model):
    user = db.ReferenceProperty(UserModel)
    membership_type = db.ReferenceProperty(MembershipType)
    membership_active = db.BooleanProperty(default=False)
class ActivationStatus(db.Model):
    user = db.ReferenceProperty(UserModel)
    activated = db.BooleanProperty(default=False)
class WeeklyLimits(db.Model):
    user = db.ReferenceProperty(UserModel)
    membership_type = db.ReferenceProperty(MembershipType)
    documents_left = db.IntegerProperty(default=0)
The code I'm using in production does make better use of caching for the various entities, however the for loop still has to cycle through a bunch of users to finally find the few that it needs to do the operation on. Ideally I'd filter out all of the users that don't fulfil the criteria and only then start looping through the list of users - is there some kind of magic bullet that I can use here to achieve this?
The magic that you are probably looking for is denormalization. It looks to me like these classes can all be meaningfully combined into a single model:
class Membership(db.Model):
    user = db.ReferenceProperty(UserModel)
    membership_class = db.StringProperty()
    membership_code = db.StringProperty()
    live = db.BooleanProperty(default=False)
    membership_active = db.BooleanProperty(default=False)
    activated = db.BooleanProperty(default=False)
    documents_left = db.IntegerProperty(default=0)
Then, you can use one query to do all of your filtering.
Over-normalization is a common anti-pattern in App Engine development. The models you posted look like they might as well be table definitions for a relational database (although it's arguable whether they are more compartmentalized than needed even for that scenario), and App Engine's datastore is very much not a relational database.
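To illustrate why the merged model helps, here is a plain-Python stand-in in which each dict plays the role of one entity of the combined Membership kind from this answer (the sample data is made up). Every condition from the original loop becomes a test on a single record, so no per-user lookups remain.

```python
# Each dict stands in for one entity of the combined Membership kind.
memberships = [
    {"user": "alice", "membership_class": "Free", "live": True,
     "membership_active": True, "activated": True, "documents_left": 3},
    {"user": "bob", "membership_class": "Free", "live": True,
     "membership_active": True, "activated": True, "documents_left": 0},
    {"user": "carol", "membership_class": "Free", "live": True,
     "membership_active": True, "activated": False, "documents_left": 5},
    {"user": "dave", "membership_class": "Premium", "live": True,
     "membership_active": True, "activated": True, "documents_left": 5},
]


def free_users_with_documents_left(rows):
    # On App Engine the same predicate would be one query, roughly:
    #   Membership.all().filter("membership_class =", "Free").filter("live =", True)
    #       .filter("membership_active =", True).filter("activated =", True)
    #       .filter("documents_left >", 0)
    return [r["user"] for r in rows
            if r["membership_class"] == "Free" and r["live"]
            and r["membership_active"] and r["activated"]
            and r["documents_left"] > 0]
```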
Can you see any downside to storing all of those fields in a single model?
You could improve this by storing the data closer together in a single model. For example, a single entity kind of UserMembership could have all of the fields you need, and you could do a single query:
UserMembership.all().filter("membership_type =", "FREE").filter("status =", "ACTIVE").filter("documentsLeft >", 0)
This would require an extra composite index to be defined, but will run much, much faster.
If you want to avoid denormalizing your data as suggested in the other two answers, you could also consider using Google's new SQL service instead of the normal datastore: http://googleappengine.blogspot.com/2011/10/google-cloud-sql-your-database-in-cloud.html
With SQL you could do all of this in a single query, even with separate entities.
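As a sketch of the single-query approach with separate tables, the example below uses Python's built-in sqlite3 in place of Cloud SQL (which is MySQL-based, so syntax may differ slightly). Table and column names mirror the question's models but are assumptions, and the rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE membership_type (id INTEGER PRIMARY KEY, membership_class TEXT, live INTEGER);
CREATE TABLE user_membership_type (user_id TEXT, membership_type_id INTEGER, membership_active INTEGER);
CREATE TABLE activation_status (user_id TEXT, activated INTEGER);
CREATE TABLE weekly_limits (user_id TEXT, documents_left INTEGER);

INSERT INTO membership_type VALUES (1, 'Free', 1), (2, 'Premium', 1);
INSERT INTO user_membership_type VALUES ('alice', 1, 1), ('bob', 1, 1), ('carol', 2, 1);
INSERT INTO activation_status VALUES ('alice', 1), ('bob', 1), ('carol', 1);
INSERT INTO weekly_limits VALUES ('alice', 3), ('bob', 0), ('carol', 5);
""")

# One query joins the four tables and applies every condition from the loop.
QUERY = """
SELECT umt.user_id
FROM user_membership_type AS umt
JOIN membership_type AS mt ON mt.id = umt.membership_type_id
JOIN activation_status AS a ON a.user_id = umt.user_id
JOIN weekly_limits AS w ON w.user_id = umt.user_id
WHERE mt.membership_class = 'Free' AND mt.live = 1
  AND umt.membership_active = 1 AND a.activated = 1
  AND w.documents_left > 0
"""

users = [row[0] for row in conn.execute(QUERY)]
```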

Google App Engine - Is this also a Put method? Or something else

I was wondering whether I'm unknowingly calling put() in the last line of my code (please have a look). Thanks.
class User(db.Model):
    name = db.StringProperty()
    total_points = db.IntegerProperty()
    points_activity_1 = db.IntegerProperty(default=100)
    points_activity_2 = db.IntegerProperty(default=200)
    def calculate_total_points(self):
        self.total_points = self.points_activity_1 + self.points_activity_2
# initialize a user (this is obviously a Put method)
User(key_name="key1", name="person1").put()
# get user by key name
user = User.get_by_key_name("key1")
# QUESTION: is this also a Put method? It worked and updated my user entity's total points.
User.calculate_total_points(user)
While that method will certainly update the in-memory copy of the object, I do not see any reason to believe that the change will be persisted to the datastore. Datastore write operations are costly, so they do not happen implicitly.
After running this code, use the datastore viewer to look at the copy of the object in the datastore. I think you may find that it does not have the changed total_points value.
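The point about explicit writes can be illustrated with a hypothetical in-memory store whose get() hands back a copy, much as a datastore fetch gives you a local object. FakeStore is not an App Engine class; on App Engine the equivalent fix is simply to call user.put() after user.calculate_total_points().

```python
import copy


class FakeStore(object):
    """Hypothetical stand-in: get() returns a copy, put() stores a copy."""

    def __init__(self):
        self._data = {}

    def put(self, key, entity):
        self._data[key] = copy.deepcopy(entity)

    def get(self, key):
        return copy.deepcopy(self._data[key])


class User(object):
    def __init__(self, name):
        self.name = name
        self.total_points = None
        self.points_activity_1 = 100  # same defaults as the question's model
        self.points_activity_2 = 200

    def calculate_total_points(self):
        self.total_points = self.points_activity_1 + self.points_activity_2


store = FakeStore()
store.put("key1", User("person1"))

user = store.get("key1")
user.calculate_total_points()           # only the local copy changes
stale = store.get("key1").total_points  # still None

store.put("key1", user)                 # the explicit write, like user.put()
fresh = store.get("key1").total_points  # now 300
```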

get_by_id method on Model classes in Google App Engine Datastore

I'm unable to work out how to get objects from the Google App Engine Datastore using get_by_id. Here is the model:
from google.appengine.ext import db
class Address(db.Model):
    description = db.StringProperty(multiline=True)
    latitude = db.FloatProperty()
    longitude = db.FloatProperty()
    date = db.DateTimeProperty(auto_now_add=True)
I can create them, put them, and retrieve them with GQL.
address = Address()
address.description = self.request.get('name')
address.latitude = float(self.request.get('latitude'))
address.longitude = float(self.request.get('longitude'))
address.put()
A saved address has values for
>> address.key()
aglndWVzdGJvb2tyDQsSB0FkZHJlc3MYDQw
>> address.key().id()
14
I can find them using the key
from google.appengine.ext import db
address = db.get('aglndWVzdGJvb2tyDQsSB0FkZHJlc3MYDQw')
But can't find them by id
>> from google.appengine.ext import db
>> address = db.Model.get_by_id(14)
The address is None, and when I try
>> Address.get_by_id(14)
AttributeError: type object 'Address' has no attribute 'get_by_id'
How can I find by id?
EDIT: It turns out I'm an idiot and was trying to find an Address model inside a function called Address. Thanks for your answers; I've marked Brandon's as the correct answer since he got in first and demonstrated that it should all work.
I just tried it on shell.appspot.com and it seems to work fine:
Google Apphosting/1.0
Python 2.5.2 (r252:60911, Feb 25 2009, 11:04:42)
[GCC 4.1.0]
>>> class Address(db.Model):
        description = db.StringProperty(multiline=True)
        latitude = db.FloatProperty()
        longitdue = db.FloatProperty()
        date = db.DateTimeProperty(auto_now_add=True)
>>> addy = Address()
>>> addyput = addy.put()
>>> addyput.id()
136522L
>>> Address.get_by_id(136522)
<__main__.Address object at 0xa6b33ae3bf436250>
An entity's key is a list of (kind, id_or_name) tuples; for root entities it is always only one element long. Thus, an ID alone doesn't identify an entity: the kind of entity is also required. When you call db.Model.get_by_id(x), you're asking for the entity with key (Model, x). What you want is to call Address.get_by_id(x), which fetches the entity with key (Address, x).
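A toy model of that key structure, using tuples of (kind, id) pairs as dictionary keys; the entities dict and get_by_id function are hypothetical stand-ins, not the db API. It shows why the same numeric id finds nothing when looked up under the wrong kind, as happened with db.Model.get_by_id(14).

```python
# Hypothetical stand-in for the datastore: a key is a path of (kind, id) pairs.
# Root entities have a one-pair path.
entities = {
    (("Address", 14),): {"description": "home"},
    (("User", 14),): {"name": "person1"},  # same id, different kind
}


def get_by_id(kind, entity_id, parent_path=()):
    """Build the full key from kind, id and optional parent path, then look it up."""
    key = tuple(parent_path) + ((kind, entity_id),)
    return entities.get(key)
```

get_by_id("Address", 14) finds the address, while get_by_id("Model", 14) returns None because no entity has the key (Model, 14).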
You should pass a long to get_by_id().
Passing an int must give an error message.
