How to seed Google's NDB (App Engine storage) - google-app-engine

I am an NDB user and this object database is quite cool. But how can I seed specific default values directly after deployment? Is there some predefined functionality or a standardized way to seed the database?
As an example:
I have the following ndb.Model and want some sort of "existing default parent".
class Category(ndb.Model):
    name = ndb.StringProperty(required=True)
    parent = ndb.KeyProperty(kind='Category', required=True,
                             default=<KeyOfRootCategory>)
Where should I put the following seeding code?
main_category = Category(name="all", parent=None)  # this is the root category
main_category.put()

Doesn't look like there are dedicated 'post-deployment' hooks for that. I'd simply put some code into the main handler script (the one that contains webapp2.WSGIApplication(...)) that checks whether the root category already exists and creates it if not. Alternatively, this could be part of some handler action.
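For illustration, a minimal sketch of that module-level check (assuming a simplified Category model without the self-referencing parent property; the 'root' key name is an arbitrary choice):

import webapp2
from google.appengine.ext import ndb

class Category(ndb.Model):
    name = ndb.StringProperty(required=True)

# Module-level code runs once per instance at load time.
# get_or_insert() is transactional, so instances racing at
# startup will not create a duplicate root category.
root_category = Category.get_or_insert('root', name='all')

app = webapp2.WSGIApplication([
    # ... your routes ...
], debug=True)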

Why not create a simple seeding handler to call after deployment (e.g. /seeding/example)? The way I see it, you only have to seed once, so there's no need for some sort of hook.
seed.py:
class ExampleHandler(webapp2.RequestHandler):
    def get(self):
        # Do your thing
        # Maybe use "get_or_insert()". See [1]
        return

app = webapp2.WSGIApplication(
    [
        ('/seeding/example', ExampleHandler),
    ],
    debug=True
)
Then in your app.yaml:
- url: /seeding/.*
  script: seed.app
  login: admin
The last line is crucial. It protects your seeding script from unauthorized access (see [2]).
[1] https://developers.google.com/appengine/docs/python/ndb/modelclass#Model_get_or_insert
[2] https://developers.google.com/appengine/docs/python/config/appconfig#Python_app_yaml_Requiring_login_or_administrator_status

I think I understand what you are asking.
You can create a parent key without having to create the entity. That key will define your entity group.
Alternatively, an entity doesn't need a parent itself but can still be the parent of any child. Any entity without a parent defined in its key becomes the root of its own entity group, and that entity group can have one or more members (i.e. itself and any children).
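A short sketch of that idea (the kind and key names are illustrative):

from google.appengine.ext import ndb

class Category(ndb.Model):  # note: no property named 'parent' here
    name = ndb.StringProperty(required=True)

# The key exists as a name only; no entity is ever stored under it.
root_key = ndb.Key('Category', 'root')

# 'parent' is the reserved constructor argument that sets the entity
# group, not a model property.
child = Category(parent=root_key, name='books')
child.put()

# An ancestor query then covers every member of that group:
children = Category.query(ancestor=root_key).fetch()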

Related

Create a model in app1 when app2 loads

Imagine I have an app1 called 'pricelists' and an app2 called 'marketplaces'.
In the marketplaces app, I want to auto-create a pricelists.PriceList if it is not yet present. This PriceList is used in signals to auto-populate the price list depending on a few factors.
Currently, I use something like this in my signals:
price_list, _ = PriceList.objects.get_or_create(
    currency='EUR', is_default=False, customer_type='CONS',
    remarks='Marketplace')
I don't like this approach since it's repeated a number of times, and I simply want to be sure the price list gets created.
My question: how can I get_or_create a model object in another app every time Django restarts?
Solution
In your app's __init__.py, manually define your default AppConfig; it doesn't seem to get detected automatically in Django 1.10:
default_app_config = 'marketplaces.apps.MarketPlacesConfig'
Override your AppConfig's ready() method:
class MarketPlacesConfig(AppConfig):
    name = 'marketplaces'

    def ready(self):
        from django.conf import settings
        from pricelists.models import PriceList
        price_list_marketplaces, _ = PriceList.objects.get_or_create(
            **settings.MARKETPLACES['price_list'])
AppConfig.ready() with django.db.models.signals is the only way I can think of.
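Worth noting: Django discourages touching the database inside ready() itself. A hedged variant of the same idea (assuming the same MARKETPLACES settings key) defers the get_or_create to the post_migrate signal, which fires after each successful migrate run:

from django.apps import AppConfig
from django.conf import settings
from django.db.models.signals import post_migrate

def _seed_price_list(sender, **kwargs):
    # Imported lazily so the model is only touched once apps are loaded.
    from pricelists.models import PriceList
    PriceList.objects.get_or_create(**settings.MARKETPLACES['price_list'])

class MarketPlacesConfig(AppConfig):
    name = 'marketplaces'

    def ready(self):
        # Connect for this app only; runs after 'manage.py migrate'.
        post_migrate.connect(_seed_price_list, sender=self)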

pattern for updating a datastore object

I'm wondering what the right pattern should be to update an existing datastore object using endpoints-proto-datastore.
For example, given a model like the one from your GDL videos:
class Task(EndpointsModel):
    detail = ndb.StringProperty(required=True)
    owner = ndb.StringProperty()
imagine we'd like to update the 'detail' of a Task.
I considered something like:
@Task.method(name='task.update',
             path='task/{id}',
             request_fields=('id', 'detail'))
def updateTask(self, task):
    pass
However, 'task' would presumably contain the previously-stored version of the object, and I'm not clear on how to access the 'new' detail variable with which to update the object and re-store it.
Put another way, I'd like to write something like this:
def updateTask(self, task_in_datastore, task_from_request):
    task_in_datastore.detail = task_from_request.detail
    task_in_datastore.put()
Is there a pattern for in-place updates of objects with endpoints-proto-datastore?
Thanks!
See the documentation for details on this:
The property id is one of five helper properties provided by default to help you perform common operations like this (retrieving by ID). In addition there is an entityKey property which provides a base64 encoded version of a datastore key and can be used in a similar fashion as id...
This means that if you use the default id property your current object will be retrieved and then any updates from the request will replace those on the current object. Hence doing the most trivial:
@Task.method(name='task.update',
             path='task/{id}',
             request_fields=('id', 'detail'))
def updateTask(self, task):
    task.put()
    return task
will perform exactly what you intended.
Task is your model; you can fetch the stored entity by ID and update it explicitly like this:
@Task.method(name='task.update',
             path='task/{id}',
             request_fields=('id', 'detail'))
def updateTask(self, task):
    stored_task = Task.get_by_id(task.id)
    stored_task.detail = task.detail
    stored_task.put()
    return stored_task

Why does db.Model.get_by_id() return None when no parent is specified?

I'm running the following code in the GAE interactive console (/_ah/admin/interactive), and I do not understand why get_by_id() returns None when the parent is not specified. The docs do not make this limitation clear and I can't think of a reason to enforce it.
import my_model
print my_model.all().fetch(1)[0].key().id() # Returns 33006, used later
print my_model.get_by_id(33006)
print my_model.get_by_id(my_model.all().fetch(1)[0].key().id())
parent = my_model.all().fetch(1)[0].parent()
print my_model.get_by_id(33006, parent=parent)
Output:
33006
None
None
<my_model object at 0x109a6a690>
db.Model definition and code showing object creation with ancestor:
class my_model(db.Model):
    user_id = db.StringProperty(indexed=True)
    email = db.StringProperty(indexed=True, default=None)

def create(parent):
    obj = my_model(user_id='x', email='y', parent=parent)
    obj.put()
The answer to your question is: because the same numeric ID could belong to another entity of the same kind with a different parent.
IDs are all different among entities sharing the same parent, and among entities with no parent at all, but across different entity groups numeric IDs are not unique. An entity's full identity is its complete key, ancestors included, so a lookup by bare ID would be ambiguous once ancestors are involved.
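A sketch that makes the ambiguity concrete (the parent kinds, key names, and the 33006 ID are illustrative; my_model is the class from the question):

from google.appengine.ext import db

# Two entities of the same kind can carry the same numeric ID as long
# as their ancestor paths differ:
key_a = db.Key.from_path('parent_kind', 'p1', 'my_model', 33006)
key_b = db.Key.from_path('parent_kind', 'p2', 'my_model', 33006)
assert key_a != key_b

# get_by_id() needs the parent to reconstruct the one full key you mean:
entity = my_model.get_by_id(33006, parent=key_a.parent())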

parallel code execution python2.7 ndb

In my app, one of the handlers needs to get a bunch of entities and execute a function on each one of them.
I have the keys of all the entities I need. After fetching them I need to execute one or two instance methods on each of them, and this slows my app down quite a bit: doing this for 100 entities takes around 10 seconds, which is way too slow.
I'm trying to find a way to get the entities and execute those functions in parallel to save time, but I'm not really sure which way is best.
I tried the _post_get_hook, but there I have a future object and need to call get_result() and execute the function inside the hook, which works kind of OK in the SDK but produces a lot of 'maximum recursion depth exceeded while calling a Python object' errors, and I can't really understand why; the error message is not very elaborate.
Is the Pipeline API or ndb.Tasklets what I'm searching for?
At the moment I'm going by trial and error, but I would be happy if someone could point me in the right direction.
EDIT
My code is something similar to a filesystem: every folder contains other folders and files. The path of a Collection is set on another entity, so to serialize a collection entity I need to get the referenced entity and read its path. On a Collection, the serialized_assets() function gets slower the more entities it contains. If I could execute a serialize function for each contained asset side by side, it would speed things up quite a bit.
class Index(ndb.Model):
    path = ndb.StringProperty()

class Folder(ndb.Model):
    label = ndb.StringProperty()
    index = ndb.KeyProperty()
    # contents is a list of keys of contained Folders and Files
    # (KeyProperty rather than StringProperty, so get_multi works below)
    contents = ndb.KeyProperty(repeated=True)

    def serialized_assets(self):
        assets = ndb.get_multi(self.contents)
        serialized_assets = []
        for a in assets:
            kind = a._get_kind()
            assetdict = a.to_dict()
            if kind == 'Collection':
                assetdict['path'] = a.path
                # other operations ...
            elif kind == 'File':
                assetdict['another_prop'] = a.another_property
                # ...
            serialized_assets.append(assetdict)
        return serialized_assets

    @property
    def path(self):
        return self.index.get().path

class File(ndb.Model):
    filename = ndb.StringProperty()
    # other properties....

    @property
    def another_property(self):
        # compute something here
        return computed_property
EDIT2:
@ndb.tasklet
def serialized_assets(self, keys=None):
    assets = yield ndb.get_multi_async(keys)
    raise ndb.Return([asset.serialized for asset in assets])
Is this tasklet code OK?
Since most of the execution time of your functions is spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably to use the query's map() function, like this (from the docs):
@ndb.tasklet
def callback(msg):
    acct = yield msg.author.get_async()
    raise ndb.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))

qry = Message.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
    print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
The Pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a task queue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key, having each task execute the one or two functions for its entity. The concurrency will then be based on the number of concurrent requests configured for that task queue.
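A rough sketch of that fan-out (the handler URLs are assumptions; Folder is the model from the question, and the default push queue is used):

import webapp2
from google.appengine.api import taskqueue
from google.appengine.ext import ndb

class FanOutHandler(webapp2.RequestHandler):
    def get(self):
        # One task per entity key; each task does the slow work on its own.
        for key in Folder.query().fetch(keys_only=True):
            taskqueue.add(url='/process_asset',
                          params={'key': key.urlsafe()})

class ProcessAssetHandler(webapp2.RequestHandler):
    def post(self):
        asset = ndb.Key(urlsafe=self.request.get('key')).get()
        # ... run the one or two slow instance methods here ...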

How to make references between expando models?

Update
This was my best effort at creating the following scheme:
user = self.auth.store.user_model.create_user(email,
                                              password_raw=newpasswd)
if not user[0]:  # user is a tuple
    return user[1]  # Error message
else:
    # User is created, let's try making the references
    okuser = auth_models.User.get_by_id(long(user[1].key.id()))
    okuser.sponsor = auth_models.User.get_by_id(long(sponsor_id)).auth_ids
Original question
How can I make a self-reference with an Expando class to indicate which User is the "sponsor" of which? The sponsor is the one who invited the new User, so we must store that at creation time, and it would be much neater to store it as a reference property than as a string or a string list.
I can create a new user, but I don't know how to make a reference so that I can tell, for one User, which other User is their sponsor. I suppose the way to model this is with SelfReferenceProperty, since both objects are users, but the complication is that it is an Expando model, so I don't know how to use the reference property. Could you tell me how to do it, or give me a clue about how to solve this problem in the best way?
user = self.auth.store.user_model.create_user(email,
                                              password_raw=newpasswd)
if not user[0]:  # user is a tuple
    return user[1]  # Error message
else:
    # User is created, let's try making the reference
    okuser = auth_models.User.get_by_id(user[1].key.id())
    okuser.sponsor = db.SelfReferenceProperty(User,
        collection_name='matched_images', verbose_name='Sponsor')
I don't know how to do the last part: storing the actual reference property on an Expando model. How can it be done?
Update
It seems it can't be done:
NotImplementedError: Property sponsor does not support <class 'google.appengine.ext.db.ReferenceProperty'> types.
Code:
user = self.auth.store.user_model.create_user(email,
                                              password_raw=newpasswd)
if not user[0]:  # user is a tuple
    return user[1]  # Error message
else:
    # User is created, let's try redirecting to login page
    okuser = auth_models.User.get_by_id(long(user[1].key.id()))
    okuser.sponsor = db.SelfReferenceProperty(
        auth_models.User.get_by_id(sponsor_id),
        collection_name='matched_distributor')
    okuser.put()
It forces me to use a string instead of a reference, and then a solution is feasible:
user = self.auth.store.user_model.create_user(email,
                                              password_raw=newpasswd)
if not user[0]:  # user is a tuple
    return user[1]  # Error message
else:
    # User is created, let's try redirecting to login page
    okuser = auth_models.User.get_by_id(long(user[1].key.id()))
    okuser.sponsor = sponsor_id
    okuser.put()
You can't assign an instance of a Property class to an instance of a model: property classes define properties, they don't represent individual values.
By far the easiest way to do what you want is to add the property to the model class as you would on a regular model. Just because you're using Expandos (why, by the way?) doesn't mean you can't have regular properties on them as well.
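A minimal sketch of that on a db.Expando subclass (the property and collection names are arbitrary choices):

from google.appengine.ext import db

class User(db.Expando):
    # A regular declared property living alongside any dynamic ones:
    sponsor = db.SelfReferenceProperty(collection_name='sponsored_users',
                                       verbose_name='Sponsor')

alice = User(nickname='alice')  # dynamic attributes still work as usual
alice.put()

bob = User(nickname='bob')
bob.sponsor = alice  # assign an entity (or its key), not a Property object
bob.put()

# Reverse lookup via the collection name:
invited = alice.sponsored_users.fetch(10)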
