I'm implementing a front page with "hot" stories based on a certain ranking algorithm. However, I can't figure out how to pass my own sort function to the App Engine Datastore (like I can in Python with sort(key=ranking_function)). I want something like this:
class Story(db.Model):
    user = db.ReferenceProperty(User)
    text = db.TextProperty()

    @property
    def ranking(self):
        # my ranking function, returns an int or something
        return 1
So that I can later call:
Story.all().order("ranking").limit(50)
Any idea how to do this using App Engine Datastore models?
I don't think this is possible with App Engine the way you describe it, but I think it is possible to achieve what you want. You want the datastore to run your ranking function against every element in the datastore, every time you do a query. That is not very scalable, as you could have millions of entities that you want to rank.
Instead, you should just have an integer property called rank and set it every time you update an entity. Then you can use that property in your order clause.
There's no built-in property that handles this, but there's a library, aetycoon, that implements DerivedProperty and other related properties that do what you want. Here's an article on how it works.
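For illustration, the stored-rank approach might look like the sketch below; the gravity-style formula, field names, and scale factor are my own assumptions, not from the question or aetycoon:

```python
import datetime
import math

def hot_score(points, created, now=None):
    """Illustrative 'hotness' formula: newer, higher-voted stories score higher."""
    now = now or datetime.datetime.utcnow()
    age_hours = (now - created).total_seconds() / 3600.0
    return points / math.pow(age_hours + 2, 1.8)

# On every write, store the result in an indexed integer property, e.g.:
#   story.rank = int(hot_score(story.points, story.created) * 1000000)
#   story.put()
# and later query:
#   Story.all().order("-rank").fetch(50)
```

Because the rank is computed at write time, the ordering only changes when a story is re-saved; a periodic task can refresh ranks if you need decay over time.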
Related
The documents in my apps collection each contain a subcollection of users. Now I need to update a single user per app, given a set of _ids for the apps collection, using JavaScript. I cannot use a regular call to update() for this, as the data inserted will be encrypted using a public key stored within the app document. Therefore the data written into the user subdocument is dependent on the app document it is contained in. Pseudo-code of what I need to do:
foreach app in apps:
app.users.$.encryptedData = encrypt(data, app.publicKey)
One way to do it would be to find all the apps and then use forEach() to update every single one. However, this seems quite inefficient, as every app document would have to be found twice in the database: once to gather all of them, and a second time to update each document. There has to be a more efficient way.
The short answer is that no, you cannot update a document in MongoDB with a value from that document.
Have a look at https://stackoverflow.com/a/37280419/5293110 for ideas other than doing the iteration yourself.
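If you do iterate, you can at least avoid the double lookup by fetching each app once and batching the writes. A sketch (Python/pymongo shown for brevity; `target_user_id` is an assumed field naming which user to update, and `encrypt` stands in for your public-key helper):

```python
def build_update_ops(apps, data, encrypt):
    """Build one positional-$ update per app document.

    Returns plain dicts so they can be passed to pymongo as UpdateOne(**op)
    or executed individually.
    """
    ops = []
    for app in apps:
        ops.append({
            "filter": {"_id": app["_id"], "users._id": app["target_user_id"]},
            "update": {"$set": {
                # Encrypted with this app's own public key:
                "users.$.encryptedData": encrypt(data, app["publicKey"]),
            }},
        })
    return ops

# With pymongo (assumed setup):
#   from pymongo import MongoClient, UpdateOne
#   apps = list(db.apps.find({"_id": {"$in": app_ids}}))
#   db.apps.bulk_write([UpdateOne(**op) for op in build_update_ops(apps, data, encrypt)])
```

This still reads each app once, but the updates go to the server as a single bulk_write() batch instead of one round trip per document.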
Basically, I would like to update about 10,000 entities at once, adding a new property and value to each entity.
Given this class:
class Post(ndb.Model):
    title = ndb.StringProperty()
    created_date = ndb.DateTimeProperty()
I would like to run some sort of operation that would create this new created_date_string property on my existing Post entities, populating the field with a string version of the date.
class Post(ndb.Model):
    title = ndb.StringProperty()
    created_date = ndb.DateTimeProperty()
    created_date_string = ndb.StringProperty(required=True)
How do I handle this?
My best guess is to use task queues to update each entity; we would be queueing 10,000 tasks. Is there a better approach?
You could do this in a single task where that task iterates over the entities to update them. You'll want to batch your gets and puts to make it more efficient. Tasks run for up to 10 minutes, and I bet this would take less than a minute.
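A sketch of that single-task approach, assuming the classic App Engine Python runtime and the Post model above (the batch size is illustrative):

```python
from google.appengine.ext import ndb

def backfill_created_date_string(batch_size=500):
    """Page through Post entities with a cursor and write them back in batches."""
    cursor = None
    more = True
    while more:
        posts, cursor, more = Post.query().fetch_page(
            batch_size, start_cursor=cursor)
        for post in posts:
            post.created_date_string = str(post.created_date)
        ndb.put_multi(posts)  # one batched RPC per page
```

Each fetch_page/put_multi pair is one round trip per batch, rather than one per entity, which is what keeps 10,000 entities well inside a single task's deadline.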
Are you sure you need this new property? You could do this:
class Post(ndb.Model):
    title = ndb.StringProperty()
    created_date = ndb.DateTimeProperty()

    @property
    def created_date_string(self):
        return str(self.created_date)
UPDATE:
I should have explained the confusing terminology. There are two completely different uses of "property" here. The property in my answer is specific to Python and has nothing to do with GAE. Python's @property decorator makes a method look like an attribute, so you can write x.created_date_string instead of x.created_date_string().
Instead of what I wrote above, you could do:
class Post(ndb.Model):
    title = ndb.StringProperty()
    created_date = ndb.DateTimeProperty()

    def created_date_string(self):
        return str(self.created_date)
It is basically the exact same thing, except you call it as a method.
The Python property is different from a GAE ComputedProperty, which is an actual property in the datastore. You could use that as well, but why store redundant data if you don't need to?
You are storing the String version of created_date property in created_date_string. There are two use cases I can think of for doing this.
Using created_date_string on the server only: if you are only using this property on the server side, then there is no need to store it, as it becomes redundant; you can calculate it via an instance method on the model class.
Sending created_date_string in an API response: if you are sending this property via an API and using it on the client side (web/app etc.), then the best option is to use Google App Engine's ComputedProperty, as shown below:
created_date_string = ndb.ComputedProperty(lambda self: str(self.created_date))
This way your created_date_string property will always be consistent with created_date and will automatically be computed and stored in the datastore.
You can find more info on ComputedProperty here
Coming back to your original question of how to update 10,000 entities: as it is a one-off task, I would recommend using the deferred library. It also uses the task queue but is comparatively easy to use. As mentioned in its definition:
The deferred library lets you bypass all the work of setting up dedicated task handlers and serializing and deserializing your parameters by exposing a simple function deferred.defer()
You can find the documentation here. The example given there is similar to what you are asking, i.e. running batch updates.
Here is how I would do it.
Write a dedicated handler (for example, /runbatchupdate) that will start your update using deferred
Hit the handler from outside, or make an entry in your cron.yaml to run this handler.
If you need a sample code then comment below and I will write a sample handler for you. Hope this helps
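A sketch of what such a deferred update might look like (classic GAE runtime assumed; the batch size and chaining strategy are illustrative):

```python
from google.appengine.ext import deferred, ndb
from google.appengine.datastore.datastore_query import Cursor

BATCH_SIZE = 100  # illustrative

def update_posts(cursor_urlsafe=None):
    """Update one batch of Post entities, then re-defer for the next batch."""
    cursor = Cursor(urlsafe=cursor_urlsafe) if cursor_urlsafe else None
    posts, next_cursor, more = Post.query().fetch_page(
        BATCH_SIZE, start_cursor=cursor)
    for post in posts:
        post.created_date_string = str(post.created_date)
    ndb.put_multi(posts)
    if more:
        # Chain the next batch as a fresh deferred task.
        deferred.defer(update_posts, next_cursor.urlsafe())

# In your /runbatchupdate handler:
#   deferred.defer(update_posts)
```

Passing the cursor as a websafe string keeps the deferred arguments trivially picklable, and chaining small batches means no single task can hit its deadline.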
I'm trying to query for all objects that have no value for a given repeated property.
For example imagine you have the following model:
class Foo(ndb.Model):
    bar = ndb.IntegerProperty(repeated=True)
and you wanted all the instances of Foo where bar had no value, or is []. How would you perform this query or work around this behavior?
Note (from GAE's ndb documentation):
Querying for a value of None on a repeated property has undefined
behavior; don't do that
Well, like the docs say, you can't.
One way of approaching this might be to keep another property on the model that records how many values bar holds. You would need to update it when the entity is saved: a good way would be to override put() to do self.bar_count = len(self.bar) before calling the superclass method.
Of course, you'd then need to go through your existing data to set the counts; you might want to use a mapper to do that.
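A sketch of the counter approach using ndb's _pre_put_hook, which is an alternative to overriding put() itself (property names are illustrative):

```python
from google.appengine.ext import ndb

class Foo(ndb.Model):
    bar = ndb.IntegerProperty(repeated=True)
    bar_count = ndb.IntegerProperty(default=0)

    def _pre_put_hook(self):
        # Keep the count in sync on every save.
        self.bar_count = len(self.bar)

# Entities whose repeated property is empty:
#   Foo.query(Foo.bar_count == 0).fetch()
```

Since bar_count is an ordinary indexed property, equality and inequality filters on it behave normally, sidestepping the undefined behavior of querying a repeated property for None.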
Google proposes changing one entity at a time to the default values:
http://code.google.com/appengine/articles/update_schema.html
I have a model with a million rows, and doing this with a web browser will take me ages. Another option is to run this using task queues, but that will cost me a lot of CPU time.
Is there any easy way to do this?
Because the datastore is schema-less, you do literally have to add or remove properties on each instance of the Model. Using Task Queues should use the exact same amount of CPU as doing it any other way, so go with that.
Before you go through all of that work, make sure that you really need to do it. As noted in the article that you link to, it is not the case that all entities of a particular model need to have the same set of properties. Why not change your Model class to check for the existence of new or removed properties and update the entity whenever you happen to be writing to it anyhow?
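A sketch of that lazy-migration idea with ndb (the model, property name, and default value are illustrative; with the older db API you would override put() instead of using a hook):

```python
from google.appengine.ext import ndb

class MyModel(ndb.Model):
    new_property = ndb.StringProperty()

    def _pre_put_hook(self):
        # Backfill the new property whenever the entity is written anyway,
        # so the migration happens gradually at zero extra cost.
        if self.new_property is None:
            self.new_property = 'default value'
```

Entities that are never written again simply keep missing the property, which is fine as long as your read path treats the missing value as the default.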
Instead of what the docs suggest, I would suggest using the low-level GAE API to migrate.
The following code will migrate all the items of type DbMyModel:
new_attribute will be added if it does not exist.
old_attribute will be deleted if it exists.
changed_attribute will be converted from boolean to string (True to 'Priority 1', False to 'Priority 3').
Please note that query.Run() returns an iterator yielding Entity objects, which behave like dicts:
from google.appengine.api.datastore import Query, Put

query = Query("DbMyModel")
for item in query.Run():
    if 'new_attribute' not in item:
        item['new_attribute'] = some_value
    if 'old_attribute' in item:
        del item['old_attribute']
    if item['changed_attribute'] is True:
        item['changed_attribute'] = 'Priority 1'
    elif item['changed_attribute'] is False:
        item['changed_attribute'] = 'Priority 3'
    # and so on...
    # Put the item back to the datastore:
    Put(item)
In case you need to select only some records, see the google.appengine.api.datastore module's source code for extensive documentation and examples of how to create a filtered query.
Using this approach, it is simpler to remove/add properties than with GAE's suggested approach, and it avoids issues when you have already updated your application model. For example, now-required fields might not exist (yet), causing errors while migrating, and deleting fields does not work for static properties.
This doesn't help the OP, but it may help googlers with a tiny app: I did what Alex suggested, but simpler. Obviously this isn't appropriate for production apps.
deploy App Engine Console
write code right inside the web interpreter against your live datastore
like so:
from models import BlogPost

for item in BlogPost.all():
    item.attr = "defaultvalue"
    item.put()
I need to save in my model a list of objects of a certain class to the datastore.
Is there any simple way to achieve this with ListProperty and custom properties, without going into pickled/simplejson blob data?
I just want something like this:
class Test:
    pass

class model(db.Model):
    list = db.ListProperty(Test)
Looking at GAE documentation I can't really tell if this is impossible with the current version or not.
I was trying to avoid pickling because it's slow and has size limits.
You can only store a limited set of types directly in the datastore. To store your own types, you need to convert them into one of the accepted types in some manner - pickling is one common approach, as is serializing it as JSON.
The size limit isn't unique to pickling - 1MB is the largest Entity you can insert regardless of the fields and types.
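If you do serialize, JSON round-tripping of simple objects is straightforward; a sketch (the __dict__ convention and the idea of storing the blob in a db.TextProperty are assumptions about your Test class):

```python
import json

class Test(object):
    """Illustrative plain class whose instances we want to store."""
    def __init__(self, value):
        self.value = value

def serialize_tests(tests):
    # Store the resulting string in a db.TextProperty on your model.
    return json.dumps([t.__dict__ for t in tests])

def deserialize_tests(blob):
    return [Test(**d) for d in json.loads(blob)]
```

Unlike pickle, the stored blob stays human-readable and survives refactoring of the Test class, but it only works while the attributes are JSON-compatible types.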
You could save your Test objects in the datastore directly, by making a Test model/entity type. Otherwise you will have to serialize them somehow (using something like pickle or json)
You could have a list of keys
or you could give 'Test' entities a parent that is an entity of your 'model' class
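A sketch of both suggestions with the db API (model and property names are illustrative):

```python
from google.appengine.ext import db

class Test(db.Model):
    value = db.IntegerProperty()

class Container(db.Model):
    # Option 1: a list of keys pointing at Test entities.
    tests = db.ListProperty(db.Key)

# Option 2: make each Test a child of a Container entity:
#   t = Test(parent=container, value=1)
#   t.put()
#   Test.all().ancestor(container)  # fetch a container's Tests
```

With the key-list option the order of the list is preserved but each Test must be fetched separately; with the ancestor option you get strongly consistent ancestor queries but no inherent ordering.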