How to read old property values in a _pre_put_hook - google-app-engine

I am trying to implement an ndb model audit so that all changes to properties are stored within each model instance. Here is the _pre_put_hook I wrote to implement that:
def _pre_put_hook(self):
    # save a history record for updates
    if not (self.key is None or self.key.id() is None):
        old_object = self.key.get(use_cache=True)
        for attr in dir(self):
            if not callable(getattr(self, attr)) and not attr.startswith("_"):
                if getattr(self, attr) != getattr(old_object, attr):
                    logging.debug('UPDATE: {0}'.format(attr))
                    logging.debug('OLD: {0} NEW: {1}'.format(getattr(old_object, attr), getattr(self, attr)))
The problem is that old_object always comes back populated with the same values as the self object being updated. How can I access the property values of the old object BEFORE the put() actually happens (_pre_put)?
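That symptom is exactly what NDB's in-context cache produces: by default, self.key.get() hands back the very instance that is about to be put. A minimal sketch of the hook with caching disabled (it costs one real datastore read per update, which the EDIT below avoids; it also assumes the property code names match their datastore names):
def _pre_put_hook(self):
    # only diff updates, not first-time puts
    if not (self.key is None or self.key.id() is None):
        # bypass the in-context cache and memcache so we read the stored
        # version of the entity, not the instance currently being saved
        old_object = self.key.get(use_cache=False, use_memcache=False)
        if old_object is None:
            return
        for name in self._properties:
            if getattr(self, name) != getattr(old_object, name):
                logging.debug('UPDATE: {0}'.format(name))
                logging.debug('OLD: {0} NEW: {1}'.format(getattr(old_object, name), getattr(self, name)))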

EDIT:
I realized over time I was doing a bunch of work that didn't need to be done (a lot of CPU/memory spent copying entire entities and passing them around when they may never be needed). Here's the updated version, which stores a reference to the original protobuf and only deserializes it if you need it:
__original = None    # a shadow copy of this object so we can see what changed... lazily inflated
_original_pb = None  # the original encoded protobuf representation of this entity

@property
def _original(self):
    """
    Singleton to deserialize the protobuf into a new entity that looks like the original from the database
    """
    if not self.__original and self._original_pb:
        self.__original = self.__class__._from_pb(self._original_pb)
    return self.__original
@classmethod
def _from_pb(cls, pb, set_key=True, ent=None, key=None):
    """
    save a copy of the original pb so we can track if anything changes between puts
    """
    entity = super(ChangesetMixin, cls)._from_pb(pb, set_key=set_key, ent=ent, key=key)
    if entity._original_pb is None and not entity._projection:
        # _from_pb will get called if we unpickle a new object (like when passing through the deferred library),
        # so if we are being materialized from a pb and we don't have a key, then we don't have an _original
        entity.__original = None
        entity._original_pb = pb
    return entity
Make a clone of the entity when you first read it (see: Copy an entity in Google App Engine datastore in Python without knowing property names at 'compile' time) and put it on the entity itself so it can be referenced later when desired. That way you don't have to do a second datastore read just to make the comparison; a sketch of such a clone() helper follows below.
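The clone() used below is not spelled out in the answer; roughly, it could look like this (a sketch only, assuming plain ndb properties whose code names match their datastore names):
from google.appengine.ext import ndb

def clone(entity):
    # shallow-copy every datastore property into a fresh, detached instance;
    # ComputedProperty values are skipped because they cannot be assigned
    klass = entity.__class__
    props = dict((name, getattr(entity, name))
                 for name, prop in klass._properties.items()
                 if not isinstance(prop, ndb.ComputedProperty))
    return klass(key=entity.key, **props)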
We override two different Model methods to make this happen:
@classmethod
def _post_get_hook(cls, key, future):
    """
    clone this entity so we can track if anything changes between puts
    NOTE: this only gets called after an ndb.Key.get() ... NOT when loaded from a query
    (see the _from_pb override below to understand the full picture)
    also note: this gets called after EVERY key.get(), regardless of whether NDB had cached it already,
    so that's why we only do the clone() if _original is not set
    """
    entity = future.get_result()
    if entity is not None and entity._original is None:
        entity._original = clone(entity)
@classmethod
def _from_pb(cls, pb, set_key=True, ent=None, key=None):
    """
    clone this entity so we can track if anything changes between puts
    this is one way to know when an object loads from a datastore QUERY:
    _post_get_hook only gets called on a direct Key.get(),
    and none of the documented hooks are called with query results
    SEE: https://code.google.com/p/appengine-ndb-experiment/issues/detail?id=211
    """
    entity = super(BaseModel, cls)._from_pb(pb, set_key=set_key, ent=ent, key=key)
    if entity.key and entity._original is None:
        # _from_pb will get called if we unpickle a new object (like when passing through the deferred library),
        # so if we are being materialized from a pb and we don't have a key, then we don't have an _original
        entity._original = clone(entity)
    return entity
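With either variant in place, the pre-put diff from the original question needs no second datastore read; it can compare against the stashed copy, roughly like this (a sketch, assuming the overrides above):
def _pre_put_hook(self):
    old_object = self._original
    if old_object is None:
        return  # first put, nothing to diff against
    for name in self._properties:
        if getattr(self, name) != getattr(old_object, name):
            logging.debug('UPDATE: {0}'.format(name))
            logging.debug('OLD: {0} NEW: {1}'.format(getattr(old_object, name), getattr(self, name)))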

Related

How to migrate newly added python class property in ndb model?

I currently have a model in NDB and I'd like to add a new property to it. Let's say I have the following:
class User(Model, BaseModel):
    name = ndb.StringProperty(required=False)
    email = ndb.StringProperty(required=False)

    @property
    def user_roles(self):
        return UserRole.query(ancestor=self.key).fetch()

    @property
    def roles(self):
        return [user_role.role for user_role in UserRole.query(ancestor=self.key).fetch()]
Now, let's say I've added one additional property called market_id. For example:
class User(Model, BaseModel):
    name = ndb.StringProperty(required=False)
    email = ndb.StringProperty(required=False)

    @property
    def user_roles(self):
        return UserRole.query(ancestor=self.key).fetch()

    @property
    def roles(self):
        return [user_role.role for user_role in UserRole.query(ancestor=self.key).fetch()]

    @property
    def market_id(self):
        """ fetches `id` for the resource `market` associated with `user` """
        for each_role in UserRole.query(ancestor=self.key):
            resource = each_role.role.get().resource
            if resource.kind() == 'Market':
                return resource.id()
        return None
The problem here is that roles are fetched properly, as expected, for all the existing entities (that property has been there since the beginning, and an extra column called roles can be observed in the datastore). Since I'm dealing with a Python class property, I assume that no migration is required. But how does the column called roles already exist? And why doesn't the newly added property market_id? Does it require a migration?
The change you're suggesting is not an actual ndb model change, as you're not adding/deleting/modifying any of the model's datastore properties. Only ndb.Property class children are real ndb model properties that are stored when the entity is put() into the datastore.
The property you're adding is a Python class @property and has nothing to do with what's in the datastore.
So for this particular case no migration is needed.
The update to the question makes this even clearer, I believe. The market_id @property is not a User datastore entity property. To get values for it you don't need to update the User entity; you have to create/edit the corresponding UserRole entities with their resource property pointing to a Market entity.
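To make the distinction concrete, an illustrative sketch (not the asker's actual models):
from google.appengine.ext import ndb

class User(ndb.Model):
    email = ndb.StringProperty()   # real datastore property: stored on put(),
                                   # shows up as a column, may need migration

    @property
    def email_domain(self):        # plain Python property: computed on access,
        return self.email.split('@')[-1]  # never stored, so no migration needed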

Django, autogenerate one-to-many tables, and database structure

I am using Django for a website where I have a database with users, people, locations, items and so on. Now I find that I need some extra information that requires one-to-many relations, like aliases, for most of these tables.
Should I (1) create a common alias table for all of these by using the content types framework (it will probably end up with billions of rows), or should I (2) create an alias table for each of them? If the latter, how do I auto-create a one-to-many table like this by just adding a single line like
alias = Alias()
in each model? I'm sure I saw an app doing something like that a while ago; I think it was a reversion app of some kind. Even if the second method is not suited, I would love to understand how to do it. I do not know what to search for to find an explanation of this.
I plan to add Haystack with Solr to this, so method 2 might add a lot of extra work there. But I do not have much experience with it yet, so I might be wrong.
PS: I ended up with method one.
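For reference, method one would look roughly like this with Django's contenttypes framework (a sketch; the field and model names are illustrative):
from django.db import models
from django.contrib.contenttypes import generic
from django.contrib.contenttypes.models import ContentType

class Alias(models.Model):
    # one shared alias table: each row points at an instance of any model
    name = models.CharField(max_length=255)
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    content_object = generic.GenericForeignKey('content_type', 'object_id')

class Person(models.Model):
    name = models.CharField(max_length=30, blank=True, null=True)
    aliases = generic.GenericRelation(Alias)  # enables person.aliases.create(...)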
I managed to do what I wanted with method 2: easily generating one-to-many fields. I'm not sure if this is the easiest or the best way; if someone has a better way of doing it, I would love to learn it. I am a long way from a Django expert, so I might have meddled with some unnecessarily complex stuff to do what I wanted.
This example creates an easy way of adding a one-to-many alias relationship.
Alias Managers
class AliasManagerDescriptor(object):
    def __init__(self, model, fkName):
        self.model = model
        self.fkName = fkName

    def __get__(self, instance, owner):
        if instance is None:
            return AliasManager(self.model, self.fkName)
        return AliasManager(self.model, self.fkName, instance)

class AliasManager(models.Manager):
    def __init__(self, model, fkName, instance=None):
        super(AliasManager, self).__init__()
        self.model = model
        self.instance = instance
        # name of the FK linking this model to the linked model
        self.fkName = fkName

    def get_query_set(self):
        """
        Get the query set, or only get instances from this model that are linked
        to the chosen instance from the linked model if one is chosen
        """
        if self.instance is None:
            return super(AliasManager, self).get_query_set()
        if isinstance(self.instance._meta.pk, models.OneToOneField):
            # TODO: check if this part works, not checked
            filter = {self.instance._meta.pk.name + "_id": self.instance.pk}
        else:
            filter = {self.fkName: self.instance.pk}
        return super(AliasManager, self).get_query_set().filter(**filter)

    def create(self, **kwargs):
        """
        Create alias instances. If the FK is not given then it is automatically set
        to the chosen instance from the linked model
        """
        if self.fkName not in kwargs:
            kwargs[self.fkName] = self.instance
        print kwargs
        super(AliasManager, self).create(**kwargs)
Alias Models
class Alias(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        aliasModel = self.create_alias_model(cls)
        descriptor = AliasManagerDescriptor(aliasModel, cls._meta.object_name.lower())
        setattr(cls, self.manager_name, descriptor)

    def create_alias_model(self, model):
        """
        Creates an alias model to associate with the model provided.
        """
        attrs = {
            #'id': models.AutoField(primary_key=True),
            "name": models.CharField(max_length=255),
            # not sure which to use of the two next methods
            model._meta.object_name.lower(): models.ForeignKey(model),
            #model._meta.object_name.lower(): AliasObjectDescriptor(model),
            '__unicode__': lambda self: u'%s' % self.name,
            '__module__': model.__module__,
        }
        attrs.update(Meta=type('Meta', (), self.get_meta_options(model)))
        name = '%s_alias' % model._meta.object_name
        return type(name, (models.Model,), attrs)

    def get_meta_options(self, model):
        """
        Returns a dictionary of fields that will be added to
        the Meta inner class.
        """
        return {
        }

"""class AliasObjectDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        values = (getattr(instance, f.attname) for f in self.model._meta.fields)
        return self.model(*values)"""
Person model - you only need to add "alias = Alias()" to a model to get a one-to-many alias field.
class Person(models.Model):
    name = models.CharField(max_length=30, blank=True, null=True)
    age = models.IntegerField(blank=True, null=True)
    alias = Alias()
Now I can do something like this:
per = Person(name="Per", age=99)
per.save()
per.alias.create(name="Mr.P")
per_alias = per.alias.all().values_list("name", flat=True)

How do I handle objects that are part of a Model object’s state, but don’t need separate db-level support?

In my Google App Engine app I have model objects that need to be stored. These objects are parameterized by various policy objects. For example, my Event class has a Privacy policy object which determines who can see, update, etc. There are various subclasses of PrivacyPolicy that behave differently. The Event consults its PrivacyPolicy object at various points.
class PrivacyPolicy(db.Model):
    def can_see(self, event, user):
        pass

class OwnerOnlyPolicy(PrivacyPolicy):
    def can_see(self, event, user):
        return user == event.owner

class GroupOnlyPolicy(PrivacyPolicy):
    def can_see(self, event, user):
        for grp in event.owner.groups():
            if grp.is_member(user):
                return True
        return False

class OnlyCertainUsersPolicy(PrivacyPolicy):
    def __init__(self, others):
        self.others = others

    def can_see(self, event, user):
        return user in self.others
I could make my Event class use a ReferenceProperty to the PrivacyPolicy:
class Event(db.Model):
    privacy = db.ReferenceProperty(PrivacyPolicy)
    # ...
The reason I don't like this is that the one-to-one relationship means that nobody ever queries for the policy object, there is no need to maintain a back-reference from the policy to its Event object, and in no other way is PrivacyPolicy an independent db-level object. It is functionally equivalent to an IntegerProperty, in that it is part of the Event object's state; it's just an object instead of a number, specifically an object that can have zero state or lots of state, unknown to the Event type.
I can’t find anyone talking about how to approach such a situation. Is there a tool/approach I don’t know about? Do I just suck it up and use a reference property and the hell with the overhead?
If the only other way to handle this is a custom Property type, any advice about how to approach it would be welcome. My first thought is to use a TextProperty to store the string rep of the policy object, decode it when needed, caching the result, and have any change to the policy object invalidate the cache and update the string rep.
You're overcomplicating by trying to store this in the datastore. This belongs in code rather than in the datastore.
The least complicated way would be:
PRIVACY_OWNER_ONLY = 1
PRIVACY_GROUP = 2

class Event(db.Model):
    privacy = db.IntegerProperty()

    def can_see(self, user):
        if self.privacy == PRIVACY_OWNER_ONLY:
            return user == self.owner
        elif self.privacy == PRIVACY_GROUP:
            for grp in self.owner.groups():
                if grp.is_member(user):
                    return True
            return False
Sometimes all it takes is to think of the right approach. The solution is to introduce a new kind of property that uses pickle to store and retrieve values, such as the one described in https://groups.google.com/forum/?fromgroups#!topic/google-appengine/bwMD0ZfRnJg
I wanted something slightly more sophisticated, because pickle isn't always the answer, and anyway documentation is nice, so here is my ObjectProperty type:
import pickle

from google.appengine.ext import db

class ObjectProperty(db.Property):
    def __init__(self, object_type=None, verbose_name=None, to_store=pickle.dumps, from_store=pickle.loads, **kwds):
        """Initializes this Property with all the given options

        All args are passed to the superclass. The ones used specifically by this class are described here. For
        all other args, see base class method documentation for details.

        Args:
          object_type: If not None, all values assigned to the property must be either instances of this type or None
          to_store: A function to use to convert a property value to a storable str representation. The default is
            to use pickle.dumps()
          from_store: A function to use to convert a storable str representation to a property value. The default is
            to use pickle.loads()
        """
        if object_type and not isinstance(object_type, type):
            raise TypeError('object_type should be a type object')
        kwds['indexed'] = False  # It never makes sense to index pickled data
        super(ObjectProperty, self).__init__(verbose_name, **kwds)
        self.to_store = to_store
        self.from_store = from_store
        self.object_type = object_type

    def get_value_for_datastore(self, model_instance):
        """Get value from property to send to datastore.

        We retrieve the value of the attribute and return the result of invoking the to_store function on it.
        See base class method documentation for details.
        """
        value = getattr(model_instance, self.name, None)
        return self.to_store(value)

    def make_value_from_datastore(self, rep):
        """Get value from datastore to assign to the property.

        We take the value passed, convert it to str() and return the result of invoking the from_store function
        on it. The Property class assigns this returned value to the property.
        See base class method documentation for details.
        """
        # It passes us a unicode, even though I returned a str, so this is required
        rep = str(rep)
        return self.from_store(rep)

    def validate(self, value):
        """Validate reference.

        Returns:
          A valid value.

        Raises:
          BadValueError for the following reasons:
          - Object not of correct type.
        """
        value = super(ObjectProperty, self).validate(value)
        if value is not None and not isinstance(value, self.object_type):
            raise db.KindError('Property %s must be of type %s' % (self.name, self.object_type))
        return value
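Usage is then a one-liner on the model (a sketch reusing the names from the question; note that with this approach the policy classes would be plain Python objects rather than db.Model subclasses, since they get pickled into the Event record):
class Event(db.Model):
    # the policy rides along inside the Event entity as an unindexed blob;
    # no separate policy entity and no extra datastore read
    privacy = ObjectProperty(PrivacyPolicy)

event = Event(privacy=OwnerOnlyPolicy())
event.put()
event.privacy.can_see(event, some_user)  # some_user assumed defined elsewhere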

App Engine multiple namespaces

Recently there have been some data structure changes in our app, and we decided to use namespaces to separate different versions of the data, along with a mapreduce task that converts old entities to the new format.
Now that's all fine, but we don't want to always isolate the entire data set we have. The biggest part of our data is stored in a kind that's pretty simple and doesn't need to change often. So we decided to use per-kind namespaces.
Something like:
class Author(ndb.model.Model):
    ns = '2'

class Book(ndb.model.Model):
    ns = '1'
So, when migrating to version 2, we don't need to convert all our data (and copy all 'Book' kinds to the other namespace), only entities of the 'Author' kind. Then, instead of defining appengine_config.namespace_manager_default_namespace_for_request, we just pass the 'namespace' keyword argument to our queries:
Author.query(namespace=Author.ns).get()
Question: how do we store (i.e. put()) the different kinds using these different namespaces? Something like:
# Not an API
Author().put(namespace=Author.ns)
Of course, the above doesn't work. (Yes, I could ask the datastore for an available key in that namespace and then use that key to store the instance, but that's an extra API call I'd like to avoid.)
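For what it's worth, ndb's Model constructor also accepts key-related keywords, including namespace; since the (incomplete) key is built client-side, no extra API call should be involved. Roughly:
# the namespace lives in the key, which is constructed locally,
# so this should not cost an extra datastore round trip
author = Author(namespace=Author.ns)
author.put()

# and reading it back, as in the question:
author = Author.query(namespace=Author.ns).get()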
To solve a problem like this I wrote a decorator as follows:
import functools

from google.appengine.api import namespace_manager

MY_NS = 'abc'

def in_my_namespace(fn):
    """Decorator: run the given function in the MY_NS namespace"""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        orig_ns = namespace_manager.get_namespace()
        namespace_manager.set_namespace(MY_NS)
        try:
            res = fn(*args, **kwargs)
        finally:  # always drop out of the namespace on the way up
            namespace_manager.set_namespace(orig_ns)
        return res
    return wrapper
So, for functions that ought to run in a separate namespace, I can simply write:
@in_my_namespace
def foo():
    Author().put()  # put into the `my` namespace
Of course, applying this to a system to get the results you desire is a bit beyond the scope of this, but I thought it might be helpful.
EDIT: Using a with context
Here's how to accomplish the above using a with context:
class namespace_of(object):
    def __init__(self, namespace):
        self.ns = namespace

    def __enter__(self):
        self.orig_ns = namespace_manager.get_namespace()
        namespace_manager.set_namespace(self.ns)

    def __exit__(self, type, value, traceback):
        namespace_manager.set_namespace(self.orig_ns)
Then elsewhere:
with namespace_of("Hello World"):
    Author().put()  # put into the `Hello World` namespace
A Model instance will use the namespace you set with the namespace_manager [1], as you can see in python/google/appengine/ext/db/__init__.py.
What you could do is create a child class of Model which expects a class-level 'ns' attribute to be defined. This subclass then overrides put(), setting the namespace before calling the original put() and resetting it afterwards. Something like this:
class MyModel(db.Model):
    ns = None

    def put(self, *args, **kwargs):
        if self.ns is None:
            raise ValueError('"ns" is not defined for this class.')
        original_namespace = namespace_manager.get_namespace()
        namespace_manager.set_namespace(self.ns)  # switch into the class namespace
        try:
            return super(MyModel, self).put(*args, **kwargs)
        finally:
            namespace_manager.set_namespace(original_namespace)
[1] http://code.google.com/appengine/docs/python/multitenancy/multitenancy.html
I don't think that it is possible to avoid the extra API call. Namespaces are encoded into the entity's Key, so in order to change the namespace within which an entity is stored, you need to create a new entity (that has a Key with the new namespace) and copy the old entity's data into it.

Google App Engine: while saving an entity that has reference properties, is it the app's responsibility to maintain referential integrity?

While saving an entity that has reference properties, is it the app's responsibility to check that the entities referred to by the reference properties already exist in the datastore? While unit testing with the datastore_v3_stub, I find that App Engine will happily save an entity A that has a reference property pointing to B (even when B does not exist in the datastore yet). Further, when A is saved, B is not saved.
When you subsequently fetch A from the datastore, and try to navigate to B, you get an exception.
Is this expected behavior?
Example code:
user = MyUser(key_name='2',
name='my user 2')
e = db.get(user.key())
self.assertTrue(e is None) # user does not exist in datastore yet
preferences = Preferences(user=user) # user is a ReferenceProperty
preferences.put()
e = db.get(user.key())
self.assertTrue(e is None) # user still does not exist in datastore
e = db.get(preferences.key())
self.assertFalse(e is None) # but preferences were still stored
e.user will give exception
EDIT: I am a Python newbie, but is it possible to write a class that subclasses db.Model and overrides the put() method to enforce referential integrity (by using some kind of reflection) before calling db.Model's put()? Then I could just subclass this class to enforce referential integrity on my model classes (A and B above, for example).
This is what I came up with. Can any gurus code-review this?
for name, property in obj.properties().items():
    if isinstance(property, db.ReferenceProperty):
        try:
            value = getattr(obj, name)  # dereferencing may hit the datastore
        except datastore_errors.Error:
            print name, property, 'does not exist in datastore yet'
            continue
        key = value.key()
        o = db.get(key)
        if o is None:
            print name, property, value, 'does not exist in datastore yet'
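Folded into the base class the EDIT asks about, it could look something like this (a sketch; ReferentialIntegrityModel is a made-up name, and each check costs an extra datastore read per reference):
from google.appengine.api import datastore_errors
from google.appengine.ext import db

class ReferentialIntegrityModel(db.Model):
    def put(self, *args, **kwargs):
        # refuse to save if any ReferenceProperty points at a missing entity
        for name, prop in self.properties().items():
            if isinstance(prop, db.ReferenceProperty):
                try:
                    value = getattr(self, name)  # dereferencing may hit the datastore
                except datastore_errors.Error:
                    raise ValueError('%s refers to a missing entity' % name)
                if value is not None and db.get(value.key()) is None:
                    raise ValueError('%s refers to an unsaved entity' % name)
        return super(ReferentialIntegrityModel, self).put(*args, **kwargs)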
Yes. It is the responsibility of the app.
To quote:
As with a Key value, it is possible for a reference property value to refer to a data entity that does not exist. If a referenced entity is deleted from the datastore, references to the entity are not updated. Accessing an entity that does not exist raises a ReferencePropertyResolveError.
taken from the docs
