Recently there have been some data structure changes in our app, and we decided to use namespaces to separate different versions of the data, together with a mapreduce task that converts old entities to the new format.
Now that's all fine, but we don't want to always isolate the entire data set we have. The biggest part of our data is stored in a kind that's pretty simple and doesn't need to change often. So we decided to use per-kind namespaces.
Something like:
class Author(ndb.model.Model):
    ns = '2'

class Book(ndb.model.Model):
    ns = '1'
So, when migrating to version 2, we don't need to convert all our data (and copy every 'Book' entity to the other namespace), only entities of the 'Author' kind. Then, instead of defining appengine_config.namespace_manager_default_namespace_for_request, we just pass the 'namespace' keyword argument to our queries:
Author.query(namespace=Author.ns).get()
Question: how to store (i.e. put()) the different kinds using these different namespaces? Something like:
# Not an API
Author().put(namespace=Author.ns)
Of course, the above doesn't work. (Yes, I could ask the datastore for an available key in that namespace, and then use that key to store the instance, but that's an extra API call that I'd like to avoid.)
To solve a problem like this I wrote a decorator as follows:
import functools

from google.appengine.api import namespace_manager

MY_NS = 'abc'

def in_my_namespace(fn):
    """Decorator: run the given function in the MY_NS namespace."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        orig_ns = namespace_manager.get_namespace()
        namespace_manager.set_namespace(MY_NS)
        try:
            res = fn(*args, **kwargs)
        finally:  # always drop out of the NS on the way up.
            namespace_manager.set_namespace(orig_ns)
        return res
    return wrapper
So I can simply write, for functions that ought to occur in a separate namespace:
@in_my_namespace
def foo():
    Author().put()  # put into `my` namespace
Of course, applying this to a system to get the results you desire is a bit beyond the scope of this, but I thought it might be helpful.
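For the per-kind setup in the question, the same decorator can take the namespace as a parameter instead of using a fixed MY_NS. This is only a sketch: FakeNamespaceManager is a hypothetical stand-in for google.appengine.api.namespace_manager so the example runs outside App Engine, and in_namespace is a name made up for illustration.

```python
import functools


class FakeNamespaceManager(object):
    """Hypothetical stand-in for google.appengine.api.namespace_manager."""

    def __init__(self):
        self._ns = ''

    def get_namespace(self):
        return self._ns

    def set_namespace(self, ns):
        self._ns = ns


namespace_manager = FakeNamespaceManager()


def in_namespace(ns):
    """Decorator factory: run the wrapped function with `ns` as the current namespace."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            orig_ns = namespace_manager.get_namespace()
            namespace_manager.set_namespace(ns)
            try:
                return fn(*args, **kwargs)
            finally:
                # Restore the caller's namespace even if fn raises.
                namespace_manager.set_namespace(orig_ns)
        return wrapper
    return decorator


@in_namespace('2')
def current_namespace():
    # In the real app this is where Author().put() would go.
    return namespace_manager.get_namespace()
```

In the real app the argument could be the class attribute from the question, e.g. in_namespace(Author.ns).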
EDIT: Using a with context
Here's how to accomplish the above using a with context:
from google.appengine.api import namespace_manager

class namespace_of(object):
    def __init__(self, namespace):
        self.ns = namespace

    def __enter__(self):
        self.orig_ns = namespace_manager.get_namespace()
        namespace_manager.set_namespace(self.ns)

    def __exit__(self, type, value, traceback):
        # Always restore the previous namespace, even if the block raised.
        namespace_manager.set_namespace(self.orig_ns)
Then elsewhere:
with namespace_of("Hello World"):
    Author().put()  # put into the `Hello World` namespace
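The same context can also be written as a generator with contextlib.contextmanager from the standard library. A minimal sketch, again using a hypothetical FakeNamespaceManager in place of the real namespace_manager so that it runs anywhere:

```python
import contextlib


class FakeNamespaceManager(object):
    """Hypothetical stand-in for google.appengine.api.namespace_manager."""

    def __init__(self):
        self._ns = ''

    def get_namespace(self):
        return self._ns

    def set_namespace(self, ns):
        self._ns = ns


namespace_manager = FakeNamespaceManager()


@contextlib.contextmanager
def namespace_of(ns):
    """Temporarily switch the current namespace to `ns`."""
    orig_ns = namespace_manager.get_namespace()
    namespace_manager.set_namespace(ns)
    try:
        yield
    finally:
        # Runs on normal exit and when the with-block raises.
        namespace_manager.set_namespace(orig_ns)


with namespace_of('Hello World'):
    inside = namespace_manager.get_namespace()
after = namespace_manager.get_namespace()
```

The try/finally around the yield plays the role of __exit__ in the class-based version.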
A Model instance will use the namespace you set with the namespace_manager [1], as you can see here: python/google/appengine/ext/db/__init__.py
What you could do is create a child class of Model which expects a class-level 'ns' attribute to be defined. This sub class then overrides put() and sets the namespace before calling original put and resets the namespace afterwards. Something like this:
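from google.appengine.api import namespace_manager
from google.appengine.ext import db

class MyModel(db.Model):
    ns = None  # subclasses must set this to their namespace

    def put(self, *args, **kwargs):
        if self.ns is None:
            raise ValueError('"ns" is not defined for this class.')
        original_namespace = namespace_manager.get_namespace()
        namespace_manager.set_namespace(self.ns)
        try:
            return super(MyModel, self).put(*args, **kwargs)
        finally:
            namespace_manager.set_namespace(original_namespace)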
[1] http://code.google.com/appengine/docs/python/multitenancy/multitenancy.html
I don't think that it is possible to avoid the extra API call. Namespaces are encoded into the entity's Key, so in order to change the namespace within which an entity is stored, you need to create a new entity (that has a Key with the new namespace) and copy the old entity's data into it.
Related
I recently encountered a situation where one might want to run a datastore query which includes a kind, but the class of the corresponding model is not available (e.g. if it's defined in a module that hasn't been imported yet).
I couldn't find any out-of-the-box way to do this using the google.appengine.ext.db package, so I ended up using the google.appengine.api.datastore.Query class from the low-level datastore API.
This worked fine for my needs (my query only needed to count the number of results, without returning any model instances), but I was wondering if anyone knows of a better solution.
Another approach I've tried (which also worked) was subclassing db.GqlQuery to bypass its constructor. This might not be the cleanest solution, but if anyone is interested, here is the code:
import logging

from google.appengine.ext import db, gql


class ClasslessGqlQuery(db.GqlQuery):
    """
    This subclass of :class:`db.GqlQuery` uses a modified version of ``db.GqlQuery``'s constructor to suppress any
    :class:`db.KindError` that might be raised by ``db.class_for_kind(kindName)``.

    This allows using the functionality of :class:`db.GqlQuery` without requiring that a Model class for the query's
    kind be available in the local environment, which could happen if a module defining that class hasn't been
    imported yet. In that case, no validation of the Model's properties will be performed (it will not check whether
    they are unindexed), but otherwise, this class should work the same as :class:`db.GqlQuery`.
    """

    def __init__(self, query_string, *args, **kwds):
        """
        **NOTE**: this is a modified version of :class:`db.GqlQuery`'s constructor, suppressing any
        :class:`db.KindError` that might be raised by ``db.class_for_kind(kindName)``.

        Args:
            query_string: Properly formatted GQL query string.
            *args: Positional arguments used to bind numeric references in the query.
            **kwds: Dictionary-based arguments for named references.

        Raises:
            PropertyError if the query filters or sorts on a property that's not indexed.
        """
        app = kwds.pop('_app', None)
        namespace = None
        if isinstance(app, tuple):
            if len(app) != 2:
                raise db.BadArgumentError('_app must have 2 values if type is tuple.')
            app, namespace = app

        self._proto_query = gql.GQL(query_string, _app=app, namespace=namespace)
        kind = self._proto_query._kind
        model_class = None
        try:
            if kind is not None:
                model_class = db.class_for_kind(kind)
        except db.KindError:
            logging.warning("%s on %s without a model class", self.__class__.__name__, kind, exc_info=True)

        # Deliberately skip db.GqlQuery.__init__ and call its parent's constructor instead.
        super(db.GqlQuery, self).__init__(model_class)

        if model_class is not None:
            for property, unused in (self._proto_query.filters().keys() +
                                     self._proto_query.orderings()):
                if property in model_class._unindexed_properties:
                    raise db.PropertyError('Property \'%s\' is not indexed' % property)

        self.bind(*args, **kwds)
(also available as a gist)
You could create a temporary class just to do the query. If you use an Expando model, the properties of the class don't need to match what is actually in the datastore.
class KindName(ndb.Expando):
    pass
You could then do:
KindName.query()
If you need to filter on specific properties, then I suspect you'll have to add them to the temporary class.
I am trying to implement an ndb model audit so that all changes to properties are stored within each model instance. Here is the code of the _pre_put_hook I chose to implement that.
def _pre_put_hook(self):
    # save a history record for updates
    if not (self.key is None or self.key.id() is None):
        old_object = self.key.get(use_cache=True)
        for attr in dir(self):
            if not callable(getattr(self, attr)) and not attr.startswith("_"):
                if getattr(self, attr) != getattr(old_object, attr):
                    logging.debug('UPDATE: {0}'.format(attr))
                    logging.debug('OLD: {0} NEW: {1}'.format(getattr(old_object, attr), getattr(self, attr)))
The problem is that old_object is always populated with the same values as the object (self) being updated. How can I access the property values of the old object BEFORE the put() is actually made (i.e. in _pre_put_hook)?
EDIT:
I realized over time that I was doing a bunch of work that didn't need to be done (a lot of CPU/memory was spent copying entire entities and passing them around when that may not be needed). Here's the updated version, which stores a reference to the original protobuf and only deserializes it if you need it:
# These members live inside the ChangesetMixin class body.

__original = None  # a shadow copy of this object so we can see what changed... lazily inflated
_original_pb = None  # the original encoded Protobuf representation of this entity

@property
def _original(self):
    """
    Lazily deserialize the protobuf into a new entity that looks like the original from the database
    """
    if not self.__original and self._original_pb:
        self.__original = self.__class__._from_pb(self._original_pb)
    return self.__original

@classmethod
def _from_pb(cls, pb, set_key=True, ent=None, key=None):
    """
    save copy of original pb so we can track if anything changes between puts
    """
    entity = super(ChangesetMixin, cls)._from_pb(pb, set_key=set_key, ent=ent, key=key)
    if entity._original_pb is None and not entity._projection:
        # _from_pb will get called if we unpickle a new object (like when passing through deferred library)
        # so if we are being materialized from pb and we don't have a key, then we don't have _original
        entity.__original = None
        entity._original_pb = pb
    return entity
Make a clone of the entity when you first read it (see "Copy an entity in Google App Engine datastore in Python without knowing property names at 'compile' time") and put it on the entity itself so it can be referenced later when desired. That way you don't have to do a second datastore read just to make the comparison.
We override two different Model methods to make this happen:
@classmethod
def _post_get_hook(cls, key, future):
    """
    clone this entity so we can track if anything changes between puts
    NOTE: this only gets called after a ndb.Key.get() ... NOT when loaded from a Query
    see _from_pb override below to understand the full picture
    also note: this gets called after EVERY key.get()... regardless if NDB had cached it already
    so that's why we're only doing the clone() if _original is not set...
    """
    entity = future.get_result()
    if entity is not None and entity._original is None:
        entity._original = clone(entity)

@classmethod
def _from_pb(cls, pb, set_key=True, ent=None, key=None):
    """
    clone this entity so we can track if anything changes between puts
    this is one way to know when an object loads from a datastore QUERY
    _post_get_hook only gets called on direct Key.get()
    none of the documented hooks are called after query results
    SEE: https://code.google.com/p/appengine-ndb-experiment/issues/detail?id=211
    """
    entity = super(BaseModel, cls)._from_pb(pb, set_key=set_key, ent=ent, key=key)
    if entity.key and entity._original is None:
        # _from_pb will get called if we unpickle a new object (like when passing through deferred library)
        # so if we are being materialized from pb and we don't have a key, then we don't have _original
        entity._original = clone(entity)
    return entity
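Once an entity carries a copy of its original state, the audit record from the question comes down to comparing two property dictionaries. A minimal sketch, assuming both snapshots are available as plain dicts (with ndb they could come from to_dict() on the original and on the current entity); changed_properties is a hypothetical helper name, not part of ndb:

```python
def changed_properties(original, current):
    """Return {name: (old_value, new_value)} for every property that differs."""
    changes = {}
    # Look at the union of names so added and removed properties are reported too.
    for name in set(original) | set(current):
        old_value = original.get(name)
        new_value = current.get(name)
        if old_value != new_value:
            changes[name] = (old_value, new_value)
    return changes


before = {'name': 'Ann', 'age': 30}
after = {'name': 'Ann', 'age': 31, 'city': 'Oslo'}
changes = changed_properties(before, after)
```

A _pre_put_hook could then log or store the resulting dict instead of walking dir(self).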
I am using Django for a website where I have a database with users, people, locations, items and so on. Now I find that I need some extra information that requires one-to-many relations, like aliases, for most of these tables.
Should I (1) create a common alias table for all of these by using the content type framework (which will probably end up with billions of rows), or should I (2) create an alias table for each of these? If the latter, how do I auto-create a one-to-many table like this by just adding a single line like this
"alias = Alias()"
in each model? I'm sure I saw an app doing something like that a while ago; I think it was a reversion app of some kind. Even if the second method is not suitable, I would love to understand how to do it. I do not know what to search for to find an explanation of this.
I plan to add Haystack with Solr to this, so method 2 might add a lot of extra work there. But I do not have much experience with it yet, so I might be wrong.
PS: ended up with method one.
I managed to do what I wanted in method 2: easily generate one-to-many fields. Not sure if this is the easiest way, or the best way. If someone has a better way of doing it, I would love to learn it. I am a long way from being a Django expert, so I might have meddled with some unnecessarily complex stuff to do what I wanted.
This example creates an easy way of adding a one-to-many alias relationship.
Alias Managers
from django.db import models


class AliasManagerDescriptor(object):
    def __init__(self, model, fkName):
        self.model = model
        self.fkName = fkName

    def __get__(self, instance, owner):
        if instance is None:
            return AliasManager(self.model, self.fkName)
        return AliasManager(self.model, self.fkName, instance)


class AliasManager(models.Manager):
    def __init__(self, model, fkName, instance=None):
        super(AliasManager, self).__init__()
        self.model = model
        self.instance = instance
        # Name of the FK linking this model to the linked model
        self.fkName = fkName

    def get_query_set(self):
        """
        Get query set, or only get instances from this model that are linked
        to the chosen instance from the linked model if one is chosen
        """
        if self.instance is None:
            return super(AliasManager, self).get_query_set()
        if isinstance(self.instance._meta.pk, models.OneToOneField):
            # TODO: check whether this branch works; it has not been tested
            filter = {self.instance._meta.pk.name + "_id": self.instance.pk}
        else:
            filter = {self.fkName: self.instance.pk}
        return super(AliasManager, self).get_query_set().filter(**filter)

    def create(self, **kwargs):
        """
        Create alias instances. If FK is not given then it is automatically set
        to the chosen instance from the linked model
        """
        if self.fkName not in kwargs:
            kwargs[self.fkName] = self.instance
        return super(AliasManager, self).create(**kwargs)
Alias Models
class Alias(object):
    def contribute_to_class(self, cls, name):
        self.manager_name = name
        aliasModel = self.create_alias_model(cls)
        descriptor = AliasManagerDescriptor(aliasModel, cls._meta.object_name.lower())
        setattr(cls, self.manager_name, descriptor)

    def create_alias_model(self, model):
        """
        Creates an alias model to associate with the model provided.
        """
        attrs = {
            #'id': models.AutoField(primary_key=True),
            "name": models.CharField(max_length=255),
            # Not sure which of the next two methods to use
            model._meta.object_name.lower(): models.ForeignKey(model),
            #model._meta.object_name.lower(): AliasObjectDescriptor(model),
            '__unicode__': lambda self: u'%s' % self.name,
            '__module__': model.__module__
        }
        attrs.update(Meta=type('Meta', (), self.get_meta_options(model)))
        name = '%s_alias' % model._meta.object_name
        return type(name, (models.Model,), attrs)

    def get_meta_options(self, model):
        """
        Returns a dictionary of fields that will be added to
        the Meta inner class.
        """
        return {
        }

"""class AliasObjectDescriptor(object):
    def __init__(self, model):
        self.model = model

    def __get__(self, instance, owner):
        values = (getattr(instance, f.attname) for f in self.model._meta.fields)
        return self.model(*values)"""
Person Model - Only need to add "alias = Alias()" to a model to add a one-to-many alias field.
class Person(models.Model):
    name = models.CharField(max_length=30, blank=True, null=True)
    age = models.IntegerField(blank=True, null=True)
    alias = Alias()
Now I can do something like this:
per = Person(name="Per",age=99)
per.save()
per.alias.create(name="Mr.P")
per_alias = per.alias.all().values_list("name",flat=True)
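What makes per.alias behave differently from Person.alias is Python's descriptor protocol: __get__ receives the instance (or None on class access) and returns a manager bound to it. The sketch below illustrates only that mechanism in plain Python, without Django. BoundAliases and AliasDescriptor are hypothetical stand-ins for AliasManager and AliasManagerDescriptor, and an in-memory list replaces the alias table.

```python
class BoundAliases(object):
    """Stand-in for AliasManager: sees all aliases, or only those of one owner."""

    def __init__(self, store, owner=None):
        self.store = store
        self.owner = owner

    def create(self, name):
        # Like AliasManager.create, link the new alias to the bound instance.
        self.store.append((self.owner, name))
        return name

    def all(self):
        if self.owner is None:
            return [name for _, name in self.store]
        return [name for owner, name in self.store if owner is self.owner]


class AliasDescriptor(object):
    """Stand-in for AliasManagerDescriptor."""

    def __init__(self):
        self.store = []  # shared "table" of (owner, alias name) rows

    def __get__(self, instance, owner):
        # instance is None when accessed on the class (Person.alias).
        return BoundAliases(self.store, instance)


class Person(object):
    alias = AliasDescriptor()

    def __init__(self, name):
        self.name = name


per = Person('Per')
ola = Person('Ola')
per.alias.create('Mr.P')
ola.alias.create('Mr.O')
```

Here per.alias.all() only sees Per's aliases, while Person.alias.all() sees every row, which mirrors the two branches of get_query_set above.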
In my Google App Engine app I have model objects that need to be stored. These objects are parameterized by various policy objects. For example, my Event class has a Privacy policy object which determines who can see, update, etc. There are various subclasses of PrivacyPolicy that behave differently. The Event consults its PrivacyPolicy object at various points.
class PrivacyPolicy(db.Model):
    def can_see(self, event, user):
        pass

class OwnerOnlyPolicy(PrivacyPolicy):
    def can_see(self, event, user):
        return user == event.owner

class GroupOnlyPolicy(PrivacyPolicy):
    def can_see(self, event, user):
        for grp in event.owner.groups():
            if grp.is_member(user):
                return True
        return False

class OnlyCertainUsersPolicy(PrivacyPolicy):
    def __init__(self, others):
        self.others = others

    def can_see(self, event, user):
        return user in self.others
I could make my Event class use a ReferenceProperty to the PrivacyPolicy:
class Event(db.Model):
    privacy = db.ReferenceProperty(PrivacyPolicy)
    # …
The reason I don’t like this is that the one-to-one relationship means that nobody ever queries for the policy object, there is no need to maintain the back-reference from the policy to its Event object, and in no other way is PrivacyPolicy an independent db-level object. It is functionally equivalent to an IntegerProperty, in that it is part of the Event object’s state; it’s just an object instead of a number. Specifically, it’s an object that can have zero state or lots of state, unknown to the Event type.
I can’t find anyone talking about how to approach such a situation. Is there a tool/approach I don’t know about? Do I just suck it up and use a reference property and the hell with the overhead?
If the only other way to handle this is a custom Property type, any advice about how to approach it would be welcome. My first thought is to use a TextProperty to store the string rep of the policy object (policy), decode it when needed, caching the result, and having any change to the policy object invalidate the cache and update the string rep.
You're overcomplicating by trying to store this in the datastore. This belongs in code rather than in the datastore.
The least complicated way would be:
# Example constant values; any distinct integers will do.
PRIVACY_OWNER_ONLY = 1
PRIVACY_GROUP = 2

class Event(db.Model):
    privacy = db.IntegerProperty()

    def can_see(self, user):
        if self.privacy == PRIVACY_OWNER_ONLY:
            return user == self.owner
        elif self.privacy == PRIVACY_GROUP:
            for grp in self.owner.groups():
                if grp.is_member(user):
                    return True
        return False
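If the number of policies grows, the if/elif chain can be swapped for a lookup table that maps the stored integer to a plain function. This is only a sketch of that variation: the Event namedtuple, its group_members field and the POLICIES table are hypothetical stand-ins so the example runs without the datastore.

```python
import collections

PRIVACY_OWNER_ONLY = 1
PRIVACY_GROUP = 2

# Stand-in for the datastore model; group_members replaces owner.groups().
Event = collections.namedtuple('Event', ['owner', 'privacy', 'group_members'])


def owner_only(event, user):
    return user == event.owner


def group_only(event, user):
    return user in event.group_members


# The integer stored on the event selects the policy function.
POLICIES = {
    PRIVACY_OWNER_ONLY: owner_only,
    PRIVACY_GROUP: group_only,
}


def can_see(event, user):
    policy = POLICIES.get(event.privacy)
    if policy is None:
        return False  # unknown policy value: deny by default
    return policy(event, user)


private_event = Event('ann', PRIVACY_OWNER_ONLY, frozenset())
group_event = Event('ann', PRIVACY_GROUP, frozenset(['bob']))
```

Adding a policy then means adding one function and one table entry, while the stored value stays a plain integer.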
Sometimes all it takes is to think of the right approach. The solution is to introduce a new kind of property that uses pickle to store and retrieve values, such as that described in https://groups.google.com/forum/?fromgroups#!topic/google-appengine/bwMD0ZfRnJg
I wanted something slightly more sophisticated, because pickle isn’t always the answer, and anyway documentation is nice, so here is my ObjectProperty type:
import pickle

from google.appengine.ext import db


class ObjectProperty(db.Property):
    def __init__(self, object_type=None, verbose_name=None, to_store=pickle.dumps, from_store=pickle.loads, **kwds):
        """Initializes this Property with all the given options

        All args are passed to the superclass. The ones used specifically by this class are described here. For
        all other args, see base class method documentation for details.

        Args:
            object_type: If not None, all values assigned to the property must be either instances of this type or None
            to_store: A function to use to convert a property value to a storable str representation. The default is
                to use pickle.dumps()
            from_store: A function to use to convert a storable str representation to a property value. The default is
                to use pickle.loads()
        """
        if object_type and not isinstance(object_type, type):
            raise TypeError('object_type should be a type object')

        kwds['indexed'] = False  # It never makes sense to index pickled data
        super(ObjectProperty, self).__init__(verbose_name, **kwds)

        self.to_store = to_store
        self.from_store = from_store
        self.object_type = object_type

    def get_value_for_datastore(self, model_instance):
        """Get value from property to send to datastore.

        We retrieve the value of the attribute and return the result of invoking the to_store function on it

        See base class method documentation for details.
        """
        value = getattr(model_instance, self.name, None)
        return self.to_store(value)

    def make_value_from_datastore(self, rep):
        """Get value from datastore to assign to the property.

        We take the value passed, convert it to str() and return the result of invoking the from_store function
        on it. The Property class assigns this returned value to the property.

        See base class method documentation for details.
        """
        # It passes us a unicode, even though I returned a str, so this is required
        rep = str(rep)
        return self.from_store(rep)

    def validate(self, value):
        """Validate reference.

        Returns:
            A valid value.

        Raises:
            BadValueError for the following reasons:
                - Object not of correct type.
        """
        value = super(ObjectProperty, self).validate(value)
        # Only check the type when an object_type was given.
        if (value is not None and self.object_type is not None
                and not isinstance(value, self.object_type)):
            raise db.BadValueError('Property %s must be of type %s' % (self.name, self.object_type))
        return value
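The to_store and from_store hooks are what allow something other than pickle. As an illustration only, here is a JSON-based pair for a small policy object; OnlyCertainUsers, policy_to_store and policy_from_store are hypothetical names, and the sketch runs on its own without the datastore.

```python
import json


class OnlyCertainUsers(object):
    """Hypothetical policy object with a small amount of state."""

    def __init__(self, others):
        self.others = list(others)

    def can_see(self, user):
        return user in self.others


def policy_to_store(policy):
    """Serialize a policy (or None) to a JSON string."""
    if policy is None:
        return json.dumps(None)
    return json.dumps({'others': policy.others})


def policy_from_store(rep):
    """Rebuild a policy (or None) from the JSON string written above."""
    data = json.loads(rep)
    if data is None:
        return None
    return OnlyCertainUsers(data['others'])


stored = policy_to_store(OnlyCertainUsers(['ann', 'bob']))
restored = policy_from_store(stored)
```

Following the constructor signature above, such a pair would be passed as ObjectProperty(OnlyCertainUsers, to_store=policy_to_store, from_store=policy_from_store). Unlike pickle, the stored text stays readable and does not depend on the class's import path.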
I am using Polymorphic Models.
Simple Question: My code below works without using this line below, which I see in other people's code. What is it supposed to do?
#super(GeneralModel, self).__init__(*args, **kwargs)
Messy Question: I have a feeling my code below, although it seems to work, is not the most beautiful solution.
Synopsis of what I am doing: I am instantiating (or making) a new datastore model entity based on an 'unclean' JSON object posted to the server. First I want to do some general input data cleaning specified in the general (or super) model, and then run some special methods, which are specified in each special (or sub-class) model as def parse.
class GeneralModel(polymodel.PolyModel):
    lat_long_list = db.ListProperty(db.GeoPt)
    zooms = db.ListProperty(int)

    def __init__(self, *args, **kwargs):
        self.lat_long_list = [db.GeoPt(pt[0], pt[1]) for pt in zip(kwargs["lat"], kwargs["lon"])]
        del kwargs["lat"]
        del kwargs["lon"]
        if "zooms" not in kwargs:
            kwargs["zooms"] = ZOOMS  # some default
        for property, value in kwargs.items():
            setattr(self, property, value)
        #super(NamedModel, self).__init__(*args, **kwargs)
        self.parse()

    def parse(self):
        raise NotImplementedError('Need to define this for each category')

class SpecialModel(GeneralModel):
    stringText = db.StringProperty()
    words_list = db.StringListProperty()

    def parse(self):
        self.words_list = self.stringText.split(",")
This is how I test whether my code works:
>>>kwargs={'stringText':'boris,ted','lat':[0,1,2,3],'lon':[0,1,2,8],'zooms':[0,10]}
>>>entity=SpecialModel(key_name="tester",**kwargs)
>>>entity.words_list
['boris', 'ted']
The 'super' line calls the constructor of the parent entity. If you don't include it, the parent constructor will not be called, and your model will not be initialized properly. You should, in fact, be calling this first, before any of your own initialization.
However, overriding the constructor on models is strongly discouraged. The constructor is not just used when you call it, but also by the system to construct instances that are being loaded from the datastore, and in the latter case, the arguments - and the expected behaviour - are different, and implementation dependent.
Instead, you should probably define a factory method, like so:
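from google.appengine.ext.db import polymodel

class MyModel(polymodel.PolyModel):
    @classmethod
    def create(cls, foo, bar):
        # Do some stuff with the raw arguments here, e.g. clean up bar.
        bleh = bar
        return cls(foo=foo, bar=bleh)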
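Applied to the question's SpecialModel, the factory could do the input cleaning and leave the constructor alone. The sketch below uses a hypothetical FakeModel base instead of polymodel.PolyModel so that it runs without App Engine, and plain tuples where the real code would build db.GeoPt values.

```python
class FakeModel(object):
    """Hypothetical stand-in for a datastore model: __init__ only takes property values."""

    def __init__(self, **properties):
        for name, value in properties.items():
            setattr(self, name, value)


class SpecialModel(FakeModel):
    @classmethod
    def create(cls, string_text, lat, lon):
        # All cleaning of the raw input happens here, so __init__ stays
        # untouched for instances that the system builds itself.
        return cls(
            string_text=string_text,
            words_list=string_text.split(','),
            lat_long_list=list(zip(lat, lon)),
        )


entity = SpecialModel.create('boris,ted', [0, 1, 2, 3], [0, 1, 2, 8])
```

Because create is a classmethod, each subclass can override it (or a parse-style helper it calls) without touching how stored entities are reconstructed.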