Idiomatic List Wrapper - google-app-engine

In Google App Engine, I make lists of referenced properties much like this:
class Referenced(BaseModel):
name = db.StringProperty()
class Thing(BaseModel):
foo_keys = db.ListProperty(db.Key)
def __getattr__(self, attrname):
if attrname == 'foos':
return Referenced.get(self.foo_keys)
else:
return BaseModel.__getattr__(self, attrname)
This way, someone can have a Thing and say thing.foos and get something legitimate out of it. The problem comes when somebody says thing.foos.append(x). This will not save the added property because the underlying list of keys remains unchanged. So I quickly wrote this solution to make it easy to append keys to a list:
class KeyBackedList(list):
def __init__(self, key_class, key_list):
list.__init__(self, key_class.get(key_list))
self.key_class = key_class
self.key_list = key_list
def append(self, value):
self.key_list.append(value.key())
list.append(self, value)
class Thing(BaseModel):
foo_keys = db.ListProperty(db.Key)
def __getattr__(self, attrname):
if attrname == 'foos':
return KeyBackedList(Thing, self.foo_keys)
else:
return BaseModel.__getattr__(self, attrname)
This is great for proof-of-concept, in that it works exactly as expected when calling append. However, I would never give this to other people, since they might mutate the list in other ways (thing[1:9] = whatevs or thing.sort()). Sure, I could go define all the __setslice__ and whatnot, but that seems to leave me open for obnoxious bugs. However, that is the best solution I can come up with.
Is there a better way to do what I am trying to do (something in the Python library perhaps)? Or am I going about this the wrong way and trying to make things too smooth?

If you want to modify things like this, you shouldn't be changing __getattr__ on the model; instead, you should write a custom Property class.
As you've observed, though, creating a workable 'ReferenceListProperty' is difficult and involved, and there are many subtle edge cases. I would recommend sticking with the list of keys, and fetching the referenced entities in your code when needed.

Related

Grails/GORM domain saving - transient object workaround

I found a work around to a problem I had, and I want to know if it is valid or not. It is a similar problem to: Grails Gorm : Object references an unsaved transient instance
Lets assume I have two domain Objects (names changed to protect the guilty).
public class Shelf {
String name
Set<Book> books = [] as Set
static hasMany = [books: Book]
}
and
public class Book {
String title
Shelf shelf
}
So this means that 1 Shelf contains 0 to many Books, and one Book can be on only one Shelf.
This Shelf is very large. And at some point, it contains 80,000 Books. All stored nicely in the DB. Of course, adding new Books is getting slower and slower.
This is done by:
Book book1 = new Book("Awesome Title")
existingShelf.addToBooks(book1)
existingShelf.save(flush: true) // super slow
This is slow. Mainly (I assume) because GORM has to confirm the other 80,000 records.
So I did this to try to work around the slow point.
Book book2 = new Book("Awesome Title 2")
book2.save(flush: true)
This gives me an "Object references an unsaved transient instance", which I guess makes sense - the "shelf" value is empty.
So I did something a little weird:
Book book3 = new Book("Awesome Title 3")
book3.shelf = new Shelf()
book3.shelf.id = <known/valid id here>
book2.save(flush: true)
This works. It saves. There are no referential errors. Further code that depends on this... works.
I just made a call that last minutes and reduced it down to seconds.
But that seems too easy. I'm sure I worked around Grails magic some how. And probably broke something in the process.
Advice? Explanations?
Yes, using addTo* methods can be slow. If you look at the generated SQL you'll understand why. Doing the following:
new Book(title: "GORM Performance", shelf: grailsShelf).save()
will be faster and there is technically nothing wrong with it. Just be aware of that your instance of grailsShelf.books will not contain the new book until you've refreshed the collection from the database. This is part of what the addTo* method does for you.
Side note:
Set<Book> books = [] as Set
is unnecessary.

Retrieving Specific Active Directory Record Attributes using C#

I've been asked to set up a process which monitors the active directory, specifically certain accounts, to check that they are not locked so that should this happen, the support team can get an early warning.
I've found some code to get me started which basically sets up requests and adds them to a notification queue. This event is then assigned to a change event and has an ObjectChangedEventArgs object passed to it.
Currently, it iterates through the attributes and writes them to a text file, as so:
private static void NotifierObjectChanged(object sender,
ObjectChangedEventArgs e)
{
if (e.ResultEntry.Attributes.AttributeNames == null)
{
return;
}
// write the data for the user to a text file...
using (var file = new StreamWriter(#"C:\Temp\UserDataLog.txt", true))
{
file.WriteLine("{0} {1}", DateTime.UtcNow.ToShortDateString(), DateTime.UtcNow.ToShortTimeString());
foreach (string attrib in e.ResultEntry.Attributes.AttributeNames)
{
foreach (object item in e.ResultEntry.Attributes[attrib].GetValues(typeof(string)))
{
file.WriteLine("{0}: {1}", attrib, item);
}
}
}
}
What I'd like is to check the object and if a specific field, such as name, is a specific value, then check to see if the IsAccountLocked attribute is True, otherwise skip the record and wait until the next notification comes in. I'm struggling how to access specific attributes of the ResultEntry without having to iterate through them all.
I hope this makes sense - please ask if I can provide any additional information.
Thanks
Martin
This could get gnarly depending upon your exact business requirements. If you want to talk in more detail ping me offline and I'm happy to help over email/phone/IM.
So the first thing I'd note is that depending upon what the query looks like before this, this could be quite expensive or error prone (ie missing results). This worries me somewhat as most sample code out there gets this wrong. :) How are you getting things that have changed? While this sounds simple, this is actually a somewhat tricky question in directory land, given the semantics supported by AD and the fact that it is a multi-master system where writes happen all over the place (and replicate in after the fact).
Other variables would be things like how often you're going to run this, how large the data set could be in AD, and so on.
AD has some APIs built to help you here (the big one that comes to mind is called DirSync) but this can be somewhat complicated if you haven't used it before. This is where the "ping me offline" part comes in.
To your exact question, I'm assuming your result is actually a SearchResultEntry (if not I can revise, tell me what you have in hand). If that is the case then you'll find an Attributes field hanging off of that guy, and from there there is AttributeNames and Values. I think you'll see how it works from there if you have Values in hand, for example:
foreach (var attr in sre.Attributes.Values)
{
var da = (DirectoryAttribute)attr;
Console.WriteLine(da.Name);
foreach (var val in da.GetValues(typeof(byte[])))
{
// Handle a byte[] val ...
}
}
As I said, if you have something other than a SearchResultEntry in hand, let us know and I can revise the code sample.

parallel code execution python2.7 ndb

in my app i for one of the handler i need to get a bunch of entities and execute a function for each one of them.
i have the keys of all the enities i need. after fetching them i need to execute 1 or 2 instance methods for each one of them and this slows my app down quite a bit. doing this for 100 entities takes around 10 seconds which is way to slow.
im trying to find a way to get the entities and execute those functions in parallel to save time but im not really sure which way is the best.
i tried the _post_get_hook but the i have a future object and need to call get_result() and execute the function in the hook which works kind of ok in the sdk but gets a lot of 'maximum recursion depth exceeded while calling a Python objec' but i can't really undestand why and the error message is not really elaborate.
is the Pipeline api or ndb.Tasklets what im searching for?
atm im going by trial and error but i would be happy if someone could lead me to the right direction.
EDIT
my code is something similar to a filesystem, every folder contains other folders and files. The path of the Collections set on another entity so to serialize a collection entity i need to get the referenced entity and get the path. On a Collection the serialized_assets() function is slower the more entities it contains. If i could execute a serialize function for each contained asset side by side it would speed things up quite a bit.
class Index(ndb.Model):
path = ndb.StringProperty()
class Folder(ndb.Model):
label = ndb.StringProperty()
index = ndb.KeyProperty()
# contents is a list of keys of contaied Folders and Files
contents = ndb.StringProperty(repeated=True)
def serialized_assets(self):
assets = ndb.get_multi(self.contents)
serialized_assets = []
for a in assets:
kind = a._get_kind()
assetdict = a.to_dict()
if kind == 'Collection':
assetdict['path'] = asset.path
# other operations ...
elif kind == 'File':
assetdict['another_prop'] = asset.another_property
# ...
serialized_assets.append(assetdict)
return serialized_assets
#property
def path(self):
return self.index.get().path
class File(ndb.Model):
filename = ndb.StringProperty()
# other properties....
#property
def another_property(self):
# compute something here
return computed_property
EDIT2:
#ndb.tasklet
def serialized_assets(self, keys=None):
assets = yield ndb.get_multi_async(keys)
raise ndb.Return([asset.serialized for asset in assets])
is this tasklet code ok?
Since most of the execution time of your functions are spent waiting for RPCs, NDB's async and tasklet support is your best bet. That's described in some detail here. The simplest usage for your requirements is probably to use the ndb.map function, like this (from the docs):
#ndb.tasklet
def callback(msg):
acct = yield ndb.get_async(msg.author)
raise tasklet.Return('On %s, %s wrote:\n%s' % (msg.when, acct.nick(), msg.body))
qry = Messages.query().order(-Message.when)
outputs = qry.map(callback, limit=20)
for output in outputs:
print output
The callback function is called for each entity returned by the query, and it can do whatever operations it needs (using _async methods and yield to do them asynchronously), returning the result when it's done. Because the callback is a tasklet, and uses yield to make the asynchronous calls, NDB can run multiple instances of it in parallel, and even batch up some operations.
The pipeline API is overkill for what you want to do. Is there any reason why you couldn't just use a taskqueue?
Use the initial request to get all of the entity keys, and then enqueue a task for each key having the task execute the 2 functions per-entity. The concurrency will be based then on the number of concurrent requests as configured for that taskqueue.

Retrieving db.Property choices

Searching on the documentation provided by google and browsing SO I haven't found a way to retrieve the choices set on a db.Property object (I want to retrieve it in order to create forms based on the model).
I'm using the following recipe to do what I need, Is this correct? Is there any other way of doing it? (simpler, more elegant, more pythonic, etc.)
For a model like this:
class PhoneNumber(db.Model):
contact = db.ReferenceProperty(Contact,
collection_name='phone_numbers')
phone_type = db.StringProperty(choices=('home', 'work'))
number = db.PhoneNumberProperty()
I do the following modification:
class PhoneNumber(db.Model):
_phone_types = ('home', 'work')
contact = db.ReferenceProperty(Contact,
collection_name='phone_numbers')
phone_type = db.StringProperty(choices=_phone_types)
number = db.PhoneNumberProperty()
#classmethod
def get_phone_types(self):
return self._phone_types
You should be able to use PhoneNumber.phone_type.choices. If you want you could make that into a class method too:
#classmethod
def get_phone_types(class_):
return class_.phone_type.choices
You can decide if you prefer the class method approach or not.
Don't forget about Python's dir built-in! It is very useful when exploring objects.

How do I dynamically determine if a Model class exist in Google App Engine?

I want to be able to take a dynamically created string, say "Pigeon" and determine at runtime whether Google App Engine has a Model class defined in this project named "Pigeon". If "Pigeon" is the name of a existant model class, I would like to then get a reference to the Pigeon class so defined.
Also, I don't want to use eval at all, since the dynamic string "Pigeon" in this case, comes from outside.
You could try, although probably very, very bad practice:
def get_class_instance(nm) :
try :
return eval(nm+'()')
except :
return None
Also, to make that safer, you could give eval a locals hash: eval(nm+'()', {'Pigeon':pigeon})
I'm not sure if that would work, and it definitely has an issue: if there is a function called the value of nm, it would return that:
def Pigeon() :
return "Pigeon"
print(get_class_instance('Pigeon')) # >> 'Pigeon'
EDIT: Another way of doing it is possibly (untested), if you know the module:
(Sorry, I keep forgetting it's not obj.hasattr, its hasattr(obj)!)
import models as m
def get_class_instance(nm) :
if hasattr(m, nm) :
return getattr(m, nm)()
else : return None
EDIT 2: Yes, it does work! Woo!
Actually, looking through the source code and interweb, I found a undocumented method that seems to fit the bill.
from google.appengine.ext import db
key = "ModelObject" #This is a dynamically generated string
klass = db.class_for_kind(key)
This method will throw a descriptive exception if the class does not exist, so you should probably catch it if the key string comes from the outside.
There's two fairly easy ways to do this without relying on internal details:
Use the google.appengine.api.datastore API, like so:
from google.appengine.api import datastore
q = datastore.Query('EntityType')
if q.get(1):
print "EntityType exists!"
The other option is to use the db.Expando class:
def GetEntityClass(entity_type):
class Entity(db.Expando):
#classmethod
def kind(cls):
return entity_type
return Entity
cls = GetEntityClass('EntityType')
if cls.all().get():
print "EntityType exists!"
The latter has the advantage that you can use GetEntityClass to generate an Expando class for any entity type, and interact with it the same way you would a normal class.

Resources