I have this model:
class Team(ndb.Model):
    name = ndb.StringProperty()
    password = ndb.StringProperty()
    email = ndb.StringProperty()

class Offer(ndb.Model):
    team = ndb.KeyProperty(kind=Team)
    cut = ndb.StringProperty()
    price = ndb.IntegerProperty()

class Call(ndb.Model):
    name = ndb.StringProperty()
    called_by = ndb.KeyProperty(kind=Team)
    offers = ndb.KeyProperty(kind=Offer, repeated=True)
    status = ndb.StringProperty(choices=['OPEN', 'CLOSED'], default="OPEN")
    dt = ndb.DateTimeProperty(auto_now_add=True)
and this view:
class MainHandler(webapp2.RequestHandler):
    def get(self):
        calls_open = Call.query(Call.status == "OPEN").fetch()
        calls_past = Call.query(Call.status == "CLOSED").fetch()
        template_values = dict(open=calls_open, past=calls_past)
        template = JINJA_ENVIRONMENT.get_template('templates/index.html')
        self.response.write(template.render(template_values))
and this small test template:
{% for call in open %}
<b>{{call.name}} {{call.called_by.get().name}}</b>
{% endfor %}
Now, with the get() it works perfectly.
My question is: is this correct? Is there a better way to do it?
Personally, I find it strange to get() the values in the template; I would prefer to fetch them inside the view.
My idea was to:
1. create a new list: res_open_calls = []
2. for each call in calls_open, call to_dict(): dict_call = call.to_dict()
3. then assign to the dict: dict_call['team'] = call.team.get().to_dict()
4. append the object to the list: res_open_calls.append(dict_call)
5. then return this just-generated list,
roughly as sketched below.
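A minimal sketch of those steps in the view (hedged: the models shown above define called_by rather than team, so I use called_by here; adjust to the real schema):

res_open_calls = []
for call in calls_open:
    dict_call = call.to_dict()
    # replace the Key with the referenced entity's own dict
    dict_call['called_by'] = call.called_by.get().to_dict()
    res_open_calls.append(dict_call)
template_values = dict(open=res_open_calls, past=calls_past)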
this is the gist i wrote ( for a modified code) https://gist.github.com/esseti/0dc0f774e1155ac63797#file-call_offers_calls
It seems cleaner, but a bit more expensive (a second list has to be generated). Is there something better/cleverer to do?
The OP is clearly showing code very different from the code they're actually using: they originally showed called_by as a StringProperty (so calling get() on it should crash), and they talk about a call.team that doesn't exist in the code they show... anyway, I'm trying to guess what they actually have, because I find the underlying idea important.
The OP, IMHO, is correct to be uncomfortable about having DB operations right in a Jinja2 template, which would best be limited to presentation-level issues. I'll assume (guess!) that part of the Call model is:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)
and the relevant part of the Jinja2 template, currently working for the OP, is:
{{call.team.get().name}}
A better structure might then be:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)

    @property
    def team_name(self):
        return self.team.get().name
and in the template just {{call.team_name}}.
This still performs the DB operation during template expansion, but it does so on the Python code side of things, rather than the Jinja2 side of things -- better than embodying so much detail about the model's data architecture in a template that should focus on presentation only.
Alternatively, if a Call instance is .put rarely and displayed often, and its team does not change name, one could, so to speak, cache the value in a ComputedProperty:
class Call(ndb.Model):
    team = ndb.KeyProperty(kind=Team)

    def _team_name(self):
        return self.team.get().name
    team_name = ndb.ComputedProperty(_team_name)
However, this latter choice is inferior (as it involves more storage space, does not save execution time, and complicates actual interactions with the datastore) unless some queries for Call entities also need to query on team_name (in which latter case it would be a must).
If one did choose this alternative, the Jinja2 template would still use {{call.team_name}}: this hints at why it's best to use in templates only logic strictly connected to presentation -- it leaves more degrees of freedom for implementing attributes and properties on the Python code side of things, without needing to change the templates. "Separation of concerns" is an excellent principle in programming.
The snippet posted elsewhere suggests a higher degree of complication, where Call is indeed as shown but then of course there is no call.team as shown repeatedly in the question -- rather, a double indirection via call.offers and each offer.team. This makes sense in terms of entity-relationship modeling but can be heavy-going to implement in the essentially "normalized" terms the snippet suggests in any NoSQL database, including GAE's datastore.
If teams don't change names, and calls don't change their list of offers, it might show better performance to denormalize the model (storing in Call the technically redundant information that, in the snippet, is fetched by running through the double indirection) -- e.g. by structured properties, https://cloud.google.com/appengine/docs/python/ndb/properties#structured , to embed copies of the Offer objects in Call entities, and a copy of the Team object (or even just the team's name) in the Offer entity.
Like all de-normalizing, this can take a few extra bytes per entity in the datastore, but nevertheless could amply pay for it by minimizing the number of datastore accesses needed at fetch time, depending on the pattern of accesses to the various entities and properties.
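For concreteness, a hedged sketch of what that denormalized layout might look like (OfferCopy and team_name are my inventions, not fields from the question):

class OfferCopy(ndb.Model):
    team_name = ndb.StringProperty()  # copied from Team at write time
    cut = ndb.StringProperty()
    price = ndb.IntegerProperty()

class Call(ndb.Model):
    name = ndb.StringProperty()
    called_by = ndb.KeyProperty(kind=Team)
    # embedded copies instead of keys: one datastore fetch gets everything
    offers = ndb.StructuredProperty(OfferCopy, repeated=True)
    status = ndb.StringProperty(choices=['OPEN', 'CLOSED'], default="OPEN")
    dt = ndb.DateTimeProperty(auto_now_add=True)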
However, by now we're straying far away from the question, which is about what to put in the template and what on the Python side. Optimizing datastore patterns is a separate issue, well worthy of questions of its own.
Summarizing my stance on the latter, core issue of Python code vs template as the residence for logic: data-access logic should be on the Python code side, ideally embedded in Model classes (using property for just-in-time access, possibly all the way to denormalization at entity-building or perhaps at entity-finalization time); Jinja2 templates (or any other kind of pure presentation layer) should only have logic directly needed for presentation, not for data access (nor business logic either, of course).
Related
I am writing a datastore migration for our current production App Engine application.
We made some fairly extensive changes to the data model so I am trying to put in place an architecture to allow easier migrations in the future. This includes test suites for the migrations and common class structures for migration scripts.
I am running into a problem with my current strategy. For both the migrations and the test scripts I need a way to load the Model classes from the old schema and the Model classes for the new data schema into memory at the same time and load entities using either.
Here is an example set of schemas.
rev1.py
class Account(db.Model):
    _version = db.IntegerProperty(default=1)
    user = db.UserProperty(auto_current_user_add=True, required=True)
    name = db.StringProperty()
    contact_email = db.EmailProperty()
rev2.py
class Account(db.Model):
    _version = db.IntegerProperty(default=2)
    auth_id = db.StringProperty()
    name = db.StringProperty()
    pwd_hash = db.StringProperty(required=True, indexed=False)
A migration script may look something like:
import rev1
import rev2

class MyMigration(...):
    def isNeeded(self):
        num_accounts = num_entities_with_version(rev1.Account, 1)
        return num_accounts > 0

    def run(self):
        rev1_accounts = rev1.Account.all()
        for account in [a for a in rev1_accounts if a._version == 1]:
            auth_id = account.contact_email
            if auth_id is None or auth_id == '':
                auth_id = account.user.email()
            new_account = rev2.Account.create(auth_id=auth_id,
                                              name=account.name)
And a test suite would look something like this:
import rev1
import rev2

class MyTest(...):
    def testIt(self):
        # Setup data
        act1 = rev1.Account(name='..', contact_email='..')
        act1.put()
        act2 = rev1.Account(name='..', contact_email='..')
        act2.put()
        # Run migration
        migration.run()
        # Check results
        accounts = rev2.Account.all().fetch(99)
So as you can see I am using the old revision in two ways. I am using it in the migration as a way to read data in the old format and convert it into the new format. (note: I can't read it in the new format because of things like the required pwd_hash field and other field changes). I am using it in the test suite to setup test data in the old format before running the migration.
It all seems great in theory, but in practice it falls apart, because GAE doesn't allow multiple models to be loaded for the same kind -- or, more specifically, queries only return instances of the most recently defined model.
In the development server this seems to be because calling get() on a query for an entity (e.g. Account.get(my_key)) invokes a result hook that builds the result Model object by calling class_for_kind on the entity's kind name from the data. So even though I may call rev2.Account.get(), it may build up rev1.Account model objects, because the kind 'Account' maps to rev1.Account in the _kind_map dictionary.
This has made me rethink my migration strategy a bit and I wanted to ask if anyone has thoughts. Specifically:
Would it be safe to manually override google.appengine.ext.db._kind_map at runtime in test and on the production servers to allow this migration method to work?
Is there some better way to keep two versions of a Model in memory at the same time?
Is there a different migration method that may be a smarter way to go about this work?
Other methods I have thought of trying include:
Change the entity kind when the version changes. (use kind() to change it) Then when we migrate we move all classes to the new kind name.
Find a way to query the entities and get back a 'raw' object (proto buffers??) that has not been built into a full object. (would not work with tests)
'Just Do It Live': Don't write tests for any of this and just try to migrate using the latest schema, loading the older data and working around issues as they come up.
I think there are actually several questions within the greater question. There seem to be two key ones, though: one is how to test, and the other is how to really do it.
I wouldn't define the kind multiple times; as you've noted there are nuances to doing this, and, if you wind up with the wrong model loaded, you'll get all sorts of headaches. That said, it is completely possible for you to manipulate the kind_map. I've done this in some special cases, but I try to avoid it when possible.
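For the record, a minimal sketch of such a manipulation -- note that this pokes at a private SDK dict (db._kind_map), an assumption on my part that may break between SDK versions:

from contextlib import contextmanager
from google.appengine.ext import db

@contextmanager
def model_for_kind(model_class):
    # Temporarily make queries for this kind build model_class instances.
    kind = model_class.kind()
    previous = db._kind_map.get(kind)
    db._kind_map[kind] = model_class
    try:
        yield model_class
    finally:
        if previous is None:
            del db._kind_map[kind]
        else:
            db._kind_map[kind] = previous

# usage:
# with model_for_kind(rev1.Account):
#     old_accounts = rev1.Account.all().fetch(100)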
For a live migration where you've got significant schema changes, you've got two choices: use Expando or use the lower-level API. When adding required fields, you might find it easier to use Expando, then run a migration to add the new information, then switch back to a plain db.Model. The lower-level API sits right under the ext.db stuff, and it presents the entity as a Python dict. This can be very convenient for manipulating an entity. Use whichever method you're more comfortable with. I prefer Expando when possible, since it is a higher-level interface, but it is a two-step process.
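To make the Expando route concrete, here is a hedged sketch (the names are mine; note also that Expando treats underscore-prefixed attributes as non-persistent, so a _version field needs special handling in this style):

from google.appengine.ext import db

class AccountExpando(db.Expando):
    @classmethod
    def kind(cls):
        return 'Account'  # reuse the real kind name so existing entities load

def upgrade(entity):
    # rev1 fields are plain attributes on the Expando
    auth_id = getattr(entity, 'contact_email', None) or entity.user.email()
    entity.auth_id = auth_id
    entity.pwd_hash = 'RESET-ME'  # placeholder; real code would derive a hash
    entity.put()

for entity in AccountExpando.all():
    upgrade(entity)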
For testing, I'd personally suggest you focus on the actual conversion routines. So instead of testing the method from the point of querying down, test to ensure your conversion routines themselves function correctly. You might even choose to pass in the old entity as a Python dict, then return the new entity.
I'd make one other adjustment here as well. I'd rather use a query to find all my rev 1 accounts. That's the great thing about having an indexed _version on your models: you can trivially find things that need to be migrated.
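Something like this, assuming the property really is stored and indexed under the _version name shown in the question:

accounts_to_migrate = rev1.Account.all().filter('_version =', 1)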
Also, check out Google's article on updating schemas. It is old, but still good.
Another approach is to simply do the migration on version 2, leaving the old attributes on the model and setting them to None after you update the version. This will clear out the space they use but will still leave them defined. Then in a following release you can just remove them from the model.
This method is pretty simple, but it does require two releases to remove the old attributes completely, so it is more akin to deprecating the existing attributes.
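A hedged sketch of what that intermediate rev2 model might look like (the exact field handling is illustrative):

class Account(db.Model):
    _version = db.IntegerProperty(default=2)
    auth_id = db.StringProperty()
    name = db.StringProperty()
    pwd_hash = db.StringProperty(indexed=False)  # make required only after migration
    # deprecated rev1 fields: nulled by the migration, removed next release
    user = db.UserProperty()
    contact_email = db.EmailProperty()

def migrate(account):
    account.auth_id = account.contact_email or account.user.email()
    account._version = 2
    account.user = None
    account.contact_email = None
    account.put()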
I am working on a Django project in which I create a set of three abstract models that I will use for a variety of apps later on. The problem I am running into is that I want to connect those models via ForeignKey, but Django tells me that it can't assign a ForeignKey to an abstract model.
My current solution is to assign the ForeignKeys when I instantiate the classes in my other apps. However, I am writing a Manager for the abstract classes (book and pages) right now and would need to access these ForeignKeys. What I am basically trying to do is get the number of words a book has in a stateless manner, hence without storing it in a field of the page or book.
The model looks similar to this:
class Book(models.Model):
    name = models.CharField(...)
    author = models.CharField(...)
    ...

    class Meta:
        abstract = True

class Page(models.Model):
    book = models.ForeignKey(Book)
    chapter = models.CharField(...)
    ...

    class Meta:
        abstract = True

class Word(models.Model):
    page = models.ForeignKey(Page)
    line = models.IntegerField(...)
    ...

    class Meta:
        abstract = True
Note that this model here is just to give an example of what I am trying to do; whether the Book-Page-Word model makes sense from an implementation standpoint is beside the point.
Maybe what you need here is a GenericForeignKey, since you don't actually know what model your ForeignKeys will point to? That means you'll lose some of the "type-safety" guarantees of a normal relation, but it will allow you to specify those relationships in a more general way. See https://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#django.contrib.contenttypes.generic.GenericForeignKey
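A minimal sketch of that idea, assuming the contenttypes app is installed (Page and chapter come from the question; the rest follows the docs linked above):

from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic

class Page(models.Model):
    chapter = models.CharField(max_length=64)
    # generic stand-in for ForeignKey(Book): a (content_type, object_id) pair
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    book = generic.GenericForeignKey('content_type', 'object_id')

    class Meta:
        abstract = True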
Django model inheritance is a cool thing, and nice as a shortcut for making your models DRYer, but doesn't always play nicely with the polymorphic ideas we generally have of classes.
How about this approach? I'm considering using it myself to delay the relationship definition until I inherit.
# This is a very, very contrived (but simple) example.
from django.db.models import Model, CharField, ForeignKey

def AbstractBook(AuthorModel):
    class AbstractBookClass(Model):
        name = CharField(max_length=10)
        author = ForeignKey(AuthorModel)

        class Meta:
            abstract = True
    return AbstractBookClass

class AbstractAuthor(Model):
    name = CharField(max_length=10)

    class Meta:
        abstract = True

class BadAuthor(AbstractAuthor):
    pass

class BadBook(AbstractBook(BadAuthor)):
    pass

class GoodAuthor(AbstractAuthor):
    pass

class GoodBook(AbstractBook(GoodAuthor)):
    pass
Two things:
1) The way you constructed your schema, you will need a GenericForeignKey, as already mentioned. But take into account that Book, through Page, has a many-to-many relationship with Word, while a GenericForeignKey realizes just a one-to-many. Django has nothing out of the box yet for the normalized schema: what you will have to do (if you care about normalization) is implement the intermediate model ("through", for concrete models) yourself, as sketched after this list.
2) If you care about language processing, using a relational database (with or without Django's ORM) is not a very efficient approach, considering the resulting database size and query time after a few dozen books. Add to that the extra columns you will need to look up for your joins because of the abstract Models, and it will soon become very impractical. I think that it would be more beneficial to look into other approaches, for example storing only the aggregates and/or denormalizing (even looking into non-relational storage systems in this case), based on your queries and views.
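For point 1, a hedged sketch of that hand-rolled intermediate model (all names are illustrative, and the Word it points to must be a concrete subclass):

from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic

class WordOccurrence(models.Model):
    word = models.ForeignKey('myapp.Word')  # a concrete Word subclass
    line = models.IntegerField()
    # generic side, so any concrete Page subclass can participate
    content_type = models.ForeignKey(ContentType)
    object_id = models.PositiveIntegerField()
    page = generic.GenericForeignKey('content_type', 'object_id')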
"Make things as simple as possible, but no simpler."
Can we find the solution/s that fix the Python database world?
Update: A 'lustdb' prototype has been written by Alex Martelli - if you know any somewhat lightweight, high-level database libraries with multiple backends we could wrap in syntax sugar honey, please weigh in!
from someAmazingDB import *
# we imported a smart model class and db object which talk to database adapter/s

class Task(model):
    title = ''
    done = False  # native types, not a custom object we have to think about!

db.taskList = []
# or
db.taskList = expandableTypeCollection(Task)  # not sure what this syntax would be

db['taskList'].append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task('Illustrate different syntax modes', True))  # ok, maybe we should just use kwargs

# at this point it should be autosaved to a default db option
# by default we should be able to reload the console and access the default db:

>>> from someAmazingDB import *
>>> print 'Done tasks:'
>>> for task in db.taskList:
>>>     if task.done:
>>>         print task.title
'Illustrate different syntax modes'
I'm a fan of Python, web.py and CherryPy, and KISS in general.
We're talking automatic Python to SQL type translation or NoSQL.
We don't have to totally be SQL compatible! Just a scalable subset or ignore it!
Re: model changes, it's OK to ask the developer when they try to change it, or to have a set of sensible defaults.
Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better?
It's 2010, we should be able to code scalable, simple databases in our sleep.
If you think this is important, please upvote!
What you request cannot be done in Python 2.whatever, for a very specific reason. You want to write:
class Task(model):
    title = ''
    isDone = False
In Python 2.anything, whatever model may possibly be, this cannot ever allow you to predict any "ordering" for the two fields, because the semantics of a class statement are:
execute the body, thus preparing a dict
locate the metaclass and run special methods thereof
Whatever the metaclass may be, step 1 has destroyed any predictability of the fields' order.
Therefore, your desired use of positional parameters, in the snippet:
Task('Illustrate different syntax modes', True)
cannot associate the arguments' values with the model's various fields. (Trying to guess by type association -- hoping no two fields ever have the same type -- would be even more horribly unpythonic than your expressed desire to use db.tasklist and db['tasklist'] indifferently and interchangeably).
One of the backwards-incompatible changes in Python 3 was introduced specifically to deal with situations of this ilk. In Python 3, a custom metaclass can define a __prepare__ function which runs before "step 1" in the above simplified list, and this lets it have more control over the class's body. Specifically, quoting PEP 3115...:
__prepare__ returns a dictionary-like object which is used to store the class member definitions during evaluation of the class body. In other words, the class body is evaluated as a function block (just like it is now), except that the local variables dictionary is replaced by the dictionary returned from __prepare__. This dictionary object can be a regular dictionary or a custom mapping type.
...
An example would be a metaclass that uses information about the ordering of member declarations to create a C struct. The metaclass would provide a custom dictionary that simply keeps a record of the order of insertions.
You don't want to "create a C struct" as in this example, but the order of fields is crucial (to allow the use of positional parameters that you want), and so the custom metaclass (obtained through the model base class) would have a __prepare__ classmethod returning an ordered dictionary. This removes the specific issue, but, of course, only if you're willing to switch all of your code using this "magic ORM" to Python 3. Would you be?
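A minimal Python 3 sketch of that machinery (model and Task are stand-ins for the OP's names; this only demonstrates field ordering, not any persistence):

import collections

class ModelMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # the class body will be evaluated into this ordered mapping
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # record the declaration order of the non-dunder class attributes
        cls._field_order = [k for k in namespace if not k.startswith('__')]
        return cls

class model(metaclass=ModelMeta):
    def __init__(self, *args, **kwargs):
        # map positional arguments onto fields in declaration order
        for field, value in zip(self._field_order, args):
            setattr(self, field, value)
        for field, value in kwargs.items():
            setattr(self, field, value)

class Task(model):
    title = ''
    done = False

t = Task('Illustrate different syntax modes', True)
assert t.title == 'Illustrate different syntax modes' and t.done is True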
Once that's settled, the issue is, what database operations do you want to perform, and how. Your example, of course, does not clarify this at all. Is the taskList attribute name special, or should any other attribute assigned to the db object be "autosaved" (by name and, what other characteristic[s]?) and "autoretrieved" upon use? Are there to be ways to remove entities, alter them, locate them (otherwise than by having once been listed in the same attribute of the db object)? How does your sample code know what DB service to use and how to authenticate to it (e.g. by userid and password) if it requires authentication?
The specific tasks you list would not be hard to implement (e.g. on top of Google App Engine's storage service, which does not require authentication nor specification of "what DB service to use"). model's metaclass would introspect the class's fields and generate a GAE Model for the class, the db object would use __setattr__ to set an atexit trigger for storing the final value of an attribute (as an entity in a different kind of Model of course), and __getattr__ to fetch that attribute's info back from storage. Of course without some extra database functionality this all would be pretty useless;-).
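A hedged, GAE-free sketch of that db-object idea, with shelve standing in for the datastore and atexit doing the final flush (all names here are mine):

import atexit
import shelve

class MagicDB(object):
    def __init__(self, path):
        shelf = shelve.open(path, writeback=True)
        object.__setattr__(self, '_shelf', shelf)
        atexit.register(shelf.close)  # flush everything at interpreter exit

    def __setattr__(self, name, value):
        self._shelf[name] = value

    def __getattr__(self, name):
        try:
            return self._shelf[name]
        except KeyError:
            raise AttributeError(name)

db = MagicDB('/tmp/lustdb_sketch')
# db.taskList = []          # persists across interpreter runs
# db.taskList.append('x')   # cached by writeback, flushed at exit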
Edit: so I did a little prototype (Python 2.6, and based on sqlite) and put it up on http://www.aleax.it/lustdb.zip -- it's a 3K zipfile including 225-lines lustdb.py (too long to post here) and two small test files roughly equivalent to the OP's originals: test0.py is...:
from lustdb import *

class Task(Model):
    title = ''
    done = False

db.taskList = []
db.taskList.append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task(title='Illustrate different syntax modes', done=True))
and test1.py is...:
from lustdb import *

print 'Done tasks:'
for task in db.taskList:
    if task.done:
        print task
Running test0.py (on a machine with a writable /tmp directory -- i.e., any Unix-y OS, or, on Windows, one on which a mkdir \tmp has been run at any previous time;-) has no output; after that, running test1.py outputs:
Done tasks:
Task(done=True, title=u'Illustrate different syntax modes')
Note that these are vastly less "crazily magical" than the OP's examples, in many ways, such as...:
1. no (expletive deleted) redundancy whereby `db.taskList` is a synonym of `db['taskList']`; only the sensible former syntax (attribute-access) is supported
2. no mysterious (and totally crazy) way whereby a `done` attribute magically becomes `isDone` instead midway through the code
3. no mysterious (and utterly batty) way whereby a `print task` arbitrarily (or magically?) picks and prints just one of the attributes of the task
4. no weird gyrations and incantations to allow positional-attributes in lieu of named ones (this one the OP agreed to)
The prototype of course (as prototypes will;-) leaves a lot to be desired in many respects (clarity, documentation, unit tests, optimization, error checking and diagnosis, portability among different back-ends, and especially DB features beyond those implied in the question). The missing DB features are legion (for example, the OP's original examples give no way to identify a "primary key" for a model, or any other kinds of uniqueness constraints, so duplicates can abound; and it only gets worse from there;-). Nevertheless, for 225 lines (190 net of empty lines, comments and docstrings;-), it's not too bad in my biased opinion.
The proper way to continue playing with this project would of course be to initiate a new lustdb open source project on the hosting part of code.google.com (or any other good open source hosting site with issue tracker, wiki, code reviews support, online browsing, DVCS support, etc, etc) - I'd do it myself but I'm close to the limit in terms of number of open source projects I can initiate on code.google.com and don't want to "burn" the last one or two in this way;-).
BTW, the lustdb name for the module is a play on words with the OP's initials (first two letters each of first and last names), in the tradition of awk and friends -- I think it sounds nice (and most other obvious names such as simpledb and dumbdb are taken;-).
I think you should try ZODB. It is an object-oriented database designed for storing Python objects. Its API is quite close to the example you provided in your question; just take a look at the tutorial.
What about using Elixir?
Forget ORM! I like vanilla SQL. The Python wrappers like psycopg2 for PostgreSQL do automatic type conversion, offer pretty good protection against SQL injection, and are nice and simple.
sql = "SELECT * FROM table WHERE id=%s"
data = (5,)
cursor.execute(sql, data)
The more I think on't the more the Smalltalk model of operation seems more relevant. Indeed the OP may not have reached far enough by using the term "database" to describe a thing which should have no need for naming.
A running Python interpreter has a pile of objects that live in memory. Their inter-relationships can be arbitrarily complex, but namespaces and the "tags" that objects are bound to are very flexible. And as pickle can explicitly serialize arbitrary structures for persistence, it doesn't seem that much of a reach to consider each Python interpreter living in that object space. Why should that object space evaporate with the interpreter's close? Semantically, this could be viewed as an extension of the anydbm tied dictionaries. And since most every thing in Python is dictionary-like, the mechanism is almost already there.
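Before the hypothetical syntax below, a tiny sketch of what stock pickle already buys you (the path and Book class are illustrative, and Book must be importable again when loading):

import pickle

class Book(object):
    def __init__(self, **attributes):
        self.__dict__.update(attributes)

library = {'garp': Book(author='John Irving', location='kitchen table')}
with open('/tmp/my_stuff.pickle', 'wb') as f:
    pickle.dump(library, f)

# ...sometime next week, in a fresh interpreter...
with open('/tmp/my_stuff.pickle', 'rb') as f:
    library = pickle.load(f)
print library['garp'].location  # 'kitchen table'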
I think this may be the generic model that Alex Martelli was proposing above, it might be nice to say something like:
class Book:
    def __init__(self, attributes):
        self.attributes = attributes
    def __getattr__(....): pass

$ python
>>> import my_stuff
>>> my_stuff.library = {'garp':
...     Book({'author': 'John Irving', 'title': 'The World According to Garp',
...           'isbn': '0-525-23770-4', 'location': 'kitchen table',
...           'bookmark': 'page 127'}),
...     ...
... }
>>> exit

[sometime next week]

$ python
>>> import my_stuff
>>> print my_stuff.library['garp'].location
'kitchen table'

# or even
>>> for book in my_stuff.library where book.location.contains('kitchen'):
...     print book.title
I don't know that you'd call the resultant language Python, but it seems like it is not that hard to implement and makes backing store equivalent to active store.
There is a natural tension between the inherent structure imposed - and sometimes desired - by RDBMSs and the rather free-form navel-gazing put forth here, but NoSQL-ish databases are already approaching the content-addressable memory model and probably better approximate how our minds keep track of things. Contrariwise, you wouldn't want to keep all the corporate purchase orders in such a storage system -- but then again, perhaps you might.
How about you give an example of how "simple" you want your "dealing with the database" to be, and I'll then tell you all the stuff that is needed for that "simplicity" to actually work?
(And it will still be YOU who is required to give the information/config to the database interface engine, somewhere, somehow.)
To name but one example: if your database management engine is some external machine with which you/your app interfaces over IP or some such, there is no way around the fact that the IP identity of where that database engine is running will have to be provided by your app's database interface client, somewhere, somehow -- regardless of whether that gets explicitly exposed in the code or not.
I've been busy, here it is, released under LGPL:
http://github.com/lukestanley/lustdb
It uses JSON as its backend at the moment.
This is not the same codebase Alex Martelli did. I wanted to make the code more readable and reusable with different backends and such.
Elsewhere I have been working on object-oriented HTML elements accessible in Python in similar ways, AND a library for making web.py more minimalist.
I'm thinking of ways of using all 3 elements together with automatic MVC prototype construction or smart mapping.
While old-fashioned, text-based template web programming will be around for a while still, because of legacy systems and because it doesn't require any particular library or implementation, I feel we'll soon have a lot more efficient ways of building robust, prototype-friendly web apps.
Please see the mailing list for those interested.
If you like CherryPy, you might like the complementary ORMs I wrote: GeniuSQL (which follows a Table Data gateway model) and Dejavu (which is a complete Data Mapper).
There's far too much in this question and all its subcomments to address completely, but one thing I wanted to point out was that GeniuSQL and Dejavu have a very robust system for mapping native Python types to the types that your particular backend is using. There are very sane defaults, which can be overridden as needed, and even extended if you make a new backend or use types from a backend that isn't yet supported. See http://www.aminus.net/geniusql/chrome/common/doc/trunk/advanced.html#custom for more discussion on that.
I will start a project that needs a web and desktop interface. One solution seems to be IdeaBlade (http://www.ideablade.com).
Can anyone who uses it describe its limitations and advantages? Is it testable?
Thanks,
Alex
As VP of Technology at IdeaBlade it is not for me to comment generally on the DevForce limitations and advantages in this space. Happy to respond to specific questions though.
Is it testable? To this I can respond with the beginnings of an answer.
It's a potentially contentious question. People have strong feelings about what makes something testable. Let me confine myself to specific testing scenarios ... and then you can judge the degree to which we meet your testing requirements.
1) DevForce supports pure POCO entities if that's your preference. Most people will prefer to use the entities that derive from our base Entity class so I will confine my subsequent remarks entirely to such entities.
2) You can new-up such an entity using any ctor you please and get and set its (non-navigation) properties with no other setup.
var cust = new Customer {ID=..., Name =...}; // have fun
Assembly references are required of course.
3) To test its navigation properties (properties that return other entities), you first new an EntityManager (our Unit-of-Work, context-like container), add or attach the entities to the EM, and off you go. Navigation properties of the Entities inheriting from our base class expect to find related entities through that container.
4) In most automated tests, the EntityManager will be created in a disconnected state so that it never attempts to reach a server or database.
You might add to it an Order, a Customer, some OrderDetails; note that all of them are constructed within the context of your tests ... not retrieved from anywhere.
Now order.Customer returns the test Customer; order.OrderDetails returns your test details. Your preparation consists of creating the EM, the test entities, ensuring that these entities have unique IDs and are associated.
Here's an example sequence:
var mgr = new EntityManager(false); // create disconnected
var order = new Order {ID = ..., Quantity = 1, ...};
var customer = new Customer {ID = 42, Name = "ABC", };
mgr.AttachEntity(order);
mgr.AttachEntity(customer);
order.Customer = customer; // associate them
The EM is acting as an in-memory database.
5) You can use LINQ now
var custs = mgr.Customers.Where(c => c.Name.StartsWith("A")).ToList();
var orders = mgr.Orders.Where(o => o.Customer.Name.StartsWith("A")).ToList();
6) Of course I always create a new EntityManager before each test to eliminate cross-test pollution.
7) I often write a so-called "Data Mother" test helper class to populate an EM with a standard collection of test data, including deviant cases.
8) I can export an EntityManager's cache of test entities to file or a test project resource. When tests run, a DataMother can retrieve and restore these test entities.
Observe that I am moving progressively away from unit testing and toward integration testing. But (so far) my tests do not require access to a server, or Entity Framework, or the database. They run fast and they are less vulnerable to distracting setup failures.
Of course you can get to the server in deep integration tests and you can easily switch servers and databases dynamically for local, LAN, and web scenarios.
9) You can intercept query, save, change, add, remove, and other events for interaction testing.
10) Everything I've described works in both regular .NET and Silverlight and with every test framework I've encountered.
On the downside, I wouldn't describe our product as mock-friendly.
I readily concede that we are not Persistence Ignorant (PI). If you're a PI fanatic, we're the wrong choice for you.
We try to appreciate the important benefits of PI and do our best to realize them in our product. We do what we can to shunt framework concerns out of view. Still, as you see, our abstraction leaks in a few places. For example, we add these members to the public API of your entities:
EntityAspect (the gateway to persistence awareness)
ErrorsChanged
PendingEntityResolved
PropertyChanged
ToQuery<>
Personally, I would have cut this to two (EntityAspect, PropertyChanged); the others snuck by me. For what it's worth, inheriting from Object (as you must) contributes another extraneous five.
We feel that we've made good trade-offs between pure P.I. and ease-of-development.
My question is "does it give you what you need to validate expectations without a lot of friction ... along the entire spectrum from unit to deep integration testing?"
I'm certainly curious to learn how you obtain similar facility with less friction with similar products. And eager to take suggestions on how we can improve our support for application testing.
Feel free to follow-up with questions about other testing scenarios that I may have overlooked.
Hope this helps
Ward
Say, there is a Page that has many blocks associated with it. And each block needs custom rendering, saving and data.
The simplest approach, from the code point of view, is to define different classes (hence, models) for each of these block types. Simplified as follows:
class Page(models.Model):
    name = models.CharField(max_length=64)

class Block(models.Model):
    page = models.ForeignKey(Page)

    class Meta:
        abstract = True

class BlockType1(Block):
    other_data = models.CharField(max_length=32)

    def render(self):
        """Some "stuff" here """
        pass

class BlockType2(Block):
    other_data2 = models.CharField(max_length=32)

    def render(self):
        """Some "other stuff" here """
        pass
But then,
Even with this code, I can't do a query like page.block_set.all() to obtain all the different blocks, irrespective of the block type.
The reason for the above is that each model defines a different table. Working around this using a linking model and generic foreign keys can solve the problem, but it still requires multiple database-table queries per page.
What would be the right way to model it? Can generic foreign keys (or something else) be used in some way to store the data, preferably in the same database table, yet achieve the inheritance paradigms?
Update:
My point was: how can I still get the OOP paradigms to work? Using the same method with so many ifs is not what I wanted to do.
The best solution, it seems to me, is to create a separate standard Python class (preferably in a different blocks.py) that defines a save() which saves the data and its "type" by instantiating the same model, and then create a template tag and a filter that call the render, save, and other methods based on the model's type -- roughly as sketched below.
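A hedged sketch of that direction (the names and the serialization scheme are mine, just to show the dispatch):

class PageBlock(models.Model):
    page = models.ForeignKey(Page)
    block_type = models.CharField(max_length=32)
    data = models.TextField()  # per-type payload, serialized however you like

class BaseBlock(object):
    block_type = None

    def save_for(self, page, data):
        return PageBlock.objects.create(page=page,
                                        block_type=self.block_type,
                                        data=data)

    def render(self, block):
        raise NotImplementedError

class Type1Block(BaseBlock):
    block_type = 'type1'

    def render(self, block):
        return u'<div class="type1">%s</div>' % block.data

# a template filter can dispatch on block.block_type:
RENDERERS = dict((cls.block_type, cls()) for cls in [Type1Block])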
Don't model the page in the database. Pages are a presentation thing.
First -- and foremost -- get the data right.
"And each block needs custom rendering, saving and data." Break this down: you have unique data. Ignore the "block" and "rendering" from a model perspective. Just define the data without regard to presentation.
Seriously. Just define the data in the model without any consideration of presentation or rending or anything else. Get the data model right.
If you confuse the model and the presentation, you'll never get anything to work well. And if you do get it to work, you'll never be able to extend or reuse it.
Second -- only after the data model is right -- you can turn to presentation.
Your "blocks" may be done simply with HTML <div> tags and a style sheet. Try that first.
After all, the model works and is very simple. This is just HTML and CSS, separate from the model.
Your "blocks" may require custom template tags to create more complex, conditional HTML. Try that second.
Your "blocks" may -- in an extreme case -- be so complex that you have to write a specialized view function to transform several objects into HTML. This is very, very rare. You should not do this until you are sure that you can't do this with template tags.
Edit.
"query different external data sources"
"separate simple classes (not Models) that have a save method, that write to the same database table."
You have three completely different, unrelated, separate things.
Model. The persistent model. With the save() method. These do very, very little.
They have attributes and a few methods. No "query different external data sources". No "rendering in HTML".
External Data Sources. These are ordinary Python classes that acquire data.
These objects (1) get external data and (2) create Model objects. And nothing else. No "persistence". No "rendering in HTML".
Presentation. These are ordinary Django templates that present the Model objects. No external query. No persistence.
I just finished a prototype of a system that has this problem in spades: a base Product class and about 200 detail classes that vary wildly. There are many situations where we do general queries against Product but then want to deal with the subclass-specific details during rendering, e.g. get all Products from Vendor X, but display with slightly different templates for each group from a specific subclass.
I added hidden fields for a GenericForeignKey to the base class and it auto-fills the content_type & object_id of the child class at save() time. When we have a generic Product object we can say obj = prod.detail and then work directly with the subclass object. Took about 20 lines of code and it works great.
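A hedged reconstruction of that pattern (the field and property names are my guesses, not the poster's actual code):

from django.db import models
from django.contrib.contenttypes.models import ContentType
from django.contrib.contenttypes import generic

class Product(models.Model):
    # auto-filled pointer to the concrete subclass row
    content_type = models.ForeignKey(ContentType, editable=False, null=True)
    object_id = models.PositiveIntegerField(editable=False, null=True)
    detail = generic.GenericForeignKey('content_type', 'object_id')

    def save(self, *args, **kwargs):
        if self.content_type_id is None:
            # type(self) is the concrete subclass when a child is saved
            self.content_type = ContentType.objects.get_for_model(type(self))
        super(Product, self).save(*args, **kwargs)
        if self.object_id != self.pk:
            self.object_id = self.pk
            super(Product, self).save(*args, **kwargs)

With something like that in place, a query can hand back plain Product rows and prod.detail jumps straight to the subclass instance, as described.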
The one gotcha we ran into during testing was that manage.py dumpdata followed by manage.py loaddata kept throwing Integrity Errors. Turns out this is a well-known problem and a fix is expected in the 1.2 release. We work around it by using mysql commands to dump/reload the test dataset.