Abstract Models and Foreign Keys in Django

Abstract Models and Foreign Keys in Django - django-models

I am working on a django project in which I create a set of three abstract models that I will use for a variety of apps later on. The problem I am running into is that I want to connect those models via ForeignKey but django tells me that it can't assign foreignkeys to an abstract model.
My current solution is to assign foreignkeys when I instanciate the class in my other apps. However, I am writing a Manager for the abstract classes (book and pages) right now and would need to access these foreignkeys. What I am basically trying to do is to get the number of words a book has in a stateless manner, hence without storing it in a field of the page or book.
The model looks similar to this:
class Book(models.Models):
name = models.CharField(...)
author = models.CharField(...)
...
class Meta:
abstract = True
class Page(models.Models):
book = models.ForeignKey(Book)
chapter = models.CharField(...)
...
class Meta:
abstract = True
class Word(models.Models):
page = models.ForeignKey(Page)
line = models.IntegerField(...)
...
class Meta:
abstract = True
Note that this model here is just to give an example of what I am trying to do, hence whether this model (Book-Page-Word) makes sense from an implementation standpoint is not needed.

Maybe what you need here is a GenericForeignKey, since you don't actually know what model your ForeignKeys will point to? That means that you'll loose some of the "type-safety" guarantees of a normal relation, but it will allow you to specify those relationships in a more general way. See https://docs.djangoproject.com/en/dev/ref/contrib/contenttypes/#django.contrib.contenttypes.generic.GenericForeignKey
Django model inheritance is a cool thing, and nice as a shortcut for making your models DRYer, but doesn't always play nicely with the polymorphic ideas we generally have of classes.

How about this approach? I'm considering using it myself to delay the relationship definition until I inherit.
# This is a very very contrived (but simple) example.
def AbstractBook(AuthorModel):
class AbstractBookClass(Model):
name = CharField(max_length=10)
author = ForeignKey(AuthorModel)
class Meta:
abstract = True
return AbstractBookClass
class AbstractAuthor(Model):
name = CharField(max_length=10)
class Meta:
abstract = True
class BadAuthor(AbstractAuthor):
pass
class BadBook(AbstractBook(BadAuthor)):
pass
class GoodAuthor(AbstractAuthor):
pass
class GoodBook(AbstractBook(GoodAuthor)):
pass

Two things:
1) The way you constructed your schema, you will need a GenericForeignKey, as already mentioned. But you must take into account that Book through Page have a many-to-many relationship with Word, while a GenericForeignKey just realizes a one-to-many. Django has nothing out-of-the-box yet for the normalized schema. What you will have to do (if you care about normalization) is to implement the intermediate (with "through" for concrete Models) yourself.
2) If you care about language processing, using a relational database (with or without Django's ORM) is not a very efficient approach, considering the resulting database size and query time after a few dozen books. Add to that the extra columns you will need to look up for your joins because of the abstract Models, and it will soon become very impractical. I think that it would be more beneficial to look into other approaches, for example storing only the aggregates and/or denormalizing (even looking into non-relational storage systems in this case), based on your queries and views.

Related

Does extensive use of ndb models affect performance?

I'm new to GAE and I'm still trying to figure things out. We're developing an Android app which uses Cloud Datastore to store images, videos, text, audios, etc. So we have now over 15 types of content objects.
I've been modelling each type of object as a distinct ndb Model class, but I'm wondering if this kind of design could affect performance.
Specifically, wouldn't it be better to write a simple class (e.g ContentObject) which simply had a content_type, and a few generic fields as string, number and blob?
I guess I'd go for the latter if I had to worry about creating/maintaining tables (or simply knowing that there are regular db tables behind).
I really like the first option, but I had to ask, just in case.

There are no performance differences to worry about between the 2 approaches.
With dedicated models you'll have to write a bit more code - each model needs to be handled separately. But it's simpler code, especially if eventually you will have some properties which only exist for some entities or are handled differently, which would require conditional logic with a generic model.
Building queries is also simpler with dedicated models if there are property differences, using a single model may require filling in unused properties (maybe by using default values) if they are used for sorting/filtering query results (entities with missing properties aren't indexed by the respective properties so they won't show up in the results).
On the other hand you'll need separate queries for each model, you can't obtain results for different kinds in the same query. And you'll need to maintain separate composite indexes for each kind (with a total limit of 200 such indexes per application).
If you're worrying about code duplication, which could also be a reason for which you'd consider a shared model, it's also possible to combine the common properties in a single ndb model class, with a single/common implementation for handling those common properties, and inherit that class in dedicated subclasses handling the differences. Something like this:
class Content(ndb.Model):
type = ndb.StringProperty() # not really needed, cls._get_kind() can be used instead
blob = ndb.StringProperty()
# other generic/common content properties and related methods
class Video(Content):
has_cc = ndb.BooleanProperty()
# other video-specific content properties and related methods
But this is just an implementation approach, from the datastore perspective you're still using dedicated models - in the above example a video entity will have a Video kind, not a Content kind.
There are no tables with the datastore, the only thing shared between entities of the same kind is their ndb model (which is specific just for the more performant ndb client library, other client libraries don't have one) and the search indexes definitions.

NDB Jinja2 best way to access KeyProperty

i've this model
class Team(ndb.Model):
name = ndb.StringProperty()
password = ndb.StringProperty()
email = ndb.StringProperty()
class Offer(ndb.Model):
team = ndb.KeyProperty(kind=Team)
cut = ndb.StringProperty()
price = ndb.IntegerProperty()
class Call(ndb.Model):
name = ndb.StringProperty()
called_by = ndb.KeyProperty(kind=Team)
offers = ndb.KeyProperty(kind=Offer, repeated=True)
status = ndb.StringProperty(choices=['OPEN', 'CLOSED'], default="OPEN")
dt = ndb.DateTimeProperty(auto_now_add=True)
i've this view
class MainHandler(webapp2.RequestHandler):
def get(self):
calls_open = Call.query(Call.status == "OPEN").fetch()
calls_past = Call.query(Call.status == "CLOSED").fetch()
template_values = dict(open=calls_open, past=calls_past)
template = JINJA_ENVIRONMENT.get_template('templates/index.html')
self.response.write(template.render(template_values))
and this small test tempalte
{% for call in open %}
<b>{{call.name}} {{call.called_by.get().name}}</b>
{% endfor %}
now, with the get() it works perfectly.
my question is: is this correct?
is there a better way to do it?
personally i found it strange to get() the values in the template and i would prefer to fetch it inside the view.
my idea was to:
create a new list res_open_calls=[]
for all the call in calls_open call the to_dict() dict_call = call.to_dict()
then assign to the dict_call dict_call['team'] = call.team.get().to_dict()
add the object to the list res_open_calls.append(dict_call)
then return this just generated list.
this is the gist i wrote ( for a modified code) https://gist.github.com/esseti/0dc0f774e1155ac63797#file-call_offers_calls
it seems more clean but a bit more expensive (a second list has to be generated). is there something better/clever to do?

The OP is clearly showing code very different from the one they're using: they show called_by as a StringProperty so calling get on it should crash, they talk about a call.team that doesn't exist in the code they show... anyway, I'm trying to guess what they actually have, because I find the underlying idea is important.
The OP, IMHO, is correct to be uncomfortable about having DB operations right in a Jinjia2 template, which would be best limited to presentation-level issues. I'll assume (guess!) that part of the Call model is:
class Call(ndb.Model):
team = ndb.KeyProperty(kind=Team)
and the relevant part of the Jinja2, currently working for the OP, is:
{{{{call.team.get().name}}
A better structure might then be:
class Call(ndb.Model):
team = ndb.KeyProperty(kind=Team)
#property
def team_name(self):
return self.team.get().name
and in the template just {{call.teamname}}.
This still performs the DB operation during template expansion, but it does so on the Python code side of things, rather than the Jinja2 side of things -- better than embodying so much detail about the model's data architecture in a template that should focus on presentation only.
Alternatively, if a Call instance is .put rarely and displayed often, and its team does not change name, one could, so to speak, cache the value in a ComputedProperty:
class Call(ndb.Model):
team = ndb.KeyProperty(kind=Team)
def _team_name(self):
return self.team.get().name
team_name = ComputedProperty(self._team_name)
However, this latter choice is inferior (as it involves more storage space, does not save execution time, and complicates actual interactions with the datastore) unless some queries for Call entities also need to query on team_name (in which latter case it would be a must).
If one did chose this alternative, the Jinjia2 template would still use {{call.teamname}}: this hints at why it's best to use in templates only logic strictly connected to presentation -- it leaves more degrees of freedom for implementing attributes and properties on the Python code side of things, without needing to change the templates. "Separation of concerns" is an excellent principle in programming.
The snippet posted elsewhere suggests a higher degree of complication, where Call is indeed as shown but then of course there is no call.team as shown repeatedly in the question -- rather, a double indirection via call.offers and each offer.team. This makes sense in terms of entity-relationship modeling but can be heavy-going to implement in the essentially "normalized" terms the snippet suggests in any NoSQL database, including GAE's datastore.
If teams don't change names, and calls don't change their list of offers, it might show better performance to denormalize the model (storing in Call the technically redundant information that, in the snippet, is fetched by running through the double indirection) -- e.g by structured properties, https://cloud.google.com/appengine/docs/python/ndb/properties#structured , to embed copies of the Offer objects in Call entities, and a copy of the Team object (or even just the team's name) in the Offer entity.
Like all de-normalizing, this can take a few extra bytes per entity in the datastore, but nevertheless could amply pay for it by minimizing the number of datastore accesses needed at fetch time, depending on the pattern of accesses to the various entities and properties.
However, by now we're straying far away from the question, which is about what to put in the template, what on the Python side. Optimizing datastore patterns is a separate issue well worth of Qs of its own.
Summarizing my stance on the latter, core issue of Python code vs template as residence for logic: data-access logic should be on the Python code side, ideally embedded in Model classes (using property for just-in-time access, possibly all the way to denormalization at entity-building or perhaps at entity-finalization time); Jinjia2 templates (or any other kind of pure presentation layer) should only have logic directly needed for presentation, not for data access (nor business logic either of course).

Field Name Best Practices (Shadowing or Compund Names)

As the red block above (warning that this is a subjective question and may be closed) there may not be a stone etched law on this, but I don't see why that would warrant closing a question.
...Rant aside
I am planning on implementing Hibernate as my persistence framework, which may fix my problem upon implementation, but I have DB tables that translate into class and sub-class (many specifics and complications that exist in real life are omitted :) ):
//dbo.a with column Name
class a {
public String Name;
}
//dbo.b with column Name and a foreign key to dbo.a
class b extends a {
public String Name;
}
So, for the what should be done and why:
Shadowing:
I could leave these as is, which would require some reflection cleverness (per http://forums.sun.com/thread.jspa?threadID=5419973 ), when working with objects whose types are unknown at compile.
Compound Names:
I could name all of my fields preceded by its class's name i.e. a.aName and b.bName, which gets really ugly in real life: Door.DoorName and RotatingDoor.RotatingDoorName
Getters and Setters:
I didn't mention this one, since with JavaBeans these will be derived from the field names, and I believe Hibernate uses annotated POJOs.
To influence the results a little, shadowing seems to be the most robust, at least in my case where class a extends an abstract class with Name defined, then b shadows with its own Name when applicable. Using compound names would mean that if I wanted to add a NickName column to all my DB tables then I would have to add that field to each type (and then what's the point of inheritance?!)
In the end I decided to find out what people, hopefully who have experienced pros/cons of an implementation of one or more of these technique, have to say on the issue; or that convenient stone etched best practice will do :)
-Nomad311

you should only define your member in the base class if you need it in all subclasses. hibernate offers various types of mappings for class trees. take a look at Inheritance mapping in the manual to get a feeling of it.
you can define your mapping either via an xml file or via annotations.

Django models generic modelling

Say, there is a Page that has many blocks associated with it. And each block needs custom rendering, saving and data.
Simplest it is, from the code point of view, to define different classes (hence, models) for each of these models. Simplified as follows:
class Page(models.Model):
name = models.CharField(max_length=64)
class Block(models.Model):
page = models.ForeignKey(Page)
class Meta():
abstract = True
class BlockType1(Block):
other_data = models.CharField(max_length=32)
def render(self):
"""Some "stuff" here """
pass
class BlockType2(Block):
other_data2 = models.CharField(max_length=32)
def render(self):
"""Some "other stuff" here """
pass
But then,
Even with this code, I can't do a query like page.block_set.all() to obtain all the different blocks, irrespective of the block type.
The reason for the above is that, each model defines a different table; Working around to accomplish it using a linking model and generic foreign keys, can solve the problem, but it still leaves multiple database tables queries per page.
What would be the right way to model it? Can the generic foreign keys (or something else) be used in some way, to store the data preferably in the same database table, yet achieve inheritance paradigms.
Update:
My point was, How can I still get the OOP paradigms to work. Using a same method with so many ifs is not what I wanted to do.
The best solution, seems to me, is to create separate standard python class (Preferably in a different blocks.py), that defines a save which saves the data and its "type" by instantiating the same model. Then create a template tag and a filter that calls the render, save, and other methods based on the model's type.

Don't model the page in the database. Pages are a presentation thing.
First -- and foremost -- get the data right.
"And each block needs custom rendering, saving and data." Break this down: you have unique data. Ignore the "block" and "rendering" from a model perspective. Just define the data without regard to presentation.
Seriously. Just define the data in the model without any consideration of presentation or rending or anything else. Get the data model right.
If you confuse the model and the presentation, you'll never get anything to work well. And if you do get it to work, you'll never be able to extend or reuse it.
Second -- only after the data model is right -- you can turn to presentation.
Your "blocks" may be done simply with HTML <div> tags and a style sheet. Try that first.
After all, the model works and is very simple. This is just HTML and CSS, separate from the model.
Your "blocks" may require custom template tags to create more complex, conditional HTML. Try that second.
Your "blocks" may -- in an extreme case -- be so complex that you have to write a specialized view function to transform several objects into HTML. This is very, very rare. You should not do this until you are sure that you can't do this with template tags.
Edit.
"query different external data sources"
"separate simple classes (not Models) that have a save method, that write to the same database table."
You have three completely different, unrelated, separate things.
Model. The persistent model. With the save() method. These do very, very little.
They have attributes and a few methods. No "query different external data sources". No "rendering in HTML".
External Data Sources. These are ordinary Python classes that acquire data.
These objects (1) get external data and (2) create Model objects. And nothing else. No "persistence". No "rendering in HTML".
Presentation. These are ordinary Django templates that present the Model objects. No external query. No persistence.

I just finished a prototype of system that has this problem in spades: a base Product class and about 200 detail classes that vary wildly. There are many situations where we are doing general queries against Product, but then want to to deal with the subclass-specific details during rendering. E.g. get all Products from Vendor X, but display with slightly different templates for each group from a specific subclass.
I added hidden fields for a GenericForeignKey to the base class and it auto-fills the content_type & object_id of the child class at save() time. When we have a generic Product object we can say obj = prod.detail and then work directly with the subclass object. Took about 20 lines of code and it works great.
The one gotcha we ran into during testing was that manage.py dumpdata followed by manage.py loaddata kept throwing Integrity Errors. Turns out this is a well-known problem and a fix is expected in the 1.2 release. We work around it by using mysql commands to dump/reload the test dataset.

Model-View-ViewModel pattern violation of DRY?

I read this article today http://dotnetslackers.com/articles/silverlight/Silverlight-3-and-the-Data-Form-Control-part-I.aspx about the use of the MVVM pattern within a silverlight app where you have your domain entities and view spesific entities which basically is a subset of the real entity objects. Isn't this a clear violation of the DRY principle? and if so how can you deal with it in a nice way?

Personally, I don't like what Dino's doing there and I wouldn't approach the problem the same way. I usually think of a VM as a filtered, grouped and sorted collections of Model classes. A VM to me is a direct mapping to the View, so I might create a NewOrderViewModel class that has multiple CollectionViews used by the View (maybe one CV for Customers and another CV for Products, probably both filtered). Creating an entirely new VM class for every class in the Model does violate DRY in my opinion. I would rather use derivation or partial classes to augment the Model where necessary, adding in View specific (often calculated) properties. IMO .NET RIA Services is an excellent implementation of combining M and VM data with the added bonus that it's usable in on both the client and the server. Dino's a brilliant guy, but way to call him out on this one.

DRY is a principle, not a hard rule. You are a human and can differentiate.
E.g. If DRY really was a hard rule you would never assign the same value to two different variables. I guess in any non trivial program you would have more than one variable containing the value 0.
Generally speaking: DRY does usually not apply to data. Those view specific entities would probably only be data transfer objects without any noteworthy logic. Data may be duplicated for all kinds of reasons.

I think the answer really depends on what you feel should be in the ViewModel. For me the ViewModel represents the model of the screen currently being displayed.
So for something like a ViewCategoryViewModel, I don't have a duplication of the fields in Category. I expose a Category object as a property on the ViewModel (under say "SelectedCategory"), any other data the view needs to display and the Commands that screen can take.
There will always be some similarity between the domain model and the view model, but it all comes down to how you choose to create the ViewModel.

It's the same as with Data Transfer Objects (DTO).
The domain for those two object types is different, so it's not a violation of DRY.
Consider the following example:
class Customer
{
public int Age
}
And a corsponding view model:
class CustomerViewModel
{
public string Age;
// WPF validation code is going to be a bit more complicated:
public bool IsValid()
{
return string.IsNullOrEmpty(Age) == false;
}
}
Differnt domains - differnet property types - different objects.