Django Models / SQLAlchemy are bloated! Any truly Pythonic DB models out there?

"Make things as simple as possible, but no simpler."
Can we find the solution/s that fix the Python database world?
Update: A 'lustdb' prototype has been written by Alex Martelli - if you know any somewhat lightweight, high-level database libraries with multiple backends we could wrap in syntax sugar honey, please weigh in!
from someAmazingDB import *
# we imported a smart model class and a db object which talk to database adapter(s)

class Task(model):
    title = ''
    done = False  # native types, not a custom object we have to think about!

db.taskList = []
# or
db.taskList = expandableTypeCollection(Task)  # not sure what this syntax would be

db['taskList'].append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task('Illustrate different syntax modes', True))  # ok maybe we should just use kwargs
# at this point it should be autosaved to a default db option
# by default we should be able to reload the console and access the default db:
>>> from someAmazingDB import *
>>> print 'Done tasks:'
Done tasks:
>>> for task in db.taskList:
...     if task.done:
...         print task.title
Illustrate different syntax modes
I'm a fan of Python, web.py and CherryPy, and KISS in general.
We're talking automatic Python to SQL type translation or NoSQL.
We don't have to totally be SQL compatible! Just a scalable subset or ignore it!
Re: model changes, it's OK to ask the developer when they try to change it, or to have a set of sensible defaults.
Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better?
It's 2010, we should be able to code scalable, simple databases in our sleep.
If you think this is important, please upvote!

What you request cannot be done in Python 2.whatever, for a very specific reason. You want to write:
class Task(model):
    title = ''
    isDone = False
In Python 2.anything, whatever model may possibly be, this cannot ever allow you to predict any "ordering" for the two fields, because the semantics of a class statement are:
1. execute the body, thus preparing a dict
2. locate the metaclass and run special methods thereof
Whatever the metaclass may be, step 1 has destroyed any predictability of the fields' order.
Therefore, your desired use of positional parameters, in the snippet:
Task('Illustrate different syntax modes', True)
cannot associate the arguments' values with the model's various fields. (Trying to guess by type association -- hoping no two fields ever have the same type -- would be even more horribly unpythonic than your expressed desire to use db.tasklist and db['tasklist'] indifferently and interchangeably).
One of the backwards-incompatible changes in Python 3 was introduced specifically to deal with situations of this ilk. In Python 3, a custom metaclass can define a __prepare__ function which runs before "step 1" in the above simplified list, and this lets it have more control over the class's body. Specifically, quoting PEP 3115...:
__prepare__ returns a dictionary-like object which is used to store the class member definitions during evaluation of the class body. In other words, the class body is evaluated as a function block (just like it is now), except that the local variables dictionary is replaced by the dictionary returned from __prepare__. This dictionary object can be a regular dictionary or a custom mapping type.
...
An example would be a metaclass that uses information about the ordering of member declarations to create a C struct. The metaclass would provide a custom dictionary that simply keeps a record of the order of insertions.
You don't want to "create a C struct" as in this example, but the order of fields is crucial (to allow the use of positional parameters that you want), and so the custom metaclass (obtained by inheriting from model) would have a __prepare__ classmethod returning an ordered dictionary. This removes the specific issue, but, of course, only if you're willing to switch all of your code using this "magic ORM" to Python 3. Would you be?
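For concreteness, here is a minimal Python 3 sketch of the idea (all names illustrative, and not Alex's prototype): the metaclass's __prepare__ returns an ordered mapping, so the declaration order of fields survives and positional parameters can be matched to fields.

import collections

class ModelMeta(type):
    @classmethod
    def __prepare__(mcls, name, bases):
        # the class body will populate this ordered mapping
        return collections.OrderedDict()

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, dict(namespace))
        # remember the declaration order of the plain (non-dunder) fields
        cls._fields = [k for k in namespace if not k.startswith('__')]
        return cls

class model(metaclass=ModelMeta):
    def __init__(self, *args, **kwargs):
        # positional arguments map to fields in declaration order
        for field, value in zip(self._fields, args):
            setattr(self, field, value)
        for field, value in kwargs.items():
            setattr(self, field, value)

class Task(model):
    title = ''
    done = False

t = Task('Illustrate different syntax modes', True)
print(t.title, t.done)  # Illustrate different syntax modes True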
Once that's settled, the issue is, what database operations do you want to perform, and how. Your example, of course, does not clarify this at all. Is the taskList attribute name special, or should any other attribute assigned to the db object be "autosaved" (by name and, what other characteristic[s]?) and "autoretrieved" upon use? Are there to be ways to remove entities, alter them, locate them (otherwise than by having once been listed in the same attribute of the db object)? How does your sample code know what DB service to use and how to authenticate to it (e.g. by userid and password) if it requires authentication?
The specific tasks you list would not be hard to implement (e.g. on top of Google App Engine's storage service, which does not require authentication nor specification of "what DB service to use"). model's metaclass would introspect the class's fields and generate a GAE Model for the class, the db object would use __setattr__ to set an atexit trigger for storing the final value of an attribute (as an entity in a different kind of Model of course), and __getattr__ to fetch that attribute's info back from storage. Of course without some extra database functionality this all would be pretty useless;-).
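A very rough guess at the shape of those db-object mechanics, with the storage calls stubbed out (this is not Alex's code, just an illustration of the __setattr__/__getattr__/atexit interplay described above):

import atexit

def _save(name, value):
    print 'saving %r = %r' % (name, value)  # stub: real storage goes here

def _load(name):
    return []  # stub: fetch the stored value for `name`

class _Db(object):
    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        # store the *final* value of this attribute at interpreter exit
        # (a real version would avoid registering the same name twice)
        atexit.register(lambda: _save(name, getattr(self, name)))

    def __getattr__(self, name):
        # only reached when the attribute is not already in memory
        value = _load(name)
        object.__setattr__(self, name, value)
        return value

db = _Db()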
Edit: so I did a little prototype (Python 2.6, and based on sqlite) and put it up on http://www.aleax.it/lustdb.zip -- it's a 3K zipfile including 225-lines lustdb.py (too long to post here) and two small test files roughly equivalent to the OP's originals: test0.py is...:
from lustdb import *

class Task(Model):
    title = ''
    done = False

db.taskList = []
db.taskList.append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task(title='Illustrate different syntax modes', done=True))
and test1.py is...:
from lustdb import *

print 'Done tasks:'
for task in db.taskList:
    if task.done:
        print task
Running test0.py (on a machine with a writable /tmp directory -- i.e., any Unix-y OS, or, on Windows, one on which a mkdir \tmp has been run at any previous time;-) has no output; after that, running test1.py outputs:
Done tasks:
Task(done=True, title=u'Illustrate different syntax modes')
Note that these are vastly less "crazily magical" than the OP's examples, in many ways, such as...:
1. no (expletive deleted) redundancy whereby `db.taskList` is a synonym of `db['taskList']`; only the sensible former syntax (attribute access) is supported
2. no mysterious (and totally crazy) way whereby a `done` attribute magically becomes `isDone` instead midway through the code
3. no mysterious (and utterly batty) way whereby a `print task` arbitrarily (or magically?) picks and prints just one of the attributes of the task
4. no weird gyrations and incantations to allow positional-attributes in lieu of named ones (this one the OP agreed to)
The prototype of course (as prototypes will;-) leaves a lot to be desired in many respects (clarity, documentation, unit tests, optimization, error checking and diagnosis, portability among different back-ends, and especially DB features beyond those implied in the question). The missing DB features are legion (for example, the OP's original examples give no way to identify a "primary key" for a model, or any other kinds of uniqueness constraints, so duplicates can abound; and it only gets worse from there;-). Nevertheless, for 225 lines (190 net of empty lines, comments and docstrings;-), it's not too bad in my biased opinion.
The proper way to continue playing with this project would of course be to initiate a new lustdb open source project on the hosting part of code.google.com (or any other good open source hosting site with issue tracker, wiki, code reviews support, online browsing, DVCS support, etc, etc) - I'd do it myself but I'm close to the limit in terms of number of open source projects I can initiate on code.google.com and don't want to "burn" the last one or two in this way;-).
BTW, the lustdb name for the module is a play on words on the OP's initials (first two letters each of first and last names), in the tradition of awk and friends -- I think it sounds nice (and most other obvious names such as simpledb and dumbdb are taken;-).

I think you should try ZODB. It is an object-oriented database designed for storing Python objects. Its API is quite close to the example you provided in your question; just take a look at the tutorial.
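A hedged sketch of what the question's example might look like on the classic ZODB API (check the tutorial for the authoritative details):

from ZODB import FileStorage, DB
from persistent import Persistent
import transaction

class Task(Persistent):
    def __init__(self, title='', done=False):
        self.title = title
        self.done = done

db = DB(FileStorage.FileStorage('tasks.fs'))
root = db.open().root()

root['taskList'] = [Task('Illustrate different syntax modes', True)]
transaction.commit()  # the plain Python objects are now persisted

for task in root['taskList']:
    if task.done:
        print task.title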

What about using Elixir?
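For reference, roughly what that looks like with Elixir, the declarative layer over SQLAlchemy (from memory, so treat the details as approximate):

from elixir import *

metadata.bind = 'sqlite:///tasks.db'

class Task(Entity):
    title = Field(Unicode(200))
    done = Field(Boolean, default=False)

setup_all()
create_all()

Task(title=u'Illustrate different syntax modes', done=True)
session.commit()

for task in Task.query.filter_by(done=True):
    print task.title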

Forget ORM! I like vanilla SQL. The Python wrappers like psycopg2 for PostgreSQL do automatic type conversion, offer pretty good protection against SQL injection, and are nice and simple.
sql = "SELECT * FROM table WHERE id=%s"
data = (5,)
cursor.execute(sql, data)

The more I think on't the more the Smalltalk model of operation seems more relevant. Indeed the OP may not have reached far enough by using the term "database" to describe a thing which should have no need for naming.
A running Python interpreter has a pile of objects that live in memory. Their inter-relationships can be arbitrarily complex, but namespaces and the "tags" that objects are bound to are very flexible. And as pickle can explicitly serialize arbitrary structures for persistence, it doesn't seem that much of a reach to consider each Python interpreter living in that object space. Why should that object space evaporate with the interpreter's close? Semantically, this could be viewed as an extension of the anydbm tied dictionaries. And since most every thing in Python is dictionary-like, the mechanism is almost already there.
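The standard library's shelve module already provides much of this mechanism, tying a dict-like object to a pickle-backed file. A minimal sketch of the "object space that survives the interpreter" idea:

import shelve

store = shelve.open('/tmp/my_stuff')  # persists across interpreter runs
store['library'] = {'garp': {'title': 'The World According to Garp',
                             'location': 'kitchen table'}}
store.close()

# ...sometime next week, in a fresh interpreter...
store = shelve.open('/tmp/my_stuff')
print store['library']['garp']['location']  # kitchen table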
I think this may be the generic model that Alex Martelli was proposing above; it might be nice to say something like:
class Book:
    def __init__(self, attributes):
        self.attributes = attributes
    def __getattr__(self, name):
        # look unknown attributes up in the attributes dict
        return self.attributes[name]
$ python
>>> from book import Book
>>> import my_stuff
>>> my_stuff.library = {'garp':
...     Book({'author': 'John Irving', 'title': 'The World According to Garp',
...           'isbn': '0-525-23770-4', 'location': 'kitchen table',
...           'bookmark': 'page 127'}),
...     ...
... }
>>> exit
[sometime next week]
$ python
>>> import my_stuff
>>> print my_stuff.library['garp'].location
kitchen table
# or even
>>> for book in my_stuff.library where book.location.contains('kitchen'):
...     print book.title
I don't know that you'd call the resultant language Python, but it seems like it is not that hard to implement and makes backing store equivalent to active store.
There is a natural tension between the inherent structure imposed - and sometimes desired - by RDBMSs and the rather free-form navel-gazing proposed here, but NoSQLy databases are already approaching the content-addressable memory model and probably better approximate how our minds keep track of things. Contrariwise, you wouldn't want to keep all the corporate purchase orders in such a storage system, but perhaps you might.

How about you give an example of how "simple" you want your "dealing with the database" to be, and I'll then tell you all the stuff that is needed for that "simplicity" to work?
(And for which it will still be YOU who has to give the information/config to the database interface engine, somewhere, somehow.)
To name but one example: if your database engine is some external machine with which you/your app interfaces over IP or some such, there is no way around the fact that the IP identity of where that database engine is running will have to be provided by your app's database interface client, somewhere, somehow, regardless of whether that gets explicitly exposed in the code or not.
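In concrete terms (values purely illustrative), some layer of your app must hold something like the following, whether in code, a config file, or the environment; the wrapper can hide it, but cannot conjure it:

DB_CONFIG = {
    'host': '10.1.2.3',   # the IP identity of the database engine
    'port': 5432,
    'user': 'app',
    'password': 'secret',
}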

I've been busy, here it is, released under LGPL:
http://github.com/lukestanley/lustdb
It uses JSON as its backend at the moment.
This is not the same codebase Alex Martelli did. I wanted to make the code more readable and reusable with different backends and such.
Elsewhere I have been working on object-oriented HTML elements accessible in Python in similar ways, AND a library for making web.py more minimalist.
I'm thinking of ways of using all 3 elements together with automatic MVC prototype construction or smart mapping.
While old-fashioned text-based template web programming will be around for a while still, because of legacy systems and because it doesn't require any particular library or implementation, I feel we'll soon have a lot more efficient ways of building robust, prototype-friendly web apps.
Those interested, please see the mailing list.

If you like CherryPy, you might like the complementary ORMs I wrote: GeniuSQL (which follows a Table Data gateway model) and Dejavu (which is a complete Data Mapper).
There's far too much in this question and all its subcomments to address completely, but one thing I wanted to point out was that GeniuSQL and Dejavu have a very robust system for mapping native Python types to the types that your particular backend is using. There are very sane defaults, which can be overridden as needed, and even extended if you make a new backend or use types from a backend that isn't yet supported. See http://www.aminus.net/geniusql/chrome/common/doc/trunk/advanced.html#custom for more discussion on that.

Related

combining ms access vba codes

My colleague and I are developing an MS Access-based application. We are designing and coding different pages/forms in order to divide the work; we plan to merge our work later. How can we do that without problems like spoiling the design and macros? We are using MS Access 2007 for the front end and SQL Server 2005 as the data source.
I found an idea somewhere on bytes.com: I can import the forms, reports, queries, data and tables that I want. I'm going to try this. However, it's just an idea, so I need to study this approach by trial and error.
The most important requirement is to complete the overall design before you start coding. For example:
All the forms must have the same style. Help and error information must be provided in the same way on each form. If a user can tell which forms each of you designed, you have failed.
The database design must be finished with a complete, written description of each table, its relationships and its attributes.
The purpose and parameters for each major macro must be defined. If macro A1 exists only to service macro A then A1 is not a major macro and only A's author need know of its details until coding is complete.
Agree a documentation style and level of detail. If the application needs enhancement in six or twelve months' time, you should be able to work on the other's macros and forms as easily as on your own.
If one of you thinks a change to the design is required after coding has started, this change must be documented, agreed with the other and the change specification added to the master specification.
Many years ago I lectured on Electronic Data Interchange (EDI). With EDI, the specification is divided into two, with one set of organisations providing applications for message senders and another set providing applications for message receivers. I often used an example in my lectures to help my audience understand the importance of a complete, unambiguous specification.
I want two shapes, an E and a reverse-E, which I can fit together to create a 10 cm square. I do not care what they are made of providing they fit together perfectly.
If I give this task to a single organisation, this specification will be enough. One organisation might use cardboard, another metal, but I do not care. But suppose I ask one organisation to create the E and another the reverse-E. How detailed does my specification have to be if I am to get my 10 cm square? I would suggest: material, thickness and dimensions of the E. My audience would compete to suggest more and more obscure characteristics that had to match: density, colour, pattern, texture, etc, etc.
I was not always convinced my audience listened to the rest of my lecture because they were searching for a characteristic that would cap all the others. No matter; I had got across my major point, which was why EDI specifications were so mind-blowingly detailed.
Your situation will not be so difficult since you and your colleague are probably in the same room and can talk whenever you want. But I hope this example helps you understand how easy it is for the interface between your two parts to be less than seamless if you do not agree the complete design at the beginning. It's the little assumptions - I thought you knew I was doing it that way - that will kill your application.
New section
OK, probably most of my earlier advice was inappropriate in your situation.
So you are trying to modify code you did not write in a language you do not know. Good luck; you will need it.
I think scope is going to be your biggest problem. Most modern languages have namespaces allowing you to give a variable or a routine as much or as little scope as you require. VBA only has three levels.
A variable declared within a function or subroutine is automatically private to that function or subroutine.
A variable declared as Private within a module is invisible to functions and subroutines in other modules but is visible to any function or subroutine within the module.
A variable declared as Public within a module is visible to any function or subroutine within the project.
Anything declared within a form is private to that form. If a form wishes to pass a value to an outside function or subroutine, it can do so by writing to a public variable or by passing it in a parameter to a public function or subroutine.
"Avoiding Naming Conflicts" within VBA Help gives useful advice.
Form and module names will have to be unique across the merged project. You will not be able to avoid having constants, variables, functions and sub-routines which are visible to the other's functions and sub-routines. "Avoiding Naming Conflicts" offers one approach. An approach I have used successfully is to divide the application into sub-applications and, if necessary, sub-sub-applications, and to assign a prefix to each. If every public constant, variable, function and sub-routine name has the appropriate prefix, you can simulate namespace-type control.

Class between database and UI

I have a class that handles writing and reading data from my database. What is a proper name to call this class?
There are a couple of conventions. Assuming a Person model, you could use:
PersonDataAccessObject,
PersonDao,
PersonRepository,
PersonDataAccess,
...
It is also dependent on the technology you are using; who knows what conventions exist for the language you are working in. Let us know what language and what data-access framework you're using, and the answer may vary.
I used to append "Dao" because it's short and clear. But then I moved over more to Martin Fowler's vocabulary and patterns, so now I use Repository. A little more long winded, but I'm long winded by nature, so it fits my style. In the end, that's the key. It's stylistic and there is no across the board standard that I'm aware of. What's most important is that you pick something that is clear and you use it consistently. If you decide, later on, to switch to something else, have mercy on any programmers that may follow you and rename everything so that all your data access components are consistently named.
Edit: in rereading this, I realized I am assuming you are going to have multiple such classes, one for each of your model entities. Who knows what your setup is. If you aren't going to do it like that, and you're just looking for a standard name for a single point of entry to all data access, you could use:
DataMapper
Gateway
Typically, the assumption is that you are going to have several of these around, one for each of your "tables"/model entities. More than a naming convention, that is probably a standard coding convention. This way, when you change or add some aspect of how you interact with your "persons" table, you don't have to modify a class in which you have code to access the "addresses" table. Check out Martin Fowler's Patterns of Enterprise Application Architecture (PofEAA) for more:
the PofEAA catalog of patterns (check out the Data Source Architectural Patterns)
and
Domain Driven Design Quickly (free PDF), esp. Ch. 3
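To make the repository-per-entity shape concrete, here is a minimal hypothetical sketch in Python with sqlite3 (names and schema are illustrative, not from any particular framework):

import sqlite3

class Person(object):
    def __init__(self, id, name):
        self.id, self.name = id, name

class PersonRepository(object):
    """All access to the persons table goes through here."""
    def __init__(self, conn):
        self.conn = conn

    def get(self, person_id):
        row = self.conn.execute(
            "SELECT id, name FROM persons WHERE id = ?",
            (person_id,)).fetchone()
        return Person(*row) if row else None

    def add(self, person):
        self.conn.execute("INSERT INTO persons VALUES (?, ?)",
                          (person.id, person.name))
        self.conn.commit()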
Depending on the entity this class represents, it could be, for example, Person. You then design a PersonViewModel which is passed to the GUI: the Person you got from the database is mapped to a PersonViewModel, which is passed to the UI layer to be shown in some form. The view model is just a representation of the domain model you fetched from the database, containing only the information you need to display on the given UI.

Silverlight LINQtoSQL: one big dataclass, or several small ones?

I'm new to Silverlight, but being dumped right into the fray - good way to learn I suppose :o)
Anyway, the webapp I'm working on has a relatively complex database structure that represents various object types that are linked to each other, and I was wondering 2 things:
1- What is the recommended approach when it comes to dataclasses? Have just one big dataclass, or try and separate it into several smaller dataclasses, keeping in mind they will need to reference each other?
2- If the recommended approach is to have several dataclasses, how do you define the inter-dataclasses references?
I'm asking because I did a small test. In my DB (simplified here, real model is more complex but that's not important), I have a table "Orders" and a table "Parameters". "Orders" has a foreign key on "Parameters". What I did is create 2 dataclasses.
The first one, ParamClass, where I dropped the "Parameters" table only, so I can have a nice "parameter" class. I then created a simple service to add basic SELECT and INSERT functionality.
The second one, OrdersClass, where I dropped both tables, so that the relation between the tables would automatically create an "EntityRef<parameter>" variable inside the "order" class. I then removed the "parameters" class that was automatically created in the OrdersClass dataclass, since the class had already been declared in the ParamClass dataclass. Again I created a small service to test it.
So far so good, it builds happily. The problem is that when I try to handle things on the application code, I added service references for both dataclasses, but it is not happy doing something like:
OrdersServiceReference.order myOrder = new OrdersServiceReference.order();
myOrder.parameter = new ParamServiceReference.parameter(); //<-PROBLEM IS HERE
It complains that it cannot implicitly convert from type 'MytestDC.ParamServiceReference.parameter' to 'MytestDC.OrdersServiceReference.parameter'.
Do I somehow need to declare some sort of reference to ParamClass from OrdersClass, or how do I "convert" one to the other?
Is this even a recommended and efficient way of doing this?
Since it's a team project, I initially wanted to separate the dataclasses so that they (and their services) can be easily checked out by one member without checking out the entire dataclass.
Any help appreciated!
PS: using Silverlight 4, in case that's important
Based on the widely accepted Single Responsibility Principle (SRP), a class should always be responsible for one task, and one task only.
That pretty much invalidates your "one big dataclass" approach.
I would always recommend smaller, more manageable bits that can be combined, instead of one humongous class that does everything (except brew coffee for you).
Resources for the SRP:
Wikipedia on SRP
OODesign: Single Responsibility Principle
ObjectMentor: list of articles on good app design - which has a few links to PDF documents, like this one on SRP written by Robert C. Martin - the "guru" on proper OO design
OK, some more research led me to this: it is not simple to separate classes from a relational model using LINQtoSQL. I ended up switching to an Entity Framework approach, which itself doesn't deal with it gracefully (see here and there, for example), but at least it solved another major problem I had with LINQtoSQL.
There are other ORMs out there that are apparently much more capable at this (NHibernate comes up often in recommendations), unfortunately, I don't have time to investigate them now, being under such a tight deadline.
As for the referencing, it was quite simple, change the line to:
myOrder.parameter = new OrderServiceReference.parameter();
even though I removed the declaration from that dataclass.
Hope this helps someone!

Repository Pattern Question

I'm building an ASP.NET MVC app and I'm using a repository to store and retrieve view objects. My question is: is it okay for the implementation of the various repositories to call each other? I.e., can the ICustomerRepository implementation call an implementation of IAddressRepository, or should it handle its own updates to the address data source?
EDIT:
Thanks everyone, but the Customer/Address example isn't real. The actual problem involves three aggregates which update a fourth aggregate in response to changes in their respective states. In this case it seems to be a conflict between introducing dependencies and violating the don't-repeat-yourself principle.
You should have a repository for each aggregate root.
I have no knowledge of your domain-model, but it doesn't feel natural for me to have an IAddressRepository. Unless 'Address' is an aggregate root in your domain.
In fact, in most circumstances, 'Address' is not even an entity, but a value object. That is, in most cases the identity of an 'Address' is determined by its value (the value of all its properties); it does not have a separate 'Id' (key) field.
So, when this is the case, the CustomerRepository should be responsible for storing the Address, as the Address is a part of the Customer aggregate-root.
Edit (ok so your situation was just an example):
But, if you have other situations where you would need another repository in a repository, then I think that it is better to remove that functionality out of that repository, and put it in a separate class (Service).
I mean: I think that if you have some functionality inside repository A that relies on another repository B, then this kind of functionality doesn't belong inside repository A.
Instead, write another class (which is called a Service in DDD), in where you implement this functionality.
Anyway, I do not think that repositories should really call each other. If you do not want to write a Service however, and if you really want to keep that logic inside the repository itself, then pass the other repository as an argument in that specific method.
I hope I made myself a bit clear. :P
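A small sketch of that Service idea, in Python for brevity (all names illustrative): the cross-aggregate logic lives in the service, so neither repository calls the other.

class CustomerRelocationService(object):
    def __init__(self, customer_repo, address_repo):
        self.customer_repo = customer_repo
        self.address_repo = address_repo

    def relocate(self, customer_id, new_address):
        # the cross-repository logic lives here, not in either repository
        self.address_repo.add(new_address)
        customer = self.customer_repo.get(customer_id)
        customer.address = new_address
        self.customer_repo.save(customer)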
They really shouldn't call each other. A Repository is an abstraction of the (more or less) atomic operations that you want to perform on your domain. The less dependency they have, the better. Realistically, any consumer of a repository should expect to be able to point the repository class at a database and have it perform the necessary domain operations without a lot of configuration.
They should also represent "aggregates" in your domain - i.e. key focal points that a lot of functionality will be based around. I'm wondering why you would have a separate address information repository? Shouldn't that be part of your customer repository?
This depends on the type of repository (or at least the consequences do), but in general if you have data repositories calling each other you're going to run into problems with things like cyclical dependencies (repo A requires B -> B requires C -> oops, C requires A) or recursive data loads (A requires B & C -> C requires D, E -> ... ad nauseam). Testing also becomes more difficult.
For example, you need to load your address repository to properly run your customer repository, because the customer repository calls the address repo. If you need to test the customer repo, you'll need to do a db load of the addresses or mock them in some way, and ultimately you won't be able to load and test any single system repository without loading them all.
Having those dependencies is also kind of insidious because they're often not clear - usually you're dealing with a repository as a data-holding abstraction - if you have to be conscious of how they depend on each other you can't use them as an abstraction, but have to manage the load process whenever you want to use them.

Is it bad practice to "go deep" with your application of callbacks?

Weird question, but I'm not sure if it's anti-pattern or not.
Say I have a web app that will be rendering 1000 records to an html table.
The typical approach I've seen is to send a query down to the database, translate the records in some way to some abstract state (be it an array, an object, etc.) and place the translated records into a collection that is then iterated over in the view.
As the number of records grows, this approach uses up more and more memory.
Why not send along with the query a callback that performs an operation on each of the translated rows as they are read from the database? This would mean that you don't need to collect the data for further iteration in the view so the memory footprint shrinks, and you're not iterating over the data twice.
There must be something implicitly wrong with this approach, because I rarely see it used anywhere. What's wrong with this approach?
Thanks.
Actually, this is exactly how a well-developed application should behave.
There is nothing wrong with this approach, except that not all database interfaces allow you to do this easily.
If we're talking about tabularizing 10 records for yet another social network, there is no need to mess with callbacks if you can get an array of hashes or whatever with a single call that is already implemented for you.
There must be something implicitly wrong with this approach, because I rarely see it used anywhere.
I use it. Frequently. Even when I wouldn't use too much memory repeatedly copying the data, using a callback just seems cleaner. In languages with closures, it also lets you keep relevant code together while factoring out the messy DB stuff.
This is a "limited by your tools" class of problem: Most programming languages don't allow to say "Do something around this code". This was solved in recent years with the advent of closures. Think of a closure as a way to pass code into another method which is then executed in a context. For example, in GSQL, you can write:
def l = []
sql.execute("select id from table where time > ?", time) { row ->
    l << row[0]
}
This will open a connection to the database, create a statement and a result set, and then run l << row[0] for each row the DB returns. Note that the code runs inside of sql.execute() but can access local variables (l) and variables defined by sql.execute() (row).
With this kind of code, you can even generate the result of a HTTP request on the fly without keeping much of the page in RAM at any time. In my case, I'd stream a 2MB document to the browser using only a few KB of RAM and the browser would then chew 83s to parse this.
This is roughly what the iterator pattern allows you to do. In many cases this breaks down on the interface between your application and the database. Technologies like LINQ even have solutions that can send back code to the database.
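In Python, the iterator/generator flavor of the same idea is straightforward; a self-contained sketch with sqlite3:

import sqlite3

def done_titles(conn):
    # rows are yielded one at a time; nothing is accumulated in memory
    for row in conn.execute("SELECT title FROM tasks WHERE done = 1"):
        yield row[0]

conn = sqlite3.connect(':memory:')
conn.execute("CREATE TABLE tasks (title TEXT, done INTEGER)")
conn.execute("INSERT INTO tasks VALUES ('Stream rows lazily', 1)")

for title in done_titles(conn):
    print title  # each row can be rendered as it is read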
I've found it easier to use an interface resolver than a deep callback where it's hooked up through several classes. MS has a much fancier version than mine, called Unity. This provides a much cleaner way of accessing classes that should not be tightly coupled:
http://www.codeplex.com/unity
