Repository Pattern Question - database

I'm building an ASP.NET MVC app and I'm using a repository to store and retrieve view objects. My question is, is it okay for the implementation of the various repositories to call each other? I.E. can the ICustomerRepository implementation call an implementation of IAddressRepository, or should it handle its own updates to the address data source?
EDIT:
Thanks everyone, the the Customer/Address example isn't real. The actual problem involves three aggregates which update a fourth aggregate in response to changes in their respective states. I in this case it seems a conflict between introducing dependencies vs violating the don't repeat yourself principle.

You should have a repository for each aggregate root.
I have no knowledge of your domain-model, but it doesn't feel natural for me to have an IAddressRepository. Unless 'Address' is an aggregate root in your domain.
In fact, in most circumstances, 'Address' is not even an entity, but a value object. That is, in most cases the identity of an 'Address' is determined by its value (the value of all its properties); it does not have a separate 'Id' (key) field.
So, when this is the case, the CustomerRepository should be responsible for storing the Address, as the Address is a part of the Customer aggregate-root.
Edit (ok so your situation was just an example):
But, if you have other situations where you would need another repository in a repository, then I think that it is better to remove that functionality out of that repository, and put it in a separate class (Service).
I mean: I think that, if you have some functionality inside a repository A, that relies on another repository B, then this kind of functionality doesn't belong inside repository A.
Instead, write another class (which is called a Service in DDD), in where you implement this functionality.
Anyway, I do not think that repositories should really call each other. If you do not want to write a Service however, and if you really want to keep that logic inside the repository itself, then pass the other repository as an argument in that specific method.
I hope I made myself a bit clear. :P

They really shouldn't call each other. A Repository is an abstraction of the (more or less) atomic operations that you want to perform on your domain. The less dependency they have, the better. Realistically, any consumer of a repository should expect to be able to point the repository class at a database and have it perform the necessary domain operations without a lot of configuration.
They should also represent "aggregates" in your domain - i.e. key focal points that a lot of functionality will be based around. I'm wondering why you would have a separate address information repository? Shouldn't that be part of your customer repository?

This depends on the type of repository (or at least the consequences do) but in general if you have data repositories calling each other you're going to run into problems with things like cyclical (repo A -> requires B -> requires C -. oops, requires A) or recursive data loads (A-> requires B & C -> C-requires D, E -> .... ->..ad nauseum). Testing also becomes more difficult.
For example, you need to load your address repository to properly run your customer repository, because the customer repository calls the address repo. If you need to test the customer repo, you'll need to do a db load of the addresses or mock them in some way, and ultimately you won't be able to load and test any single system repository without loading them all.
Having those dependencies is also kind of insidious because they're often not clear - usually you're dealing with a repository as a data-holding abstraction - if you have to be conscious of how they depend on each other you can't use them as an abstraction, but have to manage the load process whenever you want to use them.

Related

React Simple Global Entity Cache instead of Flux/React/etc

I am writing a little "fun" Scala/Scala.js project.
On my server I have Entities which are referenced by uuid-s
(inside Ref-s).
For the sake of "fun", I don't want to use flux/redux architecture but still use React on the client (with ScalaJS-React).
What I am trying to do instead is to have a simple cache, for example:
when a React UserDisplayComponent wants the display the Entity User with uuid=0003
then the render() method calls to the Cache (which is passed in as a prop)
let's assume that this is the first time that the UserDisplayComponent asks for this particular User (with uuid=0003) and the Cache does not have it yet
then the Cache makes an AjaxCall to fetch the User from the server
when the AjaxCall returns the Cache triggers re-render
BUT ! now when the component is asking for the User from the Cache, it gets the User Entity from the Cache immediately and does not trigger an AjaxCall
The way I would like to implement this is the following :
I start a render()
"stuff" inside render() asks the Cache for all sorts of Entities
Cache returns either Loading or the Entity itself.
at the end of render the Cache sends all the AjaxRequest-s to the server and waits for all of them to return
once all AjaxRequests have returned (let's assume that they do - for the sake of simplicity) the Cache triggers a "re-render()" and now all entities that have been requested before are provided by the Cache right away.
of course it can happen that the newly arrived Entity-s will trigger the render() to fetch more Entity-s if for example I load an Entity that is for example case class UserList(ul: List[Ref[User]]) type. But let's not worry about this now.
QUESTIONS:
1) Am I doing something really wrong if I am doing the state handling this way ?
2) Is there an already existing solution for this ?
I looked around but everything was FLUX/REDUX etc... along these lines... - which I want to AVOID for the sake of :
"fun"
curiosity
exploration
playing around
I think this simple cache will be simpler for my use-case because I want to take the "REF" based "domain model" over to the client in a simple way: as if the client was on the server and the network would be infinitely fast and zero latency (this is what the cache would simulate).
Consider what issues you need to address to build a rich dynamic web UI, and what libraries / layers typically handle those issues for you.
1. DOM Events (clicks etc.) need to trigger changes in State
This is relatively easy. DOM nodes expose callback-based listener API that is straightforward to adapt to any architecture.
2. Changes in State need to trigger updates to DOM nodes
This is trickier because it needs to be done efficiently and in a maintainable manner. You don't want to re-render your whole component from scratch whenever its state changes, and you don't want to write tons of jquery-style spaghetti code to manually update the DOM as that would be too error prone even if efficient at runtime.
This problem is mainly why libraries like React exist, they abstract this away behind virtual DOM. But you can also abstract this away without virtual DOM, like my own Laminar library does.
Forgoing a library solution to this problem is only workable for simpler apps.
3. Components should be able to read / write Global State
This is the part that flux / redux solve. Specifically, these are issues #1 and #2 all over again, except as applied to global state as opposed to component state.
4. Caching
Caching is hard because cache needs to be invalidated at some point, on top of everything else above.
Flux / redux do not help with this at all. One of the libraries that does help is Relay, which works much like your proposed solution, except way more elaborate, and on top of React and GraphQL. Reading its documentation will help you with your problem. You can definitely implement a small subset of relay's functionality in plain Scala.js if you don't need the whole React / GraphQL baggage, but you need to know the prior art.
5. Serialization and type safety
This is the only issue on this list that relates to Scala.js as opposed to Javascript and SPAs in general.
Scala objects need to be serialized to travel over the network. Into JSON, protobufs, or whatever else, but you need a system for this that will not involve error-prone manual work. There are many Scala.js libraries that address this issue such as upickle, Autowire, endpoints, sloth, etc. Key words: "Scala JSON library", or "Scala type-safe RPC", depending on what kind of solution you want.
I hope these principles suffice as an answer. When you understand these issues, it should be obvious whether your solution will work for a given use case or not. As it is, you didn't describe how your solution addresses issues 2, 4, and 5. You can use some of the libraries I mentioned or implement your own solutions with similar ideas / algorithms.
On a minor technical note, consider implementing an async, Future-based API for your cache layer, so that it returns Future[Entity] instead of Loading | Entity.

Short lived DbContext in WPF application reasonable?

In his book on DbContext, #RowanMiller shows how to use the DbSet.Local property to avoid 1.) unnecessary roundtrips to the database and 2.) passing around collections (created with e.g. ToList()) in the application (page 24). I then tried to follow this approach. However, I noticed that from one using [} – block to another, the DbSet.Local property becomes empty:
ObservableCollection<Destination> destinationsList;
using (var context = new BAContext())
{
var query = from d in context.Destinations …;
query.Load();
destinationsList = context.Destinations.Local; //Nonzero here.
}
//Do stuff with destinationsList
using (var context = new BAContext())
{
//context.Destinations.Local zero here again;
//So no way of getting the in-memory data from the previous using- block here?
//Do I have to do another roundtrip to the database here to get the same data I wanted
//to cache locally???
}
Then, what is the point on page 24? How can I avoid the passing around of my collections if the DbSet.Local is only usable inside the using- block? Furthermore, how can I benefit from the change tracking if I use these short-lived context instances not handing over any cache data to each others under the hood? So, if the contexts should be short-lived for freeing resources such as connections, have I to give up the caching for this? I.e. I can’t use both at the same time (short-lived connections but long-lived cache)? So my only option would be to store the results returned by the query in my own variables, exactly what is discouraged in the motivation on page 24?
I am developing a WPF application which maybe will also become multi-tiered in the future, involving WCF. I know Julia has an example of this in her book, but I currently don’t have access to it. I found several others on the web, e.g. http://msdn.microsoft.com/en-us/magazine/cc700340.aspx (old ObjectContext, but good in explaining the inter-layer-collaborations). There, a long-lived context is used (although the disadvantages are mentioned, but no solution to these provided).
It’s not only that the single Destinations.Local gets lost, as you surely know all other entities fetched by the query are, too.
[Edit]:
After some more reading in Julia Lerman’s book, it seems to boil down to that EF does not have 2nd level caching per default; with some (considerable, I think) effort, however, ones can add 3rd party caching solutions, as is also described in the book and in various articles on MSDN, codeproject etc.
I would have appreciated if this problem had been mentioned in the section about DbSet.Local in the DbContext book that it is in fact a first level cache which is destroyed when the using {} block ends (just my proposal to make it more transparent to the readers). After first reading I had the impression DbSet.Local would always return the same reference (Singleton-style) also in the second using {} block despite the new DbContext instance.
But I am still unsure whether the 2nd level cache is the way to go for my WPF application (as Julia mentions the 2nd level cache in her article for distributed applications)? Or is the way to go to get my aggregate root instances (DDD, Eric Evans) of my domain model into memory by one or some queries in a using {} block, disposing the DbContext and only holding the references to the aggregate instances, this way avoiding a long-lived context? It would be great if you could help me with this decision.
http://msdn.microsoft.com/en-us/magazine/hh394143.aspx
http://www.codeproject.com/Articles/435142/Entity-Framework-Second-Level-Caching-with-DbConte
http://blog.3d-logic.com/2012/03/31/using-tracing-and-caching-provider-wrappers-with-codefirst/
The Local property provides a “local view of all Added, Unchanged, and Modified entities in this set”. Like all change tracking it is specific to the context you are currently using.
The DB Context is a workspace for loading data and preparing changes.
If two users were to add changes at the same time, they must not know of the others changes before they saved them. They may discard their prepared changes which suddenly would lead to problems for other other user as well.
A DB Context should be short lived indeed, but may be longer than super short when necessary. Also consider that you may not save resources by keeping it short lived if you do not load and discard data but only add changes you will save. But it is not only about resources but also about the DB state potentially changing while the DB Context is still active and has data loaded; which may be important to keep in mind for longer living contexts.
If you do not know yet all related changes you want to save into the database at once then I suggest you do not use the DB Context to store your changes in-memory but in a data structure in your code.
You can of course use entity objects for doing so without an active DB Context. This makes sense if you do not have another appropriate data class for it and do not want to create one, or decide preparing the changes in them make more sense. You can then use DbSet.Attach to attach the entities to a DB Context for saving the changes when you are ready.

App Engine Memcache Key Prefix Across Versions

Greetings!
I've got a Google App Engine Setup where memcached keys are prefixed with os.environ['CURRENT_VERSION_ID'] in order to produce a new cache on deploy, without having to flush the cache manually.
This was working just fine until it became necessary for development to run two versions of the application at the same time. This, of course, is yielding inconsistencies in caching.
I'm looking for suggestions as to how to prefix the keys now. Essentially, there needs to be a variable that changes across versions when any version is deployed. (Well, this isn't quite ideal, as the cache gets totally blown out.)
I was thinking of the following possibilities:
Make a RuntimeEnvironment entity that stores the latest cache prefix. Drawbacks: even if cached, slows down every request. Cannot be cached in memory, only in memcached, as deployment of other version may change it.
Use a per-entity version number. This yields very nice granularity in that the cache can stay warm for non-modified entities. The downside is we'd need to push to all versions when models are changed, which I want to avoid in order to test model changes out before deploying to production.
Forget key prefix. Global namespace for keys. Write a script to flush the cache on every deploy. This actually seems just as good as, if not better than, the first idea: the cache is totally blown in both scenarios, and this one avoids the overhead of the runtime entity.
Any thoughts, different ideas greatly appreciated!
The os.environ['CURRENT_VERSION_ID'] value will be different to your two versions, so you will have separate caches for each one (the live one, and the dev/testing one).
So, I assume your problem is that when you "deploy" a version, you do not want the cache from development/testing to be used? (otherwise, like Nick and systempuntoout, I'm confused).
One way of achieving this would be to use the domain/host header in the cache - since this is different for your dev/live versions. You can extract the host by doing something like this:
scheme, netloc, path, query, fragment = urlparse.urlsplit(self.request.url)
# Discard any port number from the hostname
domain = netloc.split(':', 1)[0]
This won't give particularly nice keys, but it'll probably do what you want (assuming I understood correctly).
There was a bit of confusion with how i worded the question.
I ended up going for a per-class hash of attributes. Take this class for example:
class CachedModel(db.Model):
#classmethod
def cacheVersion(cls):
if not hasattr(cls, '__cacheVersion'):
props = cls.properties()
prop_keys = sorted(props.keys())
fn = lambda p: '%s:%s' % (p, str(props[p].model_class))
string = ','.join(map(fn, prop_keys))
cls.__cacheVersion = hashlib.md5(string).hexdigest()[0:10]
return cls.__cacheVersion
#classmethod
def cacheKey(cls, key):
return '%s-%s' % (cls.cacheVersion(), str(key))
That way, when entities are saved to memcached using their cacheKey(...), they will share the cache only if the actual class is the same.
This also has the added benefit that pushing an update that does not modify a model, leaves all cache entries for that model intact. In other words, pushing an update no longer acts as flushing the cache.
This has the disadvantage of hashing the class once per instance of the webapp.
UPDATE on 2011-3-9: I changed to a more involved but more accurate way of getting the version. Turns out using __dict__ yielded incorrect results as its str representation includes pointer addresses. This new approach just considers the datastore properties.
UPDATE on 2011-3-14: So python's hash(...) is apparently not guaranteed to be equal between runs of the interpreter. Was getting weird cases where a different app engine instance was seeing different hashes. using md5 (which is faster than sha1 faster than sha256) for now. no real need for it to be crypto-secure. just need an ok hashfn. Will probably switch to use something faster, but for now i rather be bugfree. Also ensured keys were getting sorted, not the property objects.

Silverlight LINQtoSQL: one big dataclass, or several small ones?

I'm new to Silverlight, but being dumped right into the fray - good way to learn I suppose :o)
Anyway, the webapp I'm working on has a relatively complex database structure that represents various object types that are linked to each other, and I was wondering 2 things:
1- What is the recommended approach when it comes to dataclasses? Have just one big dataclass, or try and separate it into several smaller dataclasses, keeping in mind they will need to reference each other?
2- If the recommended approach is to have several dataclasses, how do you define the inter-dataclasses references?
I'm asking because I did a small test. In my DB (simplified here, real model is more complex but that's not important), I have a table "Orders" and a table "Parameters". "Orders" has a foreign key on "Parameters". What I did is create 2 dataclasses.
The first one, ParamClass, were I dropped the "Parameters" table only, so I can have a nice "parameter" class. I then created a simple service to add basic SELECT and INSERT functionality.
The second one, OrdersClass, where I dropped both tables, so that the relation between the tables would automatically create a "EntityRef<parameter>" variable inside the "order" class. I then removed the "parameters" class that was automatically created in the OrdersClass dataclass, since the class has already been declared in the ParamClass dataclass. Again I created a small service to test it.
So far so good, it builds happily. The problem is that when I try to handle things on the application code, I added service references for both dataclasses, but it is not happy doing something like:
OrdersServiceReference.order myOrder = new OrdersServiceReference.order();
myOrder.parameter = new ParamServiceReference.parameter(); //<-PROBLEM IS HERE
It comlpains that it cannot implicitly convert from type 'MytestDC.ParamServiceReference.parameter' to 'MytestDC.OrdersServiceReference.parameter'
Do I somehow need to declare some sort of reference to ParamClass from OrdersClass, or how do I "convert" one to the other?
Is this even a recommended and efficient way of doing this?
Since it's a team-project, I initially wanted to separate the dataclasses so that they (and their services) can be easily checked out by one member without checking out the whole entire dataclass.
Any help appreciated!
PS: using Silverlight 4, in case that's important
Based on the widely accepted Single Responsability Principle (SRP), a class should always be responsible for one task, and one task only.
That pretty much invalidates your "one big dataclass" approach.
I would always recommend smaller, more manageable bits that can be combined, instead of one humonguous class that does everything (except brew coffee for you).
Resources for the SRP:
Wikipedia on SRP
OODesign: Single Responsibility Principle
ObjectMentor: list of articles on good app design - which has a few links to PDF documents, like this one on SRP written by Robert C. Martin - the "guru" on proper OO design
OK, some more research let me to this: it is not simple to separate classes from a relational model using LINQtoSQL. I ended up switching to an Entity Framework approach, which itself doesn't deal with it gracefully (see here and there, for example), but at least it solved another major problem I had with LINQtoSQL.
There are other ORMs out there that are apparently much more capable at this (NHibernate comes up often in recommendations), unfortunately, I don't have time to investigate them now, being under such a tight deadline.
As for the referencing, it was quite simple, change the line to:
myOrder.parameter = new OrderServiceReference.parameter();
even though I removed the declaration from that dataclass.
Hope this helps someone!

Django Models / SQLAlchemy are bloated! Any truly Pythonic DB models out there?

"Make things as simple as possible, but no simpler."
Can we find the solution/s that fix the Python database world?
Update: A 'lustdb' prototype has been written by Alex Martelli - if you know any somewhat lightweight, high-level database libraries with multiple backends we could wrap in syntax sugar honey, please weigh in!
from someAmazingDB import *
#we imported a smart model class and db object which talk to database adapter/s
class Task (model):
title = ''
done = False #native types not a custom object we have to think about!
db.taskList = []
#or
db.taskList = expandableTypeCollection(Task) #not sure what this syntax would be
db['taskList'].append(Task(title='Beat old sql interfaces',done=False))
db.taskList.append(Task('Illustrate different syntax modes',True)) # ok maybe we should just use kwargs
#at this point it should be autosaved to a default db option
#by default we should be able to reload the console and access the default db:
>> from someAmazingDB import *
>> print 'Done tasks:'
>> for task in db.taskList:
>> if task.done:
>> print task.title
'Illustrate different syntax modes'
I'm a fan of Python, webPy and Cherry Py, and KISS in general.
We're talking automatic Python to SQL type translation or NoSQL.
We don't have to totally be SQL compatible! Just a scalable subset or ignore it!
Re:model changes, it's ok to ask the developer when they try to change it or have a set of sensible defaults.
Here is the challenge: The above code should work with very little modification or thinking required. Why must we put up with compromise when we know better?
It's 2010, we should be able to code scalable, simple databases in our sleep.
If you think this is important, please upvote!
What you request cannot be done in Python 2.whatever, for a very specific reason. You want to write:
class Task(model):
title = ''
isDone = False
In Python 2.anything, whatever model may possibly be, this cannot ever allow you to predict any "ordering" for the two fields, because the semantics of a class statement are:
execute the body, thus preparing a dict
locate the metaclass and run special methods thereof
Whatever the metaclass may be, step 1 has destroyed any predictability of the fields' order.
Therefore, your desired use of positional parameters, in the snippet:
Task('Illustrate different syntax modes', True)
cannot associate the arguments' values with the model's various fields. (Trying to guess by type association -- hoping no two fields ever have the same type -- would be even more horribly unpythonic than your expressed desire to use db.tasklist and db['tasklist'] indifferently and interchangeably).
One of the backwards-incompatible changes in Python 3 was introduced specifically to deal with situations of this ilk. In Python 3, a custom metaclass can define a __prepare__ function which runs before "step 1" in the above simplified list, and this lets it have more control about the class's body. Specifically, quoting PEP 3115...:
__prepare__ returns a dictionary-like object which is used to store
the class member definitions during evaluation of the class body.
In other words, the class body is evaluated as a function block
(just like it is now), except that the local variables dictionary
is replaced by the dictionary returned from __prepare__. This
dictionary object can be a regular dictionary or a custom mapping
type.
...
An example would be a metaclass that
uses information about the
ordering of member declarations to create a C struct. The metaclass
would provide a custom dictionary that simply keeps a record of the
order of insertions.
You don't want to "create a C struct" as in this example, but the order of fields is crucial (to allow the use of positional parameters that you want) and so the custom metaclass (obtained through base model) would have a __prepare__ classmethod returning an ordered dictionary. This removes the specific issue, but, of course, only if you're willing to switch all of your code using this "magic ORM" to Python 3. Would you be?
Once that's settled, the issue is, what database operations do you want to perform, and how. Your example, of course, does not clarify this at all. Is the taskList attribute name special, or should any other attribute assigned to the db object be "autosaved" (by name and, what other characteristic[s]?) and "autoretrieved" upon use? Are there to be ways to remove entities, alter them, locate them (otherwise than by having once been listed in the same attribute of the db object)? How does your sample code know what DB service to use and how to authenticate to it (e.g. by userid and password) if it requires authentication?
The specific tasks you list would not be hard to implement (e.g. on top of Google App Engine's storage service, which does not require authentication nor specification of "what DB service to use"). model's metaclass would introspect the class's fields and generate a GAE Model for the class, the db object would use __setattr__ to set an atexit trigger for storing the final value of an attribute (as an entity in a different kind of Model of course), and __getattr__ to fetch that attribute's info back from storage. Of course without some extra database functionality this all would be pretty useless;-).
Edit: so I did a little prototype (Python 2.6, and based on sqlite) and put it up on http://www.aleax.it/lustdb.zip -- it's a 3K zipfile including 225-lines lustdb.py (too long to post here) and two small test files roughly equivalent to the OP's originals: test0.py is...:
from lustdb import *
class Task(Model):
title = ''
done = False
db.taskList = []
db.taskList.append(Task(title='Beat old sql interfaces', done=False))
db.taskList.append(Task(title='Illustrate different syntax modes', done=True))
and test1.p1 is...:
from lustdb import *
print 'Done tasks:'
for task in db.taskList:
if task.done:
print task
Running test0.py (on a machine with a writable /tmp directory -- i.e., any Unix-y OS, or, on Windows, one on which a mkdir \tmp has been run at any previous time;-) has no output; after that, running test1.py outputs:
Done tasks:
Task(done=True, title=u'Illustrate different syntax modes')
Note that these are vastly less "crazily magical" than the OP's examples, in many ways, such as...:
1. no (expletive delete) redundancy whereby `db.taskList` is a synonym of `db['taskList']`, only the sensible former syntax (attribute-access) is supported
2. no mysterious (and totally crazy) way whereby a `done` attribute magically becomes `isDone` instead midway through the code
3. no mysterious (and utterly batty) way whereby a `print task` arbitrarily (or magically?) picks and prints just one of the attributes of the task
4. no weird gyrations and incantations to allow positional-attributes in lieu of named ones (this one the OP agreed to)
The prototype of course (as prototypes will;-) leaves a lot to be desired in many respects (clarity, documentation, unit tests, optimization, error checking and diagnosis, portability among different back-ends, and especially DB features beyond those implied in the question). The missing DB features are legion (for example, the OP's original examples give no way to identify a "primary key" for a model, or any other kinds of uniqueness constraints, so duplicates can abound; and it only gets worse from there;-). Nevertheless, for 225 lines (190 net of empty lines, comments and docstrings;-), it's not too bad in my biased opinion.
The proper way to continue playing with this project would of course be to initiate a new lustdb open source project on the hosting part of code.google.com (or any other good open source hosting site with issue tracker, wiki, code reviews support, online browsing, DVCS support, etc, etc) - I'd do it myself but I'm close to the limit in terms of number of open source projects I can initiate on code.google.com and don't want to "burn" the last one or two in this way;-).
BTW, the lustdb name for the module is a play of word with the OP's initials (first two letters each of first and last names), in the tradition of awk and friends -- I think it sounds nicely (and most other obvious names such as simpledb and dumbdb are taken;-).
I think you should try ZODB. It is object oriented database designed for storing python objects. Its API is quite close to example you have provided in your question, just take a look at tutorial.
What about using Elixir?
Forget ORM! I like vanilla SQL. The python wrappers like psycopg2 for postgreSQL do automatic type conversion, offer pretty good protection against SQL injection, and are nice and simple.
sql = "SELECT * FROM table WHERE id=%s"
data = (5,)
cursor.execute(sql, data)
The more I think on't the more the Smalltalk model of operation seems more relevant. Indeed the OP may not have reached far enough by using the term "database" to describe a thing which should have no need for naming.
A running Python interpreter has a pile of objects that live in memory. Their inter-relationships can be arbitrarily complex, but namespaces and the "tags" that objects are bound to are very flexible. And as pickle can explicitly serialize arbitrary structures for persistence, it doesn't seem that much of a reach to consider each Python interpreter living in that object space. Why should that object space evaporate with the interpreter's close? Semantically, this could be viewed as an extension of the anydbm tied dictionaries. And since most every thing in Python is dictionary-like, the mechanism is almost already there.
I think this may be the generic model that Alex Martelli was proposing above, it might be nice to say something like:
class Book:
def __init__(self, attributes):
self.attributes = attributes
def __getattr__(....): pass
$ python
>>> import book
>>> my_stuff.library = {'garp':
Book({'author': 'John Irving', 'title': 'The World According to Garp',
'isbn': '0-525-23770-4', 'location': 'kitchen table',
'bookmark': 'page 127'}),
...
}
>>> exit
[sometime next week]
$ python
>>> import my_stuff
>>> print my_stuff.library['garp'].location
'kitchen table'
# or even
>>> for book in my_stuff.library where book.location.contains('kitchen'):
print book.title
I don't know that you'd call the resultant language Python, but it seems like it is not that hard to implement and makes backing store equivalent to active store.
There is a natural tension between the inherent structure imposed - and sometimes desired - by RDBMs and the rather free-form navel-gazing put here, but NoSQLy databases are already approaching the content addressable memory model and probably better approximates how our minds keep track of things. Contrariwise, you wouldn't want to keep all the corporate purchase orders such a storage system, but perhaps you might.
How about you give an example of how "simple" you want your "dealing with database" to be, and I then tell you all the stuff that is needed for that "simplicity" to get working ?
(And of which it will still be YOU that will be required to give the information/config to the database interface engine, somewhere, somehow.)
To name but one example :
If your database management engine is some external machine with which you/your app interfaces over IP or some such, there is no way around the fact that the IP identity of where that database engine is running, will have to be provided by your app's database interface client, somewhere, somehow. Regardless of whether that gets explicitly exposed in the code or not.
I've been busy, here it is, released under LGPL:
http://github.com/lukestanley/lustdb
It uses JSON as it's backend at the moment.
This is not the same codebase Alex Martelli did.
I wanted to make the code more readable and reusable with different
backends and such.
Elsewhere I have been working on object oriented HTML elements
accessable in Python in similar ways, AND a library for making web.py
more minimalist.
I'm thinking of ways of using all 3 elements together with automatic
MVC prototype construction or smart mapping.
While old fashioned text based template web programming will be around
for a while still because of legacy systems and because it doesn't
require any particular library or implementation, I feel soon we'll
have a lot more efficent ways of building robust, prototype friendly
web apps.
Please see the mailing list for those interested.
If you like CherryPy, you might like the complementary ORMs I wrote: GeniuSQL (which follows a Table Data gateway model) and Dejavu (which is a complete Data Mapper).
There's far too much in this question and all its subcomments to address completely, but one thing I wanted to point out was that GeniuSQL and Dejavu have a very robust system for mapping native Python types to the types that your particular backend is using. There are very sane defaults, which can be overridden as needed, and even extended if you make a new backend or use types from a backend that isn't yet supported. See http://www.aminus.net/geniusql/chrome/common/doc/trunk/advanced.html#custom for more discussion on that.

Resources