What is the best practice for persistence right now? - database

I come from a java background.
But I would like a cross-platform perspective on what is considered best practice for persisting objects.
The way I see it, there are 3 camps:
ORM camp
direct query camp e.g. JDBC/DAO, iBatis
LINQ camp
Do people still handcode queries (bypassing ORM) ? Why, considering the options available via JPA, Django, Rails.

There is no one best practice for persistence (although the number of people screaming that ORM is best practice might lead you to believe otherwise). The only best practice is to use the method that is most appropriate for your team and your project.
We use ADO.NET and stored procedures for data access (though we do have some helpers that make it very fast to write such as SP class wrapper generators, an IDataRecord to object translator, and some higher order procedures encapsulating common patterns and error handling).
There are a bunch of reasons for this which I won't go into here, but suffice to say that they are decisions that work for our team and that our team agrees with. Which, at the end of the day, is what matters.

I am currently reading up on persisting objects in .net. As such I cannot offer a best practice, but maybe my insights can bring you some benefit. Up until a few months ago I have always used handcoded queries, a bad habit from my ASP.classic days.
Linq2SQL - Very lightweight and easy to get up to speed. I love the strongly typed querying possibilities and the fact that the SQL is not executed at once. Instead it is executed when your query is ready (all the filters applied) thus you can split the data access from the filtering of the data. Also Linq2SQL lets me use domain objects that are separate from the data objects which are dynamically generated. I have not tried Linq2SQL on a larger project but so far it seems promising. Oh it only supports MS SQL which is a shame.
Entity Framework - I played around with it a little bit and did not like it. It seems to want to do everything for me and it does not work well with stored procedures. EF supports Linq2Entities which again allows strongly typed queries. I think it is limited to MS SQL but I could be wrong.
SubSonic 3.0 (Alpha) - This is a newer version of SubSonic which supports Linq. The cool thing about SubSonic is that it is based on template files (T4 templates, written in C#) which you can easily modify. Thus if you want the auto-generated code to look different you just change it :). I have only tried a preview so far but will look at the Alpha today. Take a look here SubSonic 3 Alpha. Supports MS SQL but will support Oracle, MySql etc. soon.
So far my conclusion is to use Linq2SQL until SubSonic is ready and then switch to that since SubSonics templates allows much more customization.

There is at least another one: System Prevalence.
As far as I can tell, what is optimal for you depends a lot on your circumstances. I could see how for very simple systems, using direct queries still could be a good idea. Also, I have seen Hibernate fail to work well with complex, legacy database schemata, so using an ORM might not always be a valid option. System Prevalence is supposed to unbeatingly fast, if you have enough memory to fit all your objects into RAM. Don't know about LINQ, but I suppose it has its uses, too.
So, as so often, the answer is: know a variety of tools for the job, so that you are able to use the one that's most appropriate for your specific situation.

The best practice depends on your situation.
If you need database objects in table structures with some sort of meaningful structure (so one column per field, one row per entity and so on) you need some sort of translation layer inbetween objects and the database. These fall into two camps:
If there's no logic in the database (just storage) and tables map to objects well, then an ORM solution can provide a quick and reliable persistence system. Java systems like Toplink and Hibernate are mature technologies for this.
If there is database logic involved in persistence, or your database schema has drifted from your object model significantly, stored procedures wrapped by Data Access Objects (with further patterns as you like) is a little more involved than ORM but more flexible.
If you don't need structured storage (and you need to be really sure that you don't, as introducing it to existing data is not fun), you can store serialized object graphs directly in the database, bypassing a lot of complexity.

I prefer to write my own SQL, but I apply all my refactoring techniques and other "good stuff" when I do so.
I have written data access layers, ORM code generators, persistence layers, UnitOfWork transaction management, and LOTS of SQL. I've done that in systems of all shapes and sizes, including extremely high-performance data feeds (forty thousand files totaling forty million transactions per day, each loaded within two minutes of real-time).
The most important criteria is destiny, as in control thereof. Don't ever let your ORM tool be an obstacle to getting your work done, or an excuse for not doing it right. Ultimately, all good SQL is hand-written and hand-tuned, but some decent tools can help you get a good first draft quickly.
I treat this issue the same way that I do my UI design. I write all my UIs directly in code, but I might use a visual designer to prototype some essential elements that I have in mind, then I tear apart the code it generates in order to kickstart my own.
So, use an ORM tool in any of its manifestations as a way to get a decent example--look at how it solves many of the issues that arise (key generation, associations, navigation, etc.). Tear apart its output, make it your own, then reuse the heck out of it.

Related

ORM vs traditional database query, which are their fields?

ORM seems to be a fast-growing model, with both pros and cons in their side. From Ultra-Fast ASP.NET of Richard Kiessig (http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839/ref=pd_bxgy_b_text_b):
"I love them because they allow me to develop small, proof-of-concept sites extremely quickly. I can side step much of the SQL and related complexity that I would otherwise need and focus on the objects, business logic and presentation. However, at the same time, I also don't care for them because, unfortunately, their performance and scalability is usually very poor, even when they're integrated with a comprehensive caching system (the reason for that becomes clear when you realize that when properly configured, SQL Server itself is really just a big data cache"
My questions are:
What is your comment about Richard's idea. Do you agree with him or not? If not, please tell why.
What is the best suitable fields for ORM and traditional database query? in other words, where you should use ORM and where you should use traditional database query :), which kind/size... of applications you should undoubtedly choose ORM/traditional database query
Thanks in advance
I can't agree to the common complain about ORMs that they perform bad. I've seen many plain-SQL applications until now. While it is theoretically possible to write optimized SQL, in reality, they ruin all the performance gain by writing not optimized business logic.
When using plain SQL, the business logic gets highly coupled to the db model and database operations and optimizations are up to the business logic. Because there is no oo model, you can't pass around whole object structures. I've seen many applications which pass around primary keys and retrieve the data from the database on each layer again and again. I've seen applications which access the database in loops. And so on. The problem is: because the business logic is already hardly maintainable, there is no space for any more optimizations. Often when you try to reuse at least some of your code, you accept that it is not optimized for each case. The performance gets bad by design.
An ORM usually doesn't require the business logic to care too much about data access. Some optimizations are implemented in the ORM. There are caches and the ability for batches. This automatic (and runtime-dynamic) optimizations are not perfect, but they decouple the business logic from it. For instance, if a piece of data is conditionally used, it loads it using lazy loading on request (exactly once). You don't need anything to do to make this happen.
On the other hand, ORM's have a steep learning curve. I wouldn't use an ORM for trivial applications, unless the ORM is already in use by the same team.
Another disadvantage of the ORM is (actually not of the ORM itself but of the fact that you'll work with a relational database an and object model), that the team needs to be strong in both worlds, the relational as well as the oo.
Conclusion:
ORMs are powerful for business-logic centric applications with data structures that are complex enough that having an OO model will advantageous.
ORMs have usually a (somehow) steep learning curve. For small applications, it could get too expensive.
Applications based on simple data structures, having not much logic to manage it, are most probably easier and straight forward to be written in plain sql.
Teams with a high level of database knowledge and not much experience in oo technologies will most probably be more efficient by using plain sql. (Of course, depending on the applications they write it could be recommendable for the team to switch the focus)
Teams with a high level of oo knowledge and only basic database experience are most probably more efficient by using an ORM. (same here, depending on the applications they write it could be recommendable for the team to switch the focus)
ORM is pretty old, at least in the Java world.
Major problems with ORM:
Object-Oriented model and Relational model are quite different.
SQL is a high level language to access data based on relational algebra, different from any OO language like C#, Java or Visual Basic.Net. Mixing those can you the worst of two worlds, instead of the best
For more information search the web on things like 'Object-relational impedance mismatch'
Either case, a good ORM framework saves you on quite some boiler-plate code. But you still need to have knowlegde of SQL, how to setup a good SQL databasemodel. Start with creating a good databasemodel using SQL, then base your OO model on that (not the other way around)
However, the above only holds if you really need to use a SQL database. I recommend looking into NoSQL movement as well. There's stuff like Cassandra, Couch-db. While google'ing for .net solutions I found this stackoverflow question: https://stackoverflow.com/questions/1777103/what-nosql-solutions-are-out-there-for-net
I'm the author of the book with the text quoted in the question.
Let me emphatically add that I am not arguing against using business objects or object oriented programming.
One issue I have with conventional ORM -- for example, LINQ to SQL or Entity Framework -- is that it often leads to developers making DB calls when they don't even realize that they're doing so. This, in turn, is a performance and scalability killer.
I review lots of websites for performance issues, and have found that DB chattiness is one of the most common causes of serious problems. Unfortunately, ORM tends to encourage chattiness, in spades.
The other complaints I have about ORM include:
No support for command batching
No support for multiple result sets
No support for table valued parameters
No support for native async calls (making them from a background thread doesn't count)
Support for SqlDependency and SqlCacheDependency is klunky if/when it works at all
I have no objection to using ORM tactically, to address specific business issues. But I do object to using it haphazardly, to the point where developers do things like make the exact same DB call dozens of time on the same page, or issue hugely expensive queries without considering caching and change notifications, or totally neglect async operations when scalability is a concern.
This site uses Linq-to-SQL I believe, and it's 'fairly' high traffic... I think that the time you save from writing the boiler plate code to access/insert/update simple items is invaluable, but there is always the option to drop down to calling a SPROC if you have something more complex, where you know you can write some screaming fast SQL directly.
I don't think that these things have to be mutually exclusive - use the advantages of both, and if there are sections of your application that start to slow down, then you can optimise as you need to.
ORM is far older than both Java and .NET. The first one I knew about was TopLink for Smalltalk. It's an idea as old as persistent objects.
Every "CRUD on the web" framework like Ruby on Rails, Grails, Django, etc. uses ORM for persistence because they all presume that you are starting with a clean sheet object model: no legacy schema to bother with. You start with the objects to model your problem and generate the persistence from it.
It often works the other way with legacy systems: the schema is long-lived, and you may or may not have objects.
It's astonishing how quickly you can get a prototype up and running with "CRUD on the web" frameworks, but I don't see them being used to develop enterprise apps in large corporations. Maybe that's a Fortune 500 prejudice.
Database admins that I know tell me they don't like the SQL that ORMs generate because it's often inefficient. They all wish for a way to hand-tune it.
I agree with most points already made here.
ORM's are not new in .NET, LLBLGen has been around for a long time, I've been using them for >5 years now in .NET.
I've seen very bad performing code written without ORMs (in-efficient SQL queries, bad indexes, nested database calls - ouch!) and bad code written with ORMs - I'm sure I've contributed to some of the bad code too :)
What I would add is that an ORM is generally a powerful and productivity-enhancing tool that allows you to stop worrying about plumbing db code for most of your application and concentrate on the application itself. When you start trying to write complex code (for example reporting pages or complex UI's) you need to understand what is happening underneath the hood - ignorance can be very costly. But, used properly, they are immensely powerful, and IMO won't have a detrimental effect on your apps performance. I for one wouldn't be happy on a project that didn't use an ORM.
Programming is about writing software for business use. The more we can focus on business logic and presentation and less with technicalities that only matter at certain points in time (when software goes down, when software needs upgrading, etc), the better.
Recently I read about talks of scalability from a Reddit founder, from here, and one line of him that caught my attention was this:
"Having to deal with the complexities
of relational databases (relations,
joins, constraints) is a thing of the
past."
From what I have watched, maintaining a complex database schema, when it comes to scalability, becomes a major pain as the site grows (you add a field, you reassign constraints, re-map foreign keys...etc). It was not entirely clear to me as to why is that. They're not using a NOSQL database though, they're in Postgres.
Add to that, here comes ORM, another layer of abstraction. It simplifies code writing, but almost often at a performance penalty. For me, a simple database abstraction library will do, much like lightweight AR libs out there together with database-specific "plain text" queries. I can't show you any benchmark but with the ORMs I have seen, most of them say that "ORM can often be slow".
Richard covers both sides of the coin, so I agree with him.
As for the fields, I really don't quite get the context of the "fields" you are asking about.
As others have said, you can write underperforming ORM code, and you can also write underperforming SQL.
Using ORM doesn't excuse you from knowing your SQL, and understanding how a query fits together. If you can optimize a SQL query, you can usually optimize an ORM query. For example, hibernate's criteria and HQL queries let you control which associations are joined to improve performance and avoid additional select statements. Knowing how to create an index to improve your most common query can make or break your application performance.
What ORM buys you is uniform, maintainable database access. They provide an extra layer of verification to ensure that your OO code matches up as closely as possible with your database access, and prevent you from making certain classes of stupid mistake, like writing code that's vulnerable to SQL injection. Of course, you can parameterize your own queries, but ORM buys you that advantage without having to think about it.
Never got anything but pain and frustration from ORM packages. If I'd write my SQL the way they autogen it - yeah I'd claim to be fast while my code would be slow :-) Have you ever seen SQL generated by an ORM ? Barely has PK-s, uses FK-s only for misguided interpretation of "inheritance" and if it wants to do paging it dumps the whole recordset on you and then discards 90% of it :-))) Then it locks everything in sight since it has to take in a load of records like it went back to 50 yr old IBM's batch processing.
For a while I thought that the biggest problem with ORM was splintering (not going to have a standard in 50 yrs - every year different API, pardon "model" :-) and ideologizing (everyone selling you a big philosophy - always better than everyone else's of course :-) Then I realized that it was really the total amateurism that's the root cause of the mess and everything else is just the consequence.
Then it all started to make sense. ORM was never meant to be performant or reliable - that wasn't even on the list :-) It was academic, "conceptual" toy from the day one, the consolation prize for professors pissed off that all their "relational" research papers in Prolog went down the drain when IBM and Oracle started selling that terrible SQL thing and making a buck :-)
The closest I came to trusting one was LINQ but only because it's possible and quite easy to kick out all "tracking" and use is just as deserialization layer for normal SQL code. Then I read how the object that's managing connection can develop spontaneous failures that sounded like premature GC while it still had some dangling stuff around. No way I was going to risk my neck with it after that - nope, not my head :-)
So, let me make a list:
Totally sloppy code - not going to suffer bugs and poor perf
Not going to take deadlocks from ORM's 10-100 times longer "transactions"
Drastic reduction of capabilities - SQL has huge expressive power these days
Tying you up into fringe and sloppy API (every ORM aims to hijack your codebase)
SQL queries are highly portable and SQL knowledge is totally portable
I still have to know SQL just to clean up ORM's mess anyway
For "proof-of-concept" I can just serialize to binary or XML files
not much slower, zero bug libraries and one XPath can select better anyway
I've actually done heavy traffic web sites all from XML files
if I actually need real graph then I have no use for DB - nothing real to query
I can serialize a blob and dump into SQL in like 3 lines of code
If someone claims that he does it all from DB to UI - keep your codebase locked :-)
and backup your payroll DB - you'll thank me latter :-)))
NoSQL bases are more honest than ORM - "we specialize in persistence"
and have better code quality - not surprised at all
That would be the short list :-) BTW, modern SQL engines these days do trees and spatial indexing, not to mention paging without a single record wasted. ORM-s are actually "solving" problems of 10yrs ago and promoting amateurism. To that extent NoSQL, also known as document

How to create a proper database layer?

Currently I use a lot of the ado.net classes (SqlConnection, SqlCommand, SqlDataAdapter etc..) to make calls to our sql server. Is that a bad idea? I mean it works. However I see many articles that use an ORM, nHibernate, subsonic etc.. to connect to SQL. Why are those better? I am just trying to understand why I would need to change this at all?
Update:
I did check the following tutorial on using nHibernate with stored-procedures.
http://ayende.com/Blog/archive/2006/09/18/UsingNHibernateWithStoredProcedures.aspx
However it looks to me that this is way to overkill. Why would I have to create a mapping file? Even if I create a mapping file and lets say my table changes, then my code wont work anymore. However if I use ado.net to return a simple datatable then my code will still work. I am missing something here?
There's nothing wrong with using the basic ADO.NET classes.
You might just have to do a lot more manual work than necessary. If you e.g. select your top 10 customers from a table with SqlCommand and SqlDataReader, it's up to you go iterate over the results, pull out each and every single item of data (like customer number, customer name, and so forth), and you're dealing very closely with the database structures, e.g. rows and columns. That's fine for some scenarios, but too much work in others.
What an ORM gives you is a lot of this "grunt work" being handled for you. You just tell it to get a list of your top 10 customers - as "Customer" objects. The ORM will go off and grab the data (most likely using SqlCommand, SqlDataReader) and then pulling out the bits and pieces, and assemble nice, easy to use "Customer" objects for you, that are a lot easier to use, since they are what your code is dealing with - Customer objects.
So there's definitely nothing wrong with using ADO.NET and it's a good thing if you know how it works - but an ORM can save you a lot of tedious, repetitive and boring grunt work and let you focus on your real business problems on the object level.
Marc
First of all, the ORMs are likely to do a much better job at producing the SQL queries than your normal non-SQL specialized Joe :)
Secondly, ORMs are a great way to somewhat "standardize" your DALs, increasing flexibility over different projects.
And lastly, with a good ORM, you're likely to have an easier time substituting your underlaying data-source, as a good ORM will have many different dialects. Of course, this is just a side-bonus :)
ORM's are great to avoid code repetition. You can often find that your object model and database model are extremely close to each other and whenever you add a field you'll be adding it to the database, your objects, your sql statements as well as everywhere else. If you use an ORM then you change your code in one place and it builds the rest of it for you.
As for performance, this can go either way. You will probably find that a lot of the simple sql that is written for you is often extremely tailored with various shortcuts that you would have been too lazy to write, such as only returning the absolutely required data. On the other hand, if you have some extremely complex queries and joins that an automated system could not possibly build then you're better of keeping these written yourself.
In summary though, they're fantastic for fast builds!
You don't need to change. If SqlConnection, SqlCommand, etc. work for you then that's great.
They work just peachy fine for the DB app I'm developing, and I have dozens of concurrent users with no problems.
There's nothing wrong with using straight up ADO.Net, but using an ORM will save you time, both in development and maintenance. Thats the biggest benefit.
One thing to consider: will a future "new developer" be more inclined to know or learn a well documented and widely adopted OR/M or your custom data access layer?
The number one thing for me though is the time. Minutes with my favorite OR/M, nHibernate vs. hours/days writing a custom data access layer using ADO.NET.
I also favor OR/Ms because maintaining declarative XML mappings is way easier than maintaining potentially thousands of lines of imperative code... or worse thousands of lines of C# data access code on top of thousands of lines of stored procedure code. In my current project I have 58 objects mapped in 58 XML mapping files, each with less than 50 lines. I cringe when I think about writing/maintaining CRUD code for 58 entities in ADO.NET.
I must warn you to read the documentation. Many, dare I say most, folks with whom I've worked will jump on a tool like mice on cheese, but they'll never read the documentation and learn the technology. I recommend reading the docs BEFORE moving to a new technology like nHibernate. A good cup o' jo and an hour or two of hard reading before-hand will pay dividends.
I haven't find pointing to an general idea of ORM in any answer. The general idea of ORM is to perform an Object-Relational mapping and provide your business classes with persistence. It means that you will think only about you business logic and will let ORM tool to save its state for you. Sure there are a lot of different scenarios. As was already said, it is nothing bad in using pure ADO.NET and may be your application (that is already written in this stile won't get any benefit), but using ORM tool in new projects is a very good idea. As for other - I totally agree with other answers.

Data Dictionary or ORM?

I am in the processes of replacing the framework for a fairly complex business web application. Our application runs on a LAMP platform and the new framework will be an extension of CodeIgniter. In my research for framework design I decided to look into ORM, I have never done ORM before and I wanted to know if it would be valuable for our application. Then I stumbled on a very interesting blog post entitled "Why I Do Not Use ORM." This blog seemed to confirm many of my worries about using ORM and it also presented a solution similar to what I was already planning.
By "data dictionary" I plan to use this definition from "The Database Programmer" blog:
The term "data dictionary" is used by many, including myself, to denote a separate set of tables that describes the application tables. The Data Dictionary contains such information as column names, types, and sizes, but also descriptive information such as titles, captions, primary keys, foreign keys, and hints to the user interface about how to display the field.
So in choosing a "data dictionary" over ORM I may be exhibiting confirmation bias, regardless here are my reasons for being weary of ORM:
I have never used ORM before, I don't know much about it.
This framework needs to be built rather quickly, my boss has little time and I need to produce a working application that will allow for a smooth upgrade to a more modern framework.
My boss already thinks that I am over engineering this framework (trust me, I am no where close to that) and is paranoid about the framework preventing us from being able to do things that we need to, and creating bugs that we can't solve in the required amount of time. So far I have done a poor job of convincing him that change is good, I am not a very effective salesman and while the other developers can help me the boss still needs a lot of assurance.
Our old framework is procedural, our code is PHP, and our developers know SQL very well. ORM would be a big change.
Our database has dozens of tables, many with hundreds of thousands of entries running on a fairly old server. In the past we have been burned by code that repeated polls the database in a loop instead of doing one query to pull all of the needed data at once. Avoiding this problem with hand coded SQL is rather straight forward. Ensuring that this always happens where necessary with ORM is a huge unknown to me and appears to be risky.
Regardless, the solution of the data dictionary seems very promising to me as this blog post "Using A Data Dictionary" seems to provide a lot of useful features and some that are requirements of the new framework. Here are my reasons for preferring the data dictionary solution:
Implementing access control rules on the table rows themselves would be invaluable.
Auto-generating database changes, documentation, and schema checking would also be useful.
One requirement of the framework is a generic data history/auditing feature that can be applied to any sub-feature within our application. A data dictionary or an equivalent is essentially required to provide such a feature. The history must have detailed information about the structure and data types within the database.
Our systems hold a wide variety of data types that would more properly addressed if they treated as formal types within the application. For one, HTML fragments (of which we have many in our data, they are required) need to be encoded as entities in some cases, decoded as HTML in others, parsed for links and images in some cases, and always validated for correctness. Then there are dates, measurements, currency, and various other fields that could benefit from having a clear definition in the code of how this data should be manipulated.
The data dictionary idea that I would like to implement would be series of objects in separate PHP files, and there will be plenty of OOP, however it will be used as in a manner very similar to the data dictionary concept presented in "The Database Programmer" blog. It would be the single source definition of the complete database schema for the entire framework.
Now my question is, am I overlooking the value of ORM or is this a case where a data dictionary is the right tool for the job?
I think your question would be more interesting if you were making an initial architectural decision rather than refactoring an existing application. I don't see a single assertion in your question that suggests a problem that designing in an ORM would address; but several it would create. If two major stakeholder groups (owner and other developers are more comfortable with a more conventional design, it seems to me that an ORM would be swimming upstream.
I can imagine the (possibly undeserved) approbation that would be associated with the ORM as soon as a query is slow or transaction locking problems start emerging. Not to mention the impact on the development schedule. Why create an unquantified risk factor?
Do you have a framework which supports building applications using a "Data Dictionary"? If so, give it a try, it might solve your problems. If you haven't, then there are lots of good and working ORM frameworks out there which have large communities, which come with source (so you can fix bugs yourself even if the "vendor" refuses to help you).
If you want to get a quick glance at a nice web based ORM framework, I suggest Django or TurboGears. They are based on Python which will be a nice change after using PHP. I usually prefer TurboGears but Django seems to be more smooth at the moment. Both are easy to set up and you should be able to build a prototype in a day or two. That will give you an idea of the odds.
PS: I also don't think ORM tools are TEH SOLUT10N. I use Hibernate or SQL Alchemy when it makes sense but I often roll my own simple mappers.
I think that you have made a very good analysis for you situation. You know why you choose the Data Dictionary approach. So go for it.
Later on you might reconsider. If so, then there should be not a problem to use the Data Dictionary and a ORM for new developments in parallel. Both technologies are not mutual exclusive.
If you don't like the idea of mixing different technologies: Stick to a solid OOP design and separate concerns between domain logic and data access cleanly, then switching to an ORM shouldn't be that painful or at least possible.
Good luck!

What is a good balance in an MVC model to have efficient data access?

I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand as far as speed and cutting down on the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place, a method will load an object, but later on deeper in the code, it will load some other object(s) that are related to the first object.
This leads to the other part of the question which is what are people doing in situations where you have a table that in one part of the code only needs the values for a few columns, and another part needs something else? Right now (in the same project), there is one get() method for each object, and it does a "SELECT *" (or lists all the columns in the table explicitly) so that anytime you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you try to use a ORM class that comes with the framework, it wants to do just that usually. Are you stuck to choosing ORM with SELECT * vs writing the specific SQL queries by hand? It just seems to me that we're stuck between convenience and efficiency, and if I hand write the queries, if I add a column, I'm most likely going to have to add it to several places in the code.
Sorry for the long question, but I'm explaining the background to get some mindsets from other developers rather than maybe a specific solution. I know that we can always use something like Memcached, but I would rather optimize what we can before getting into that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
I got a little lost with your question, but if you are looking for a way to do database access, you can do it couple of ways. Your MVC can use Zend framework that comes with database access abstractions, you can use that.
Also keep in mind that you should design your system well to ensure there is no contention in the database as your queries are all scattered across the php pages and may lock tables resulting in the overall web application deteriorating in performance and becoming slower over time.
That is why sometimes it is prefereable to use stored procedures as it is in one place and can be tuned when we need to, though other may argue that it is easier to debug if query statements are on the front-end.
No ORM framework will even get close to hand written SQL in terms of speed, although 100 queries seem unrealistic (and maybe you are exaggerating a bit) even if you have the creator of the ORM framework writing the code, it will always be far from the speed of good old SQL.
My advice is, look at the whole picture not only speed:
Does the framework improves code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the queries (after optimization) of the framework present a bottleneck?
I've never developed anything with PHP, but I think that you could mix both approaches (ORM and plain SQL), maybe after a thorough profiling of the app you can determine the real bottlenecks and only then replace that ORM code for hand written SQL (Usually in ruby you use ActiveRecord, then you profile the application with something as new relic and finally if you have a complicated AR query you replace that for some SQL)
Regads
Trust your experience.
To not repeat yourself so much in the code you could write some simple model-functions with your own SQL. This is what I am doing all the time and I am happy with it.
Many of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all it's a question of style.
Don't hesitate to add your own layer or exchange or extend a given layer with your own stuff. Keep it clean and make a good design and some documentation so you feel home when you come back later.

Is LINQ an Object-Relational Mapper?

Is LINQ a kind of Object-Relational Mapper?
LINQ in itself is a set of language extensions to aid querying, readability and reduce code. LINQ to SQL is a kind of OR Mapper, but it isn't particularly powerful. The Entity Framework is often referred to as an OR Mapper, but it does quite a lot more.
There are several other LINQ to X implementations around, including LINQ to NHibernate and LINQ to LLBLGenPro that offer OR Mapping and supporting frameworks in a broadly similar fashion to the Entity Framework.
If you are just learning LINQ though, I'd recommend you stick to LINQ to Objects to get a feel for it, rather than diving into one of the more complicated flavours :-)
LINQ is not an ORM at all. LINQ is a way of querying "stuff", and can be more or less seen as a SQL-like language extension for different things (IEnumerables).
There are various types of "stuff" that can be queried, among them SQL Server databases. This is called LINQ-to-SQL. The way it works is that it generates (implicit) classes based on the structure of the DB and your query. In this sense it works much more like a code generator.
LINQ-to-SQL is not an ORM because it doesn't try at all to solve the object-relational impedance mismatch. In an ORM you design the classes and then either map them manually to tables or let the ORM generate the database. If you then change the database for whatever reason (typically refactoring, renormalization, denormalization), many times you are able to keep the classes as they are by changing the mapping.
LINQ-to-SQL does nothing of the sort. Your LINQ queries will be tightly coupled to the database structure. If you change the DB, you will probably have to change the LINQ as well.
LINQ to SQL (part of Visual Studio 2008) is an OR Mapper.
LINQ is a new query language that can be used to query many different types of sources.
LINQ itself is not a ORM. LINQ is the language features and methods that exist in allowing you to query objects like SQL.
"LINQ to SQL" is a provider that allows us to use LINQ against SQL strongly-typed objects.
I think a good test to ascertain whether a platform or code block displays the characteristics of an O/R-M is simply:
With his solution hat on, does the developer(s) (or his/her code generator) have any direct, unabstracted knowledge of what's inside the database?
With this criterion, the answer for differing LINQ implementations can be
Yes, knowledge of the database schema is entirely contained within the roll-your-own, LINQ utilizing O/R-M code layerorNo, knowledge of the database schema is scattered throughout the application
Further, I'd extend this characterization to three simple levels of O/R-M.
1. Abandonment.
It's a small app w/ a couple of developers and the object/data model isn't that complex and doesn't change very often. The small dev team can stay on top of it.
2. Roll your own in the data access layer.
With some managable refactoring in a data access layer, the desired O/R-M functionality can be effected in an intermediate layer by the relatively small dev team. Enough to keep the entire team on the same page.
3. Enterprise-level O/R-M specification defining/overhead introducing tools.
At some level of complexity, the need to keep all devs on the same page just swamps any overhead introduced by the formality. No need to reinvent the wheel at this level of complexity. N-hibernate or the (rough) V1.0 Entity Framework are examples of this scale.
For a richer classification, from which I borrowed and simplified, see Ted Neward's classic post at
http://blogs.tedneward.com/2006/06/26/The+Vietnam+Of+Computer+Science.aspx
where he classifies O/R-M treatments (or abdications) as
1. Abandonment. Developers simply give up on objects entirely, and return to a programming model that doesn't create the object/relational impedance mismatch. While distasteful, in certain scenarios an object-oriented approach creates more overhead than it saves, and the ROI simply isn't there to justify the cost of creating a rich domain model. ([Fowler] talks about this to some depth.) This eliminates the problem quite neatly, because if there are no objects, there is no impedance mismatch.
2. Wholehearted acceptance. Developers simply give up on relational storage entirely, and use a storage model that fits the way their languages of choice look at the world. Object-storage systems, such as the db4o project, solve the problem neatly by storing objects directly to disk, eliminating many (but not all) of the aforementioned issues; there is no "second schema", for example, because the only schema used is that of the object definitions themselves. While many DBAs will faint dead away at the thought, in an increasingly service-oriented world, which eschews the idea of direct data access but instead requires all access go through the service gateway thus encapsulating the storage mechanism away from prying eyes, it becomes entirely feasible to imagine developers storing data in a form that's much easier for them to use, rather than DBAs.
3. Manual mapping. Developers simply accept that it's not such a hard problem to solve manually after all, and write straight relational-access code to return relations to the language, access the tuples, and populate objects as necessary. In many cases, this code might even be automatically generated by a tool examining database metadata, eliminating some of the principal criticism of this approach (that being, "It's too much code to write and maintain").
4. Acceptance of O/R-M limitations. Developers simply accept that there is no way to efficiently and easily close the loop on the O/R mismatch, and use an O/R-M to solve 80% (or 50% or 95%, or whatever percentage seems appropriate) of the problem and make use of SQL and relational-based access (such as "raw" JDBC or ADO.NET) to carry them past those areas where an O/R-M would create problems. Doing so carries its own fair share of risks, however, as developers using an O/R-M must be aware of any caching the O/R-M solution does within it, because the "raw" relational access will clearly not be able to take advantage of that caching layer.
5. Integration of relational concepts into the languages. Developers simply accept that this is a problem that should be solved by the language, not by a library or framework. For the last decade or more, the emphasis on solutions to the O/R problem have focused on trying to bring objects closer to the database, so that developers can focus exclusively on programming in a single paradigm (that paradigm being, of course, objects). Over the last several years, however, interest in "scripting" languages with far stronger set and list support, like Ruby, has sparked the idea that perhaps another solution is appropriate: bring relational concepts (which, at heart, are set-based) into mainstream programming languages, making it easier to bridge the gap between "sets" and "objects". Work in this space has thus far been limited, constrained mostly to research projects and/or "fringe" languages, but several interesting efforts are gaining visibility within the community, such as functional/object hybrid languages like Scala or F#, as well as direct integration into traditional O-O languages, such as the LINQ project from Microsoft for C# and Visual Basic. One such effort that failed, unfortunately, was the SQL/J strategy; even there, the approach was limited, not seeking to incorporate sets into Java, but simply allow for embedded SQL calls to be preprocessed and translated into JDBC code by a translator.
6. Integration of relational concepts into frameworks. Developers simply accept that this problem is solvable, but only with a change of perspective. Instead of relying on language or library designers to solve this problem, developers take a different view of "objects" that is more relational in nature, building domain frameworks that are more directly built around relational constructs. For example, instead of creating a Person class that holds its instance data directly in fields inside the object, developers create a Person class that holds its instance data in a RowSet (Java) or DataSet (C#) instance, which can be assembled with other RowSets/DataSets into an easy-to-ship block of data for update against the database, or unpacked from the database into the individual objects.
Linq To SQL using the dbml designer yes, otherwise Linq is just a set of extension methods for Enumerables.

Resources