Is ORM fit for complex projects? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I've not started the ORM trip yet,
because I'm not sure how it works when the project becomes very complex.
What's your opinion or experience?

This question is a difficult one to answer sometimes. You may have heard of the Object-Relational Impedance Mismatch before; that is the issue that ORM tools attempt to solve, but it is fraught with problems. It is one of those situations where you can solve 90% of the problem in a very short time, but every additional 1% from there on up seems to increase exponentially in complexity because of all the dependencies.
An ORM framework is an abstraction, and becomes a leaky abstraction at several points:
Complex queries/scripts involving concepts like UDTs, CTEs, query hints, temporary tables, windowing functions, etc.
Performance-optimized queries. As Quassnoi mentioned in his answer, ORMs are getting better at this but frequently generate sub-optimal queries, and sometimes the effect is extremely noticeable.
Transaction management - the Unit-of-Work pattern can only get you so far when you have to deal with large batch updates.
Cross-database or cross-server actions. There are workarounds, but they are just that - workarounds. No ORM I've seen really handles this well.
Multiple-table inheritance - this is the only form of inheritance that is actually normalized, and it is really not that hard to manage using pure SQL and manual mapping, but O/R mappers are lousy at it. For many of us, single-table inheritance is not an acceptable alternative.
Those are some of many areas where O/R mappers seem to fail us. Having said that, this does not mean that O/R Mappers are not "fit" for large projects.
In my opinion, ORM in general and O/R Mappers specifically are almost vital for large projects. They save enormous amounts of effort and can help you get an application out the door in a fraction of the time it would have taken you otherwise. They just do not solve the whole problem. You have to be prepared to profile your application to see what the ORM is really doing, and you have to be prepared to drop back down to pure SQL when the situation calls for it (i.e. in several of the situations above).
Some frameworks, such as Linq to SQL, expect you to do this and give you ready-made facilities for executing commands or stored procedures on the same connection and in the same transaction used for the mapper's "regular" duties. L2S is not the only framework that lets you do this, but several are more restrictive, and you end up jumping through many hoops to get what you need. When choosing an ORM, I think that the ability to bypass the abstraction is an important consideration, at least today.
I think the best answer to this question is: Yes, they are fit for large projects, as long as you do not rely exclusively on them. Know the limits of your ORM tool of choice, use it as a time-saver in the 90% of instances when you can, and make sure you and your team understand what's really going on under the hood for those instances when the abstraction leaks.

ORM. by defintion, is object-relational mapping.
This means that you should transform the data stored in a relational database into the objects usable by an object-oriented programming language.
The objects may supply some methods that may involve data processing and searching for the other objects.
This is where the problems begin.
The data processing may be implemented on the ORM side (which means loading the data from the database, applying the object wraparound and implementing the methods on the programming language you use), or on the database side (when the data processing commands are issued as a query to the database).
Compare this:
MyAccount->Transactions()->GetLast()
This can be implemented in two ways:
SELECT *
FROM Accounts
WHERE user_id = #me
into $MyAccount
, then
SELECT *
FROM Transactions
WHERE account_id = #myaccount
into a client-side array #Transactions
, then $Transactions[-1] to get the last.
This is inefficient way, and you'll notice it as you get more data.
Alternatively, a smart ORM can convert it into this:
SELECT TOP 1 Transactions.*
FROM Accounts
JOIN Transactions
ON Transactions.Account = Accounts.id
WHERE Accounts.UserID = #me
ORDER BY
Transactions.Date DESC
, but it has to be a really smart ORM.
So the answer to the question "whether to use an ORM or not" is the answer to the question "will my ORM allow me to issue set-based operations to the database should the need arise"?

This is subjective. My answer is specifically about automated ORM Tools.
I have a philosophical objection to ORM Tools for the following reasons:
1- A table is not and should not necessarily be a one-to-one mapping to a business object.
2- Base CRUD/Business Object code is boring to write, but it's critical to your application. I'd rather be in control and have knowledge of it. (a little NIH syndrome)
3- A new developer coming in is going to have an easier time learning a traditional object model versus whatever bizarre syntax is created by the ORM tool.

You don't mention what platform you are using, but if I wanted to read a record from a database in .NET without using an ORM, I would have to:
Read a Connection String
Open a Database Connection
Open a Command object against the connection
Read my Record (by execute a SQL statement against the Command object)
Transform that Record to an object
in my language of choice
Close my query
Close my connection
Sound complicated? An ORM does all of the same things automatically under the covers, and I only need a few lines of code. In addition, because the ORM has knowledge of your data model, it can sometimes perform optimizations such as caching and lazy loading.

When project becomes more complex it is even better, because it let's keep everything at the same level of abstraction, rather than jumping from objects to SQL. We once have written our own layer in paralell to developing the application (because we couldn't use any traditional ORM), and the more powerful it became, the easier managing application become.
Performance concerns are you usually overrated. It's usually in different place, then you would expect. We had some badass abstraction layer written in Python, and it working great. What sucked, was url library, which we had to rewrite in C. Really, you can always optimize queries, that are most important at the end, writing SQL by hand at the moment, when you see, that performance needs it. But in most times - you won't have to.

In my opinion ORM built for big projects, to minimize effort and development time.
But if you are developing an application which needs very high speed data access code, you need to avoid ORMs as you can because ORMs add a new layer in your application

ORMs are ideal for large projects because they provide a layer of protection from changes on the database side, and speed up the process of adding new features. If performance becomes an issue, you can use a different method to get to your data at the query where you encounter a bottleneck, rather than hand-optimizing every query in the application.

ORM's are cool if you want to pump out web app's quickly to do customer development and see if people actually use your product.
http://www.youtube.com/watch?v=uFLRc6y_O3s
The very last topic that Josh Berkus discusses is "Runaway ORM's". Check it out at 37:20.

Related

ORM vs traditional database query, which are their fields?

ORM seems to be a fast-growing model, with both pros and cons in their side. From Ultra-Fast ASP.NET of Richard Kiessig (http://www.amazon.com/Ultra-Fast-ASP-NET-Build-Ultra-Scalable-Server/dp/1430223839/ref=pd_bxgy_b_text_b):
"I love them because they allow me to develop small, proof-of-concept sites extremely quickly. I can side step much of the SQL and related complexity that I would otherwise need and focus on the objects, business logic and presentation. However, at the same time, I also don't care for them because, unfortunately, their performance and scalability is usually very poor, even when they're integrated with a comprehensive caching system (the reason for that becomes clear when you realize that when properly configured, SQL Server itself is really just a big data cache"
My questions are:
What is your comment about Richard's idea. Do you agree with him or not? If not, please tell why.
What is the best suitable fields for ORM and traditional database query? in other words, where you should use ORM and where you should use traditional database query :), which kind/size... of applications you should undoubtedly choose ORM/traditional database query
Thanks in advance
I can't agree to the common complain about ORMs that they perform bad. I've seen many plain-SQL applications until now. While it is theoretically possible to write optimized SQL, in reality, they ruin all the performance gain by writing not optimized business logic.
When using plain SQL, the business logic gets highly coupled to the db model and database operations and optimizations are up to the business logic. Because there is no oo model, you can't pass around whole object structures. I've seen many applications which pass around primary keys and retrieve the data from the database on each layer again and again. I've seen applications which access the database in loops. And so on. The problem is: because the business logic is already hardly maintainable, there is no space for any more optimizations. Often when you try to reuse at least some of your code, you accept that it is not optimized for each case. The performance gets bad by design.
An ORM usually doesn't require the business logic to care too much about data access. Some optimizations are implemented in the ORM. There are caches and the ability for batches. This automatic (and runtime-dynamic) optimizations are not perfect, but they decouple the business logic from it. For instance, if a piece of data is conditionally used, it loads it using lazy loading on request (exactly once). You don't need anything to do to make this happen.
On the other hand, ORM's have a steep learning curve. I wouldn't use an ORM for trivial applications, unless the ORM is already in use by the same team.
Another disadvantage of the ORM is (actually not of the ORM itself but of the fact that you'll work with a relational database an and object model), that the team needs to be strong in both worlds, the relational as well as the oo.
Conclusion:
ORMs are powerful for business-logic centric applications with data structures that are complex enough that having an OO model will advantageous.
ORMs have usually a (somehow) steep learning curve. For small applications, it could get too expensive.
Applications based on simple data structures, having not much logic to manage it, are most probably easier and straight forward to be written in plain sql.
Teams with a high level of database knowledge and not much experience in oo technologies will most probably be more efficient by using plain sql. (Of course, depending on the applications they write it could be recommendable for the team to switch the focus)
Teams with a high level of oo knowledge and only basic database experience are most probably more efficient by using an ORM. (same here, depending on the applications they write it could be recommendable for the team to switch the focus)
ORM is pretty old, at least in the Java world.
Major problems with ORM:
Object-Oriented model and Relational model are quite different.
SQL is a high level language to access data based on relational algebra, different from any OO language like C#, Java or Visual Basic.Net. Mixing those can you the worst of two worlds, instead of the best
For more information search the web on things like 'Object-relational impedance mismatch'
Either case, a good ORM framework saves you on quite some boiler-plate code. But you still need to have knowlegde of SQL, how to setup a good SQL databasemodel. Start with creating a good databasemodel using SQL, then base your OO model on that (not the other way around)
However, the above only holds if you really need to use a SQL database. I recommend looking into NoSQL movement as well. There's stuff like Cassandra, Couch-db. While google'ing for .net solutions I found this stackoverflow question: https://stackoverflow.com/questions/1777103/what-nosql-solutions-are-out-there-for-net
I'm the author of the book with the text quoted in the question.
Let me emphatically add that I am not arguing against using business objects or object oriented programming.
One issue I have with conventional ORM -- for example, LINQ to SQL or Entity Framework -- is that it often leads to developers making DB calls when they don't even realize that they're doing so. This, in turn, is a performance and scalability killer.
I review lots of websites for performance issues, and have found that DB chattiness is one of the most common causes of serious problems. Unfortunately, ORM tends to encourage chattiness, in spades.
The other complaints I have about ORM include:
No support for command batching
No support for multiple result sets
No support for table valued parameters
No support for native async calls (making them from a background thread doesn't count)
Support for SqlDependency and SqlCacheDependency is klunky if/when it works at all
I have no objection to using ORM tactically, to address specific business issues. But I do object to using it haphazardly, to the point where developers do things like make the exact same DB call dozens of time on the same page, or issue hugely expensive queries without considering caching and change notifications, or totally neglect async operations when scalability is a concern.
This site uses Linq-to-SQL I believe, and it's 'fairly' high traffic... I think that the time you save from writing the boiler plate code to access/insert/update simple items is invaluable, but there is always the option to drop down to calling a SPROC if you have something more complex, where you know you can write some screaming fast SQL directly.
I don't think that these things have to be mutually exclusive - use the advantages of both, and if there are sections of your application that start to slow down, then you can optimise as you need to.
ORM is far older than both Java and .NET. The first one I knew about was TopLink for Smalltalk. It's an idea as old as persistent objects.
Every "CRUD on the web" framework like Ruby on Rails, Grails, Django, etc. uses ORM for persistence because they all presume that you are starting with a clean sheet object model: no legacy schema to bother with. You start with the objects to model your problem and generate the persistence from it.
It often works the other way with legacy systems: the schema is long-lived, and you may or may not have objects.
It's astonishing how quickly you can get a prototype up and running with "CRUD on the web" frameworks, but I don't see them being used to develop enterprise apps in large corporations. Maybe that's a Fortune 500 prejudice.
Database admins that I know tell me they don't like the SQL that ORMs generate because it's often inefficient. They all wish for a way to hand-tune it.
I agree with most points already made here.
ORM's are not new in .NET, LLBLGen has been around for a long time, I've been using them for >5 years now in .NET.
I've seen very bad performing code written without ORMs (in-efficient SQL queries, bad indexes, nested database calls - ouch!) and bad code written with ORMs - I'm sure I've contributed to some of the bad code too :)
What I would add is that an ORM is generally a powerful and productivity-enhancing tool that allows you to stop worrying about plumbing db code for most of your application and concentrate on the application itself. When you start trying to write complex code (for example reporting pages or complex UI's) you need to understand what is happening underneath the hood - ignorance can be very costly. But, used properly, they are immensely powerful, and IMO won't have a detrimental effect on your apps performance. I for one wouldn't be happy on a project that didn't use an ORM.
Programming is about writing software for business use. The more we can focus on business logic and presentation and less with technicalities that only matter at certain points in time (when software goes down, when software needs upgrading, etc), the better.
Recently I read about talks of scalability from a Reddit founder, from here, and one line of him that caught my attention was this:
"Having to deal with the complexities
of relational databases (relations,
joins, constraints) is a thing of the
past."
From what I have watched, maintaining a complex database schema, when it comes to scalability, becomes a major pain as the site grows (you add a field, you reassign constraints, re-map foreign keys...etc). It was not entirely clear to me as to why is that. They're not using a NOSQL database though, they're in Postgres.
Add to that, here comes ORM, another layer of abstraction. It simplifies code writing, but almost often at a performance penalty. For me, a simple database abstraction library will do, much like lightweight AR libs out there together with database-specific "plain text" queries. I can't show you any benchmark but with the ORMs I have seen, most of them say that "ORM can often be slow".
Richard covers both sides of the coin, so I agree with him.
As for the fields, I really don't quite get the context of the "fields" you are asking about.
As others have said, you can write underperforming ORM code, and you can also write underperforming SQL.
Using ORM doesn't excuse you from knowing your SQL, and understanding how a query fits together. If you can optimize a SQL query, you can usually optimize an ORM query. For example, hibernate's criteria and HQL queries let you control which associations are joined to improve performance and avoid additional select statements. Knowing how to create an index to improve your most common query can make or break your application performance.
What ORM buys you is uniform, maintainable database access. They provide an extra layer of verification to ensure that your OO code matches up as closely as possible with your database access, and prevent you from making certain classes of stupid mistake, like writing code that's vulnerable to SQL injection. Of course, you can parameterize your own queries, but ORM buys you that advantage without having to think about it.
Never got anything but pain and frustration from ORM packages. If I'd write my SQL the way they autogen it - yeah I'd claim to be fast while my code would be slow :-) Have you ever seen SQL generated by an ORM ? Barely has PK-s, uses FK-s only for misguided interpretation of "inheritance" and if it wants to do paging it dumps the whole recordset on you and then discards 90% of it :-))) Then it locks everything in sight since it has to take in a load of records like it went back to 50 yr old IBM's batch processing.
For a while I thought that the biggest problem with ORM was splintering (not going to have a standard in 50 yrs - every year different API, pardon "model" :-) and ideologizing (everyone selling you a big philosophy - always better than everyone else's of course :-) Then I realized that it was really the total amateurism that's the root cause of the mess and everything else is just the consequence.
Then it all started to make sense. ORM was never meant to be performant or reliable - that wasn't even on the list :-) It was academic, "conceptual" toy from the day one, the consolation prize for professors pissed off that all their "relational" research papers in Prolog went down the drain when IBM and Oracle started selling that terrible SQL thing and making a buck :-)
The closest I came to trusting one was LINQ but only because it's possible and quite easy to kick out all "tracking" and use is just as deserialization layer for normal SQL code. Then I read how the object that's managing connection can develop spontaneous failures that sounded like premature GC while it still had some dangling stuff around. No way I was going to risk my neck with it after that - nope, not my head :-)
So, let me make a list:
Totally sloppy code - not going to suffer bugs and poor perf
Not going to take deadlocks from ORM's 10-100 times longer "transactions"
Drastic reduction of capabilities - SQL has huge expressive power these days
Tying you up into fringe and sloppy API (every ORM aims to hijack your codebase)
SQL queries are highly portable and SQL knowledge is totally portable
I still have to know SQL just to clean up ORM's mess anyway
For "proof-of-concept" I can just serialize to binary or XML files
not much slower, zero bug libraries and one XPath can select better anyway
I've actually done heavy traffic web sites all from XML files
if I actually need real graph then I have no use for DB - nothing real to query
I can serialize a blob and dump into SQL in like 3 lines of code
If someone claims that he does it all from DB to UI - keep your codebase locked :-)
and backup your payroll DB - you'll thank me latter :-)))
NoSQL bases are more honest than ORM - "we specialize in persistence"
and have better code quality - not surprised at all
That would be the short list :-) BTW, modern SQL engines these days do trees and spatial indexing, not to mention paging without a single record wasted. ORM-s are actually "solving" problems of 10yrs ago and promoting amateurism. To that extent NoSQL, also known as document

What's the meaning of ORM?

I ever developed several projects based on python framework Django. And it greatly improved my production. But when the project was released and there are more and more visitors the db becomes the bottleneck of the performance.
I try to address the issue, and find that it's ORM(django) to make it become so slow. Why? Because Django have to serve a uniform interface for the programmer no matter what db backend you are using. So it definitely sacrifice some db's performance(make one raw sql to several sqls and never use the db-specific operation).
I'm wondering the ORM is definitely useful and it can:
Offer a uniform OO interface for the progarammers
Make the db backend migration much easier (from mysql to sql server or others)
Improve the robust of the code(using ORM means less code, and less code means less error)
But if I don't have the requirement of migration, What's the meaning of the ORM to me?
ps. Recently my friend told me that what he is doing now is just rewriting the ORM code to the raw sql to get a better performance. what a pity!
So what's the real meaning of ORM except what I mentioned above?
(Please correct me if I made a mistake. Thanks.)
You have mostly answered your own question when you listed the benefits of an ORM. There are definitely some optimisation issues that you will encounter but the abstraction of the database interface probably over-rides these downsides.
You mention that the ORM sometimes uses many sql statements where it could use only one. You may want to look at "eager loading", if this is supported by your ORM. This tells the ORM to fetch the data from related models at the same time as it fetches data from another model. This should result in more performant sql.
I would suggest that you stick with your ORM and optimise the parts that need it, but, explore any methods within the ORM that allow you to increase performance before reverting to writing SQL to do the access.
A good ORM allows you to tune the data access if you discover that certain queries are a bottleneck.
But the fact that you might need to do this does not in any way remove the value of the ORM approach, because it rapidly gets you to the point where you can discover where the bottlenecks are. It is rarely the case that every line of code needs the same amount of careful hand-optimisation. Most of it won't. Only a few hotspots require attention.
If you write all the SQL by hand, you are "micro optimising" across the whole product, including the parts that don't need it. So you're mostly wasting effort.
here is the definition from Wikipedia
Object-relational mapping is a programming technique for converting data between incompatible type systems in relational databases and object-oriented programming languages. This creates, in effect, a "virtual object database" that can be used from within the programming language.
a good ORM (like Django's) makes it much faster to develop and evolve your application; it lets you assume you have available all related data without having to factor every use in your hand-tuned queries.
but a simple one (like Django's) doesn't relieve you from good old DB design. if you're seeing DB bottleneck with less than several hundred simultaneous users, you have serious problems. Either your DB isn't well tuned (typically you're missing some indexes), or it doesn't appropriately represents the data design (if you need many different queries for every page this is your problem).
So, i wouldn't ditch the ORM unless you're twitter or flickr. First do all the usual DB analysis: You see a lot of full-table scans? add appropriate indexes. Lots of queries per page? rethink your tables. Every user needs lots of statistics? precalculate them in a batch job and serve from there.
ORM separates you from having to write that pesky SQL.
It's also helpful for when you (never) port your software to another database engine.
On the downside: you lose performance, which you fix by writing a custom flavor of SQL - that it tried to insulate from having to write in the first place.
ORM generates sql queries for you and then return as object to you. that's why it slower than if you access to database directly. But i think it slow a little bit ... i recommend you to tune your database. may be you need to check about index of table etc.
Oracle for example, need to be tuned if you need to get faster ( i don't know why, but my db admin did that and it works faster with queries that involved with lots of data).
I have recommendation, if you need to do complex query (eg: reports) other than (Create Update Delete/CRUD) and if your application won't use another database, you should use direct sql (I think Django has it feature)

Why should you use an ORM? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
If you are motivate to the "pros" of an ORM and why would you use an ORM to management/client, what are those reasons would be?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.
The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But, with ORM's it is far too easy to create poor performing routines when developers don't understand how their queries work with the ORM and the SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)
Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
You can move to different database software easily because you are developing to an abstraction.
Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.
So that your object model and persistence model match.
To minimise duplication of simple SQL queries.
The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL i created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there)
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about created my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution
Abstract the sql away 95% of the time so not everyone on the team needs to know how to write super efficient database specific queries.
I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.
Compilation and testing of queries.
As the tooling for ORM's improves, it is easier to determine the correctness of your queries faster through compile time errors and tests.
Compiling your queries helps helps developers find errors faster. Right? Right. This compilation is made possible because developers are now writing queries in code using their business objects or models instead of just strings of SQL or SQL like statements.
If using the correct data access patterns in .NET it is easy to unit test your query logic against in memory collections. This speeds the execution of your tests because you don't need to access the database, set up data in the database or even spin up a full blown data context.[EDIT]This isn't as true as I thought it was as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years.[/EDIT]
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework where my experience lies. Plugin your own environment if possible.
.net tiers using code smith templates
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well.
convince them how much time / money you will save when changes come in and you don't have to rewrite your SQL since the ORM tool will do that for you
I think one cons is that ORM will need some updation in your POJO. mainly related to schema, relation and query. so scenario where you are not suppose to make changes in model objects, might be because it is shared among more that on project or b/w client and server. so in such cases you will need to split it in two levels, which will require additional efforts .
i am an android developer and as you know mobile apps are usually not huge in size, so this additional effort to segregate pure-model and orm-affected-model does not seems worth full.
i understand that question is generic one. but mobile apps are also come inside generic umbrella.

What is a good balance in an MVC model to have efficient data access?

I am working on a few PHP projects that use MVC frameworks, and while they all have different ways of retrieving objects from the database, it always seems that nothing beats writing your SQL queries by hand as far as speed and cutting down on the number of queries.
For example, one of my web projects (written by a junior developer) executes over 100 queries just to load the home page. The reason is that in one place, a method will load an object, but later on deeper in the code, it will load some other object(s) that are related to the first object.
This leads to the other part of the question which is what are people doing in situations where you have a table that in one part of the code only needs the values for a few columns, and another part needs something else? Right now (in the same project), there is one get() method for each object, and it does a "SELECT *" (or lists all the columns in the table explicitly) so that anytime you need the object for any reason, you get the whole thing.
So, in other words, you hear all the talk about how SELECT * is bad, but if you try to use a ORM class that comes with the framework, it wants to do just that usually. Are you stuck to choosing ORM with SELECT * vs writing the specific SQL queries by hand? It just seems to me that we're stuck between convenience and efficiency, and if I hand write the queries, if I add a column, I'm most likely going to have to add it to several places in the code.
Sorry for the long question, but I'm explaining the background to get some mindsets from other developers rather than maybe a specific solution. I know that we can always use something like Memcached, but I would rather optimize what we can before getting into that.
Thanks for any ideas.
First, assuming you are proficient at SQL and schema design, there are very few instances where any abstraction layer that removes you from the SQL statements will exceed the efficiency of writing the SQL by hand. More often than not, you will end up with suboptimal data access.
There's no excuse for 100 queries just to generate one web page.
Second, if you are using the Object Oriented features of PHP, you will have good abstractions for collections of objects, and the kinds of extended properties that map to SQL joins. But the important thing to keep in mind is to write the best abstracted objects you can, without regard to SQL strategies.
When I write PHP code this way, I always find that I'm able to map the data requirements for each web page to very few, very efficient SQL queries if my schema is proper and my classes are proper. And not only that, but my experience is that this is the simplest and fastest way to implement. Putting framework stuff in the middle between PHP classes and a good solid thin DAL (note: NOT embedded SQL or dbms calls) is the best example I can think of to illustrate the concept of "leaky abstractions".
I got a little lost with your question, but if you are looking for a way to do database access, you can do it couple of ways. Your MVC can use Zend framework that comes with database access abstractions, you can use that.
Also keep in mind that you should design your system well to ensure there is no contention in the database as your queries are all scattered across the php pages and may lock tables resulting in the overall web application deteriorating in performance and becoming slower over time.
That is why sometimes it is prefereable to use stored procedures as it is in one place and can be tuned when we need to, though other may argue that it is easier to debug if query statements are on the front-end.
No ORM framework will even get close to hand written SQL in terms of speed, although 100 queries seem unrealistic (and maybe you are exaggerating a bit) even if you have the creator of the ORM framework writing the code, it will always be far from the speed of good old SQL.
My advice is, look at the whole picture not only speed:
Does the framework improves code readability?
Is your team comfortable with writing SQL and mixing it with code?
Do you really understand how to optimize the framework queries? (I think a get() for each object is not the optimal way of retrieving them)
Do the queries (after optimization) of the framework present a bottleneck?
I've never developed anything with PHP, but I think that you could mix both approaches (ORM and plain SQL), maybe after a thorough profiling of the app you can determine the real bottlenecks and only then replace that ORM code for hand written SQL (Usually in ruby you use ActiveRecord, then you profile the application with something as new relic and finally if you have a complicated AR query you replace that for some SQL)
Regads
Trust your experience.
To not repeat yourself so much in the code you could write some simple model-functions with your own SQL. This is what I am doing all the time and I am happy with it.
Many of the "convenience" stuff was written for people who need magic because they cannot do it by hand or just don't have the experience.
And after all it's a question of style.
Don't hesitate to add your own layer or exchange or extend a given layer with your own stuff. Keep it clean and make a good design and some documentation so you feel home when you come back later.

How much of an applications "smarts" should reside in the database? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I've noticed a trend lately that people are moving more and more processing out of databases and in to applications. Some people are taking this to what seems to me to be ridiculous extremes.
I've seen application designs that not only banned all use of stored procedures, but also banned any kind of constraints enforced at the database (this would include primary key, foreign key, unique, and check constraints). I have even seen applications that required the use of only one data type stored in the database, namely varchar(2000). DateTime and number types were not allowed. Transactions and concurrency were also handled outside the database.
Has anyone seen these kind of applications implemented successfully? Both of the implementations I've dealt with that were implemented this way had all kinds of data integrity and concurrency problems. Can anyone explain this trend to move stuff (logic, processing, constraints) out of the database? What is the motivation behind it? Is it something I'm imagining?
Firstly, I really hope there is no trend towards databases without PKs and FKs and sensible datatypes. That would really be a tragedy.
But there is definitely a large core of developers who prefer putting logic in their apps than in stored procedures. I agree with Riho on the main reason for this: usually, DBAs manage databases, meaning that a developer has to go through a bunch of administrative overhead -- getting approvals from the DBA -- in order to create and update stored procs. Programmers by nature like to have control over their world, and to do things "their way."
There are also a couple of valid technical reasons:
Procedural extensions to SQL (e.g. T-SQL) used for developing stored procs have traditionally lacked user-defined datatypes, debuggability, portability, and interoperability with external systems -- all qualities helpful for developing reliable large-scale software. (And the clumsy syntax doesn't help.)
Software version control (e.g. svn) works well for managing even very large codebases, but managing DB schema changes is a harder problem and less well supported. Every serious shop uses version control for their application codebase, but many still don't have any rigorous system for managing their databases; hence stored procs can easily fall into an unversioned "black hole" that makes coders rightly nervous.
My personal view is that the closer the core business logic is to the data, the better, especially if more than one agent accesses the DB (or may do in the future). It's an unfortunate artefact of history that T-SQL and its ilk were weak languages, leading to the rise of the idea that "data and logic should be separated." My ideal world is one in which every business rule is encapsulated in a constraint enforced by the database, and all inconsistencies fail fast.
I like to keep logic out of the database. I tend to avoid stored procedures and triggers. I do, though, always use proper data types, keys, indicies and constraints. The way I see it is that the database is a database and the application is the application. The database should keep your data stored properly and efficiently whereas the application should own the logic. Perhaps I have never been in a situation where a stored procedure or trigger was needed; and thus never been inclined to use them to solve a problem. But to me, giving logic a home on the database seems "messy" to me; I would rather control everything from the application itself.
The trend results from the fact that the software technology industry is populated and driven largely by humans, and thus subject to trends and irrational behavior. To understand what's going on today requires a bit of perspective in the history of databases, and their parallel development with programming languages.
To be brief in this answer that will likely get downvoted: SQL is the IE6 of the database languages world. It breaks many of the rules of the relational model- in other words, it's a little bit like a calculator that performs multiplication incorrectly, and doesn't have a minus operator. SQL is not complete enough to be a real solution. It was never developed beyond the prototype stage, and was never meant to be used in industrial settings. But then it was naively used by oracle, which turned out to be a "killer app", SQL became industry standard instead of its technically superior competitors, and the rest is history. SQL's syntax is based around a set of command line tabular data processing tools, and COBOL. Full of bugs, inconsistencies, and a mishmash proprietary versions and features that don't have a grounding in math or logic, results in a situation where it really is unclear what goes where.
I think the trend you must be talking about is recent proliferation of ORMs: misguided and ill thought out attempts to patch over the obvious deficiencies of SQL. Database triggers and procedures are another misfeature trying to patch over SQL's problems.
If history had played out in a logical and orderly way, the answer to your question would be simple: Just follow the rules of the relational model and everything will work itself out. Unfortunately, the rules of the relational model don't fit cleanly into the current crop of SQL based DBMS's, so some application level fiddling, or triggers, or whatever other stupid patch is unfortunately necessary, and it ends up being a matter of subjective opinion, rather than reasoned argument, which stupid hack you use.
So the real answer is to just follow the relational model as close as you can, and then fudge it the rest of the way. Put the logic in the application if you're the only one using the db, and you need to keep all your source code in a version repository. If multiple applications are likely to use the database, make the DB as bullet proof and self sufficient as it can be- The main goal here is to ensure that the data remains consistent.
Ultimately the database and how you connect to it is your "persistence API" -- how much is in the database and how much is in the application is application-specific. But the important aspect is that the API boundary is responsible for producing or consuming correct data.
Personally I prefer a thin access layer in the application and sprocs/PKs/FKs in the database to enforce transactional correctness and to enable API versioning. This also allows other applications to modify the database without upsetting the data model.
And if I see another moron using *SELECT * FROM blah* I'm going to go nuts with an Uzi... :-)
"The database should keep your data stored properly and efficiently whereas the application should own the logic" - Nelson LaQeut in another answer.
This seems to be the crux of the issue: that all "logic" belongs to the application and not to the database. But what is meant by "logic"? There are various kinds of "logic", some of which belong in an application and some, I would say, better placed in the database.
I would think most developers would agree (surely?) that basic data integrity such as primary and foreign keys belongs in the database. There is less agreement on more sophisticated data integrity logic - even the humble but useful check constraint is woefully underused in general. .
The application camp see the database is "merely" a place to store the data that "belongs" to their application. The database camp (which is where I sit) see the application as "merely" one (perhaps currently the only) user of the data that "belongs" to the database - or rather that belongs to the business and is managed for the business by the database (DBMS = database management system).
If all your data logic is tied up in your application, what happens when the application needs to be rewritten in the latest trendy paradigm (or do you think J2EE for example is the last there will ever be)? As Tom Kyte often says, it's all about the data.
The database is an integral part of an application, but everyone interprets that differently. It's definitely a wise move to isolate them, but that shouldn't mean that you circumvent what they do in your programming. Correct data types and primary key references are important parts of good database design, on top of which a good application can be built.
Although I personally believe the Database should have enough smarts to defend itself, some people that don't understand that Databases aren't dumb services, think, and not incorrectly mind you, that data and logic should be separated. Now in many cases the separation of data and logic is a powerful tool, however most databases already provide us with solid implementations of atomicity, redundancy, processing, checking, etc... And many times that's where it belongs, however since the quality of these services and their API differs among vendors, many application programmers have felt like its worth trying to implement this sort of stuff in the application layer, to avoid tying themselves up with a specific database layer.
I can't say that I've seen a "trend" to create poor applications with terrible database designs. Programming is just like any other discipline in that there will be people who won't learn the tools or just want to cut corners. I've even talked to a person that just didn't "trust" databases. The applications that you described are just as you said, ridiculous nightmares. Don't follow those "trends".
I still prefer to use Stored Procedures and functions in SQL server. It adds more flexibility to application acturally. And it has a performance benefit also. Generally I don't think it is good idea to put everything to applicatons.
I think that those "developers" who created databases without indexes or with single VARCHAR(2000) column are just art majors who are making their first attempt into entering the high-priced IT world.
No-one, who has even little-bit of IT education, makes this kind of database structures.
I can understand the reason to keep logic out of the well formed database. Usually it is time-consuming to make changes (you have to convince database admins to make it, and all the red-tape that comes with it). If the business logic is in your program, then its up to you only.
Use constraints in the database, but for any sophisticated logic I would place that in a data access layer or use one of the standard Object Relational Mapping (ORM) tools such as Hibernate/NHibernate.
There is a general belief that this will affect performance; based on the view that stored procedures are pre-compiled but 'raw' queries have to be compiled on every call. However, I work mostly in SQL Server 2005/2008, and that is very efficient at handling 'raw' parameterised queries, caching the compiled query path for future calls to the database. This means that there under SQL Server there is essentially no difference between the performance of stored procedures to parameterised SQL queries.
The only downside on losing stored procedures is if you are very granular with your database security permissions, and which to enforce security at the database login level.
I have a simple philosophy.
If it's need to keep the database secure and in a consistant state, make sure to do it in the database
I do try to keep a lot of other stuff there too, in my world it's easier to update a client's database than it is to update their application...
Essentially I try to treat the database as a pseudo object. A bunch of methods I can call, etc, but I don't want the app to care about the detail of the internal data storage.
In my experience, putting any application logic in the database always results in a WTF. It doesn't matter how smart the database programmer, how advanced the database, it always ends up being a mistake. The reverse question is "how often should my C# code manage relational data using its own flat-file structure and query language", to which the answer is (almost) always never.
I think the database should be used for data storage, which is what it's good at.

Resources