Why did object oriented databases fail? [closed] - database

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Why did object oriented databases fail?
I find it astonishing that:
foo bar = new foo();
bar.saveToDatabase();
Lost to:
foo bar = new foo();
/* write complicated code to extract stuff from foo */
/* write complicated code to write stuff to database */
Related questions:
Are Object oriented databases still
in use?
Object Oriented vs Relational
Databases
Why have object oriented databases not been successful (yet)?

Probably because of their coupling with specific programming languages.

First, I don't believe they have "failed" entirely. There are still some around, and they're still used by a couple of companies, as far as I know.
anyway, probably because a lot of the data we want to store in a database is relational in nature.
The problem is that while yes, OO databases are easier to integrate into OO programming languages, relational databases make it easier to define queries and, well, relations between the data stored. Which is often the complicated part.

I have been using db4o (an object oriented database) lately on my personal pet projects, just because it is so quick to set up & get going. No need with the itty, gritty details.
That aside, as I see it, the main reasons why they haven't become popular are:
Reporting is more difficult on object oriented databases. Related to this, it is also easier to manually look into the actual data in a relational database.
Microsoft and Oracle base so much of their business on relational databases.
Lots of businesses already have relational databases in place.
The IT departments often have relational database expertise.
And, as Jan Aagaard, have pointed out, lately it is because the problem have been solved in a different way, giving programmers the object oriented feel even though they program against a relational database.

There are countless numbers of existing applications out there storing their data in relational databases. This data is the lifeblood of those companies. They have collectively invested huge amounts in storing, maintaining and reporting on this data. The cost and risk of moving this priceless information into a fundamentally different environment is extremely high.
Now consider that ORM tools can map modern application data structures into traditional relational models, and you remove pretty much any incentive to migrate to OODBMS. They provide a low-risk alternative to a very costly high risk migration.

Because, as much as ODBMS advertisements were laden with derogatory language about ORM systems, it wasn't that hard to make ORMs do the job, and without all the various hits taken in switching to a pure ODBMS.
What actually happened is that your first code sample won, it just happens to be on a RDBMS persistence layer.

I think it is because the problem was solved differently. You might be using a relational database behind the scenes when you are coding in Ruby on Rails or LINQ to SQL, but it feels like you are working with objects.

Very subjective, but a few reasons come to mind:
Performance has not been as good as relational databases (or at least that's the perception)
Again with performance - relational databases allow you to do things like denormalizing data to further improve performance.
Legacy support for all the non-OO apps that need to access the data.

I think a lot of your answer lies in the "Why we abandoned Object Databases" answer of "Object Oriented vs Relational Databases".
As far as your example goes, it doesn't have to be that way. Linq to SQL is actually a quite nice basic layer over a DBMS, and Linq to Entities (v2 -- v1 sucked) will be pretty cool too. (N)Hibernate has been solving the problem you're having for years now using RDBMSes.
So I guess my answer to you is that O/R mappers are getting to the point where they solve your problem nicely, and you don't need an ODBMS to get what you need.

They will succeed some day. They are future.
Looking back to software technologies in history, the trend is sacrificing performance to reduce complexity (Assembly => C => C++ => .NET). An application which takes 30 minutes to code now, some days in past took a month.
ORMs are right answer to wrong question. Currently, they are the choice, since they make life easier in the absence of a better solution. But they cannot handle the level of complexity they aimed to. "Problems cannot be solved by the same level of thinking that created them." A.E
As others mentioned relational databases are heavily used and relied and replacing them forces a lot of risks. Look the interval between SQL versions and the major changes between these versions and other Microsoft products (conservative approach, which is necessary here). Also I'll add the following items:
Current approach still works. You may argue it will work forever (we
can code assembly yet), but here I mean it doesn't
work logically when, the AVERAGE level of projects complexity and
the time to develop them on relational databases rings the bell.
Major companies did not involved seriously. When the market signals, they do.
The problem is not well-defined yet. Unfortunately current failures help.
It need some improvements in other sciences (QC, AI) rather than
computer. Storing and querying multidimensional data on flat
infrastructure and without enough smartness for self-organizing are
the top obstacles at the theoretic level.

Why not?
I guess they were a solution to a problem nobody was having, or not having enough to pay for it.

Further, OOP and set-based programming are not always very comptatble.
Personally, when I started reading about OO databases, I couldn't help but think "Boy, I hope I never have to work on one of those, update 1 million rows out of a 6 million row table and then make sure all appropriate records in other tables get updated as well"

Related

Best practices for creating a data model [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 6 years ago.
Improve this question
For a current project I'm creating a data model. Are there any sources where I can find "best practices" for a good data model? Good means flexible, efficient, with good performance, style, ... Some example questions would be "naming of columns", "what data should be normalized", or "which attributes should be exported into an own table". The source should be a book :-)
Personally I think you should read a book on performance tuning before beginning to model a database. The right design can make a world of difference. If you are not expert in performance tuning, you aren't qualified to design a database.
These books are Database specific, here is one for SQl Server.
http://www.amazon.com/Server-Performance-Tuning-Distilled-Experts/dp/1430219025/ref=sr_1_1?s=books&ie=UTF8&qid=1313603282&sr=1-1
Another book that you should read before starting to design is about antipatterns. Always good to know what you should avoid doing.
http://www.amazon.com/SQL-Antipatterns-Programming-Pragmatic-Programmers/dp/1934356557/ref=sr_1_1?s=books&ie=UTF8&qid=1313603622&sr=1-1
Do not get stuck in the trap of designing for flexibility. People use that as a way to get out of doing the work to design correctly and flexible databases almost always perform badly. If more than 5% of your database design depends on flexibility, you haven't modeled correctly in my opinion. All the worst COTS products I've had to work with were designed for flexibility first.
Any decent database book will discuss normalization. You can also find that information easily on the web. Be sure to actually create FK/PK relationships.
As far as naming columns, pick a standard and stick with it consistently. Consistency is more important than the actual standard. Don't name columns ID (see SQL antipatterns book). Use the same name and datatypes if columns are going to be in several different tables. What you are going for is to not have to use functions to do joins because of datatype mismatches.
Always remember that databases can (and will) be changed outside the application. Anything that is needed for data integrity must be in the database not the application code. The data will be there long after the application has been replaced.
The most important things for database design:
Thorough definition of the data needed (including correct datatypes)
and the relationships between pieces of data (including correct normalization)
data integrity
performance
security
consistency (of datatypes, naming standards etc.)
The best book I've read on the design of database systems was "An Introduction to Database Systems". Joe Celko's SQL for Smarties books are also worth reading.
Assuming you're building an application and not just a database, and assuming you're using an Object Oriented language, Applying UML and Patterns by Craig Larman has a good discussion on mapping databases to objects.
In terms of defining "good", in my experience "maintainable" is probably top of the list. Maintainability in database design means many things, such as sticking to conventions - I often recommend http://justinsomnia.org/2003/04/essential-database-naming-conventions-and-style/. Normalization is another obvious maintainability strategy. I often recommend being generous with column types - it's hard to change an application if you find out that postal codes in different countries are longer than in the US. I often recommend using views to abstract complex data relations away for less experienced developers.
A key thing with maintainability is the ability to test and deploy. It's worth reading up about Continuous Database Integration (http://www.codeproject.com/KB/architecture/Database_CI.aspx) - whilst not strictly associated with the design of the database schema, it's important context.
As for performance - I believe you should design for maintainability first, and only design for performance if you know you have a problem. Sometimes, you know in advance that performance will be a major problem - designing a database for Facebook (or Stack Exchange), designing a database with huge amounts of data (terabytes and up), or huge numbers of users. Most systems don't fall into that camp - so I recommend regular performance tests, with representative data, to find if you have a problem, and only tune when you can prove you have to. Many performance optimizations are at the expense of maintainability - denormalization, for instance.
Oh, and in general, avoid triggers and stored procedures if you can. That's just my opinion, though...
Even though it is not a book I recommend to read Query evaluation techniques for large databases. It gives a background on query processing which largely influences your schema design, especially for data intensive (e.g., analytical) workloads. It is less hands-on but I believe every database designer should read it at least once :-).

What should every developer know about databases? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 9 years ago.
Whether we like it or not, many if not most of us developers either regularly work with databases or may have to work with one someday. And considering the amount of misuse and abuse in the wild, and the volume of database-related questions that come up every day, it's fair to say that there are certain concepts that developers should know - even if they don't design or work with databases today.
What is one important concept that developers and other software professionals ought to know about databases?
The very first thing developers should know about databases is this: what are databases for? Not how do they work, nor how do you build one, nor even how do you write code to retrieve or update the data in a database. But what are they for?
Unfortunately, the answer to this one is a moving target. In the heydey of databases, the 1970s through the early 1990s, databases were for the sharing of data. If you were using a database, and you weren't sharing data you were either involved in an academic project or you were wasting resources, including yourself. Setting up a database and taming a DBMS were such monumental tasks that the payback, in terms of data exploited multiple times, had to be huge to match the investment.
Over the last 15 years, databases have come to be used for storing the persistent data associated with just one application. Building a database for MySQL, or Access, or SQL Server has become so routine that databases have become almost a routine part of an ordinary application. Sometimes, that initial limited mission gets pushed upward by mission creep, as the real value of the data becomes apparent. Unfortunately, databases that were designed with a single purpose in mind often fail dramatically when they begin to be pushed into a role that's enterprise wide and mission critical.
The second thing developers need to learn about databases is the whole data centric view of the world. The data centric world view is more different from the process centric world view than anything most developers have ever learned. Compared to this gap, the gap between structured programming and object oriented programming is relatively small.
The third thing developers need to learn, at least in an overview, is data modeling, including conceptual data modeling, logical data modeling, and physical data modeling.
Conceptual data modeling is really requirements analysis from a data centric point of view.
Logical data modeling is generally the application of a specific data model to the requirements discovered in conceptual data modeling. The relational model is used far more than any other specific model, and developers need to learn the relational model for sure. Designing a powerful and relevant relational model for a nontrivial requirement is not a trivial task. You can't build good SQL tables if you misunderstand the relational model.
Physical data modeling is generally DBMS specific, and doesn't need to be learned in much detail, unless the developer is also the database builder or the DBA. What developers do need to understand is the extent to which physical database design can be separated from logical database design, and the extent to which producing a high speed database can be accomplished just by tweaking the physical design.
The next thing developers need to learn is that while speed (performance) is important, other measures of design goodness are even more important, such as the ability to revise and extend the scope of the database down the road, or simplicity of programming.
Finally, anybody who messes with databases needs to understand that the value of data often outlasts the system that captured it.
Whew!
Good question. The following are some thoughts in no particular order:
Normalization, to at least the second normal form, is essential.
Referential integrity is also essential, with proper cascading delete and update considerations.
Good and proper use of check constraints. Let the database do as much work as possible.
Don't scatter business logic in both the database and middle tier code. Pick one or the other, preferably in middle tier code.
Decide on a consistent approach for primary keys and clustered keys.
Don't over index. Choose your indexes wisely.
Consistent table and column naming. Pick a standard and stick to it.
Limit the number of columns in the database that will accept null values.
Don't get carried away with triggers. They have their use but can complicate things in a hurry.
Be careful with UDFs. They are great but can cause performance problems when you're not aware how often they might get called in a query.
Get Celko's book on database design. The man is arrogant but knows his stuff.
First, developers need to understand that there is something to know about databases. They're not just magic devices where you put in the SQL and get out result sets, but rather very complicated pieces of software with their own logic and quirks.
Second, that there are different database setups for different purposes. You do not want a developer making historical reports off an on-line transactional database if there's a data warehouse available.
Third, developers need to understand basic SQL, including joins.
Past this, it depends on how closely the developers are involved. I've worked in jobs where I was developer and de facto DBA, where the DBAs were just down the aisle, and where the DBAs are off in their own area. (I dislike the third.) Assuming the developers are involved in database design:
They need to understand basic normalization, at least the first three normal forms. Anything beyond that, get a DBA. For those with any experience with US courtrooms (and random television shows count here), there's the mnemonic "Depend on the key, the whole key, and nothing but the key, so help you Codd."
They need to have a clue about indexes, by which I mean they should have some idea what indexes they need and how they're likely to affect performance. This means not having useless indices, but not being afraid to add them to assist queries. Anything further (like the balance) should be left for the DBA.
They need to understand the need for data integrity, and be able to point to where they're verifying the data and what they're doing if they find problems. This doesn't have to be in the database (where it will be difficult to issue a meaningful error message for the user), but has to be somewhere.
They should have the basic knowledge of how to get a plan, and how to read it in general (at least enough to tell whether the algorithms are efficient or not).
They should know vaguely what a trigger is, what a view is, and that it's possible to partition pieces of databases. They don't need any sort of details, but they need to know to ask the DBA about these things.
They should of course know not to meddle with production data, or production code, or anything like that, and they should know that all source code goes into a VCS.
I've doubtless forgotten something, but the average developer need not be a DBA, provided there is a real DBA at hand.
Basic Indexing
I'm always shocked to see a table or an entire database with no indexes, or arbitrary/useless indexes. Even if you're not designing the database and just have to write some queries, it's still vital to understand, at a minimum:
What's indexed in your database and what's not:
The difference between types of scans, how they're chosen, and how the way you write a query can influence that choice;
The concept of coverage (why you shouldn't just write SELECT *);
The difference between a clustered and non-clustered index;
Why more/bigger indexes are not necessarily better;
Why you should try to avoid wrapping filter columns in functions.
Designers should also be aware of common index anti-patterns, for example:
The Access anti-pattern (indexing every column, one by one)
The Catch-All anti-pattern (one massive index on all or most columns, apparently created under the mistaken impression that it would speed up every conceivable query involving any of those columns).
The quality of a database's indexing - and whether or not you take advantage of it with the queries you write - accounts for by far the most significant chunk of performance. 9 out of 10 questions posted on SO and other forums complaining about poor performance invariably turn out to be due to poor indexing or a non-sargable expression.
Normalization
It always depresses me to see somebody struggling to write an excessively complicated query that would have been completely straightforward with a normalized design ("Show me total sales per region.").
If you understand this at the outset and design accordingly, you'll save yourself a lot of pain later. It's easy to denormalize for performance after you've normalized; it's not so easy to normalize a database that wasn't designed that way from the start.
At the very least, you should know what 3NF is and how to get there. With most transactional databases, this is a very good balance between making queries easy to write and maintaining good performance.
How Indexes Work
It's probably not the most important, but for sure the most underestimated topic.
The problem with indexing is that SQL tutorials usually don't mention them at all and that all the toy examples work without any index.
Even more experienced developers can write fairly good (and complex) SQL without knowing more about indexes than "An index makes the query fast".
That's because SQL databases do a very good job working as black-box:
Tell me what you need (gimme SQL), I'll take care of it.
And that works perfectly to retrieve the correct results. The author of the SQL doesn't need to know what the system is doing behind the scenes--until everything becomes sooo slooooow.....
That's when indexing becomes a topic. But that's usually very late and somebody (some company?) is already suffering from a real problem.
That's why I believe indexing is the No. 1 topic not to forget when working with databases. Unfortunately, it is very easy to forget it.
Disclaimer
The arguments are borrowed from the preface of my free eBook "Use The Index, Luke". I am spending quite a lot of my time explaining how indexes work and how to use them properly.
I just want to point out an observation - that is that it seems that the majority of responses assume database is interchangeable with relational databases. There are also object databases, flat file databases. It is important to asses the needs of the of the software project at hand. From a programmer perspective the database decision can be delayed until later. Data modeling on the other hand can be achieved early on and lead to much success.
I think data modeling is a key component and is a relatively old concept yet it is one that has been forgotten by many in the software industry. Data modeling, especially conceptual modeling, can reveal the functional behavior of a system and can be relied on as a road map for development.
On the other hand, the type of database required can be determined based on many different factors to include environment, user volume, and available local hardware such as harddrive space.
Avoiding SQL injection and how to secure your database
Every developer should know that this is false: "Profiling a database operation is completely different from profiling code."
There is a clear Big-O in the traditional sense. When you do an EXPLAIN PLAN (or the equivalent) you're seeing the algorithm. Some algorithms involve nested loops and are O( n ^ 2 ). Other algorithms involve B-tree lookups and are O( n log n ).
This is very, very serious. It's central to understanding why indexes matter. It's central to understanding the speed-normalization-denormalization tradeoffs. It's central to understanding why a data warehouse uses a star-schema which is not normalized for transactional updates.
If you're unclear on the algorithm being used do the following. Stop. Explain the Query Execution plan. Adjust indexes accordingly.
Also, the corollary: More Indexes are Not Better.
Sometimes an index focused on one operation will slow other operations down. Depending on the ratio of the two operations, adding an index may have good effects, no overall impact, or be detrimental to overall performance.
I think every developer should understand that databases require a different paradigm.
When writing a query to get at your data, a set-based approach is needed. Many people with an interative background struggle with this. And yet, when they embrace it, they can achieve far better results, even though the solution may not be the one that first presented itself in their iterative-focussed minds.
Excellent question. Let's see, first no one should consider querying a datbase who does not thoroughly understand joins. That's like driving a car without knowing where the steering wheel and brakes are. You also need to know datatypes and how to choose the best one.
Another thing that developers should understand is that there are three things you should have in mind when designing a database:
Data integrity - if the data can't be relied on you essentially have no data - this means do not put required logic in the application as many other sources may touch the database. Constraints, foreign keys and sometimes triggers are necessary to data integrity. Don't fail to use them because you don't like them or don't want to be bothered to understand them.
Performance - it is very hard to refactor a poorly performing database and performance should be considered from the start. There are many ways to do the same query and some are known to be faster almost always, it is short-sighted not to learn and use these ways. Read some books on performance tuning before designing queries or database structures.
Security - this data is the life-blood of your company, it also frequently contains personal information that can be stolen. Learn to protect your data from SQL injection attacks and fraud and identity theft.
When querying a database, it is easy to get the wrong answer. Make sure you understand your data model thoroughly. Remember often actual decisions are made based on the data your query returns. When it is wrong, the wrong business decisions are made. You can kill a company from bad queries or loose a big customer. Data has meaning, developers often seem to forget that.
Data almost never goes away, think in terms of storing data over time instead of just how to get it in today. That database that worked fine when it had a hundred thousand records, may not be so nice in ten years. Applications rarely last as long as data. This is one reason why designing for performance is critical.
Your database will probaly need fields that the application doesn't need to see. Things like GUIDs for replication, date inserted fields. etc. You also may need to store history of changes and who made them when and be able to restore bad changes from this storehouse. Think about how you intend to do this before you come ask a web site how to fix the problem where you forgot to put a where clause on an update and updated the whole table.
Never develop in a newer version of a database than the production version. Never, never, never develop directly against a production database.
If you don't have a database administrator, make sure someone is making backups and knows how to restore them and has tested restoring them.
Database code is code, there is no excuse for not keeping it in source control just like the rest of your code.
Evolutionary Database Design. http://martinfowler.com/articles/evodb.html
These agile methodologies make database change process manageable, predictable and testable.
Developers should know, what it takes to refactor a production database in terms of version control, continious integration and automated testing.
Evolutionary Database Design process has administrative aspects, for example a column is to be dropped after some life time period in all databases of this codebase.
At least know, that Database Refactoring concept and methodologies exist.
http://www.agiledata.org/essays/databaseRefactoringCatalog.html
Classification and process description makes it possible to implement tooling for these refactorings too.
About the following comment to Walter M.'s answer:
"Very well written! And the historical perspective is great for people who weren't doing database work at that time (i.e. me)".
The historical perspective is in a certain sense absolutely crucial. "Those who forget history, are doomed to repeat it.". Cfr XML repeating the hierarchical mistakes of the past, graph databases repeating the network mistakes of the past, OO systems forcing the hierarchical model upon users while everybody with even just a tenth of a brain should know that the hierarchical model is not suitable for general-purpose representation of the real world, etcetera, etcetera.
As for the question itself:
Every database developer should know that "Relational" is not equal to "SQL". Then they would understand why they are being let down so abysmally by the DBMS vendors, and why they should be telling those same vendors to come up with better stuff (e.g. DBMS's that are truly relational) if they want to go on sucking hilarious amounts of money out of their customers for such crappy software).
And every database developer should know everything about the relational algebra. Then there would no longer be a single developer left who had to post these stupid "I don't know how to do my job and want someone else to do it for me" questions on Stack Overflow anymore.
From my experience with relational databases, every developer should know:
- The different data types:
Using the correct type for the correct job will make your DB design more robust, your queries faster and your life easier.
- Learn about 1xM and MxM:
This is the bread and butter for relational databases. You need to understand one-to-many and many-to-many relations and apply then when appropriate.
- "K.I.S.S." principle applies to the DB as well:
Simplicity always works best. Provided you have studied how DB work, you will avoid unnecessary complexity which will lead to maintenance and speed problems.
- Indices:
It's not enough if you know what they are. You need to understand when to used them and when not to.
also:
Boolean algebra is your friend
Images: Don't store them on the DB. Don't ask why.
Test DELETE with SELECT
I would like everyone, both DBAs and developer/designer/architects, to better understand how to properly model a business domain, and how to map/translate that business domain model into both a normalized database logical model, an optimized physical model, and an appropriate object oriented class model, each one of which is (can be) different, for various reasons, and understand when, why, and how they are (or should be) different from one another.
I would say strong basic SQL skills. I've seen a lot of developers so far who know a little about databases but are always asking for tips about how to formulate a quite simple query. Queries are not always that easy and simple. You do have to use multiple joins (inner, left, etc.) when querying a well normalized database.
I think a lot of the technical details have been covered here and I don't want to add to them. The one thing I want to say is more social than technical, don't fall for the "DBA knowing the best" trap as an application developer.
If you are having performance issues with query take ownership of the problem too. Do your own research and push for the DBAs to explain what's happening and how their solutions are addressing the problem.
Come up with your own suggestions too after you have done the research. That is, I try to find a cooperative solution to the problem rather than leaving database issues to the DBAs.
Simple respect.
It's not just a repository
You probably don't know better than the vendor or the DBAs
You won't support it at 3 a.m. with senior managers shouting at you
Consider Denormalization as a possible angel, not the devil, and also consider NoSQL databases as an alternative to relational databases.
Also, I think the Entity-Relation model is a must-know for every developper even if you don't design databases. It'll let you understand thoroughly what's your database all about.
Never insert data with the wrong text encoding.
Once your database becomes polluted with multiple encodings, the best you can do is apply some kind combination of heuristics and manual labor.
Aside from syntax and conceptual options they employ (such as joins, triggers, and stored procedures), one thing that will be critical for every developer employing a database is this:
Know how your engine is going to perform the query you are writing with specificity.
The reason I think this is so important is simply production stability. You should know how your code performs so you're not stopping all execution in your thread while you wait for a long function to complete, so why would you not want to know how your query will affect the database, your program, and perhaps even the server?
This is actually something that has hit my R&D team more times than missing semicolons or the like. The presumtion is the query will execute quickly because it does on their development system with only a few thousand rows in the tables. Even if the production database is the same size, it is more than likely going to be used a lot more, and thus suffer from other constraints like multiple users accessing it at the same time, or something going wrong with another query elsewhere, thus delaying the result of this query.
Even simple things like how joins affect performance of a query are invaluable in production. There are many features of many database engines that make things easier conceptually, but may introduce gotchas in performance if not thought of clearly.
Know your database engine execution process and plan for it.
For a middle-of-the-road professional developer who uses databases a lot (writing/maintaining queries daily or almost daily), I think the expectation should be the same as any other field: You wrote one in college.
Every C++ geek wrote a string class in college. Every graphics geek wrote a raytracer in college. Every web geek wrote interactive websites (usually before we had "web frameworks") in college. Every hardware nerd (and even software nerds) built a CPU in college. Every physician dissected an entire cadaver in college, even if she's only going to take my blood pressure and tell me my cholesterol is too high today. Why would databases be any different?
Unfortunately, they do seem different, today, for some reason. People want .NET programmers to know how strings work in C, but the internals of your RDBMS shouldn't concern you too much.
It's virtually impossible to get the same level of understanding from just reading about them, or even working your way down from the top. But if you start at the bottom and understand each piece, then it's relatively easy to figure out the specifics for your database. Even things that lots of database geeks can't seem to grok, like when to use a non-relational database.
Maybe that's a bit strict, especially if you didn't study computer science in college. I'll tone it down some: You could write one today, completely, from scratch. I don't care if you know the specifics of how the PostgreSQL query optimizer works, but if you know enough to write one yourself, it probably won't be too different from what they did. And you know, it's really not that hard to write a basic one.
The order of columns in a non-unique index is important.
The first column should be the column that has the most variability in its content (i.e. cardinality).
This is to aid SQL Server ability to create useful statistics in how to use the index at runtime.
Understand the tools that you use to program the database!!!
I wasted so much time trying to understand why my code was mysteriously failing.
If you're using .NET, for example, you need to know how to properly use the objects in the System.Data.SqlClient namespace. You need to know how to manage your SqlConnection objects to make sure they are opened, closed, and when necessary, disposed properly.
You need to know that when you use a SqlDataReader, it is necessary to close it separately from your SqlConnection. You need to understand how to keep connections open when appropriate to how to minimize the number of hits to the database (because they are relatively expensive in terms of computing time).
Basic SQL skills.
Indexing.
Deal with different incarnations of DATE/ TIME/ TIMESTAMP.
JDBC driver documentation for the platform you are using.
Deal with binary data types (CLOB, BLOB, etc.)
For some projects, and Object-Oriented model is better.
For other projects, a Relational model is better.
The impedance mismatch problem, and know the common deficiencies or ORMs.
RDBMS Compatibility
Look if it is needed to run the application in more than one RDBMS. If yes, it might be necessary to:
avoid RDBMS SQL extensions
eliminate triggers and store procedures
follow strict SQL standards
convert field data types
change transaction isolation levels
Otherwise, these questions should be treated separately and different versions (or configurations) of the application would be developed.
Don't depend on the order of rows returned by an SQL query.
Three (things) is the magic number:
Your database needs version control too.
Cursors are slow and you probably don't need them.
Triggers are evil*
*almost always

Is ORM fit for complex projects? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
I've not started the ORM trip yet,
because I'm not sure how it works when the project becomes very complex.
What's your opinion or experience?
This question is a difficult one to answer sometimes. You may have heard of the Object-Relational Impedance Mismatch before; that is the issue that ORM tools attempt to solve, but it is fraught with problems. It is one of those situations where you can solve 90% of the problem in a very short time, but every additional 1% from there on up seems to increase exponentially in complexity because of all the dependencies.
An ORM framework is an abstraction, and becomes a leaky abstraction at several points:
Complex queries/scripts involving concepts like UDTs, CTEs, query hints, temporary tables, windowing functions, etc.
Performance-optimized queries. As Quassnoi mentioned in his answer, ORMs are getting better at this but frequently generate sub-optimal queries, and sometimes the effect is extremely noticeable.
Transaction management - the Unit-of-Work pattern can only get you so far when you have to deal with large batch updates.
Cross-database or cross-server actions. There are workarounds, but they are just that - workarounds. No ORM I've seen really handles this well.
Multiple-table inheritance - this is the only form of inheritance that is actually normalized, and it is really not that hard to manage using pure SQL and manual mapping, but O/R mappers are lousy at it. For many of us, single-table inheritance is not an acceptable alternative.
Those are some of many areas where O/R mappers seem to fail us. Having said that, this does not mean that O/R Mappers are not "fit" for large projects.
In my opinion, ORM in general and O/R Mappers specifically are almost vital for large projects. They save enormous amounts of effort and can help you get an application out the door in a fraction of the time it would have taken you otherwise. They just do not solve the whole problem. You have to be prepared to profile your application to see what the ORM is really doing, and you have to be prepared to drop back down to pure SQL when the situation calls for it (i.e. in several of the situations above).
Some frameworks, such as Linq to SQL, expect you to do this and give you ready-made facilities for executing commands or stored procedures on the same connection and in the same transaction used for the mapper's "regular" duties. L2S is not the only framework that lets you do this, but several are more restrictive, and you end up jumping through many hoops to get what you need. When choosing an ORM, I think that the ability to bypass the abstraction is an important consideration, at least today.
I think the best answer to this question is: Yes, they are fit for large projects, as long as you do not rely exclusively on them. Know the limits of your ORM tool of choice, use it as a time-saver in the 90% of instances when you can, and make sure you and your team understand what's really going on under the hood for those instances when the abstraction leaks.
ORM. by defintion, is object-relational mapping.
This means that you should transform the data stored in a relational database into the objects usable by an object-oriented programming language.
The objects may supply some methods that may involve data processing and searching for the other objects.
This is where the problems begin.
The data processing may be implemented on the ORM side (which means loading the data from the database, applying the object wraparound and implementing the methods on the programming language you use), or on the database side (when the data processing commands are issued as a query to the database).
Compare this:
MyAccount->Transactions()->GetLast()
This can be implemented in two ways:
SELECT *
FROM Accounts
WHERE user_id = #me
into $MyAccount
, then
SELECT *
FROM Transactions
WHERE account_id = #myaccount
into a client-side array #Transactions
, then $Transactions[-1] to get the last.
This is inefficient way, and you'll notice it as you get more data.
Alternatively, a smart ORM can convert it into this:
SELECT TOP 1 Transactions.*
FROM Accounts
JOIN Transactions
ON Transactions.Account = Accounts.id
WHERE Accounts.UserID = #me
ORDER BY
Transactions.Date DESC
, but it has to be a really smart ORM.
So the answer to the question "whether to use an ORM or not" is the answer to the question "will my ORM allow me to issue set-based operations to the database should the need arise"?
This is subjective. My answer is specifically about automated ORM Tools.
I have a philosophical objection to ORM Tools for the following reasons:
1- A table is not and should not necessarily be a one-to-one mapping to a business object.
2- Base CRUD/Business Object code is boring to write, but it's critical to your application. I'd rather be in control and have knowledge of it. (a little NIH syndrome)
3- A new developer coming in is going to have an easier time learning a traditional object model versus whatever bizarre syntax is created by the ORM tool.
You don't mention what platform you are using, but if I wanted to read a record from a database in .NET without using an ORM, I would have to:
Read a Connection String
Open a Database Connection
Open a Command object against the connection
Read my Record (by execute a SQL statement against the Command object)
Transform that Record to an object
in my language of choice
Close my query
Close my connection
Sound complicated? An ORM does all of the same things automatically under the covers, and I only need a few lines of code. In addition, because the ORM has knowledge of your data model, it can sometimes perform optimizations such as caching and lazy loading.
When project becomes more complex it is even better, because it let's keep everything at the same level of abstraction, rather than jumping from objects to SQL. We once have written our own layer in paralell to developing the application (because we couldn't use any traditional ORM), and the more powerful it became, the easier managing application become.
Performance concerns are you usually overrated. It's usually in different place, then you would expect. We had some badass abstraction layer written in Python, and it working great. What sucked, was url library, which we had to rewrite in C. Really, you can always optimize queries, that are most important at the end, writing SQL by hand at the moment, when you see, that performance needs it. But in most times - you won't have to.
In my opinion ORM built for big projects, to minimize effort and development time.
But if you are developing an application which needs very high speed data access code, you need to avoid ORMs as you can because ORMs add a new layer in your application
ORMs are ideal for large projects because they provide a layer of protection from changes on the database side, and speed up the process of adding new features. If performance becomes an issue, you can use a different method to get to your data at the query where you encounter a bottleneck, rather than hand-optimizing every query in the application.
ORM's are cool if you want to pump out web app's quickly to do customer development and see if people actually use your product.
http://www.youtube.com/watch?v=uFLRc6y_O3s
The very last topic that Josh Berkus discusses is "Runaway ORM's". Check it out at 37:20.

Why should you use an ORM? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
If you are motivate to the "pros" of an ORM and why would you use an ORM to management/client, what are those reasons would be?
Try and keep one reason per answer so that we can see which one gets voted up as the best reason.
The most important reason to use an ORM is so that you can have a rich, object oriented business model and still be able to store it and write effective queries quickly against a relational database. From my viewpoint, I don't see any real advantages that a good ORM gives you when compared with other generated DAL's other than the advanced types of queries you can write.
One type of query I am thinking of is a polymorphic query. A simple ORM query might select all shapes in your database. You get a collection of shapes back. But each instance is a square, circle or rectangle according to its discriminator.
Another type of query would be one that eagerly fetches an object and one or more related objects or collections in a single database call. e.g. Each shape object is returned with its vertex and side collections populated.
I'm sorry to disagree with so many others here, but I don't think that code generation is a good enough reason by itself to go with an ORM. You can write or find many good DAL templates for code generators that do not have the conceptual or performance overhead that ORM's do.
Or, if you think that you don't need to know how to write good SQL to use an ORM, again, I disagree. It might be true that from the perspective of writing single queries, relying on an ORM is easier. But, with ORM's it is far too easy to create poor performing routines when developers don't understand how their queries work with the ORM and the SQL they translate into.
Having a data layer that works against multiple databases can be a benefit. It's not one that I have had to rely on that often though.
In the end, I have to reiterate that in my experience, if you are not using the more advanced query features of your ORM, there are other options that solve the remaining problems with less learning and fewer CPU cycles.
Oh yeah, some developers do find working with ORM's to be fun so ORM's are also good from the keep-your-developers-happy perspective. =)
Speeding development. For example, eliminating repetitive code like mapping query result fields to object members and vice-versa.
Making data access more abstract and portable. ORM implementation classes know how to write vendor-specific SQL, so you don't have to.
Supporting OO encapsulation of business rules in your data access layer. You can write (and debug) business rules in your application language of preference, instead of clunky trigger and stored procedure languages.
Generating boilerplate code for basic CRUD operations. Some ORM frameworks can inspect database metadata directly, read metadata mapping files, or use declarative class properties.
You can move to different database software easily because you are developing to an abstraction.
Development happiness, IMO. ORM abstracts away a lot of the bare-metal stuff you have to do in SQL. It keeps your code base simple: fewer source files to manage and schema changes don't require hours of upkeep.
I'm currently using an ORM and it has sped up my development.
So that your object model and persistence model match.
To minimise duplication of simple SQL queries.
The reason I'm looking into it is to avoid the generated code from VS2005's DAL tools (schema mapping, TableAdapters).
The DAL/BLL i created over a year ago was working fine (for what I had built it for) until someone else started using it to take advantage of some of the generated functions (which I had no idea were there)
It looks like it will provide a much more intuitive and cleaner solution than the DAL/BLL solution from http://wwww.asp.net
I was thinking about created my own SQL Command C# DAL code generator, but the ORM looks like a more elegant solution
Abstract the sql away 95% of the time so not everyone on the team needs to know how to write super efficient database specific queries.
I think there are a lot of good points here (portability, ease of development/maintenance, focus on OO business modeling etc), but when trying to convince your client or management, it all boils down to how much money you will save by using an ORM.
Do some estimations for typical tasks (or even larger projects that might be coming up) and you'll (hopefully!) get a few arguments for switching that are hard to ignore.
Compilation and testing of queries.
As the tooling for ORM's improves, it is easier to determine the correctness of your queries faster through compile time errors and tests.
Compiling your queries helps helps developers find errors faster. Right? Right. This compilation is made possible because developers are now writing queries in code using their business objects or models instead of just strings of SQL or SQL like statements.
If using the correct data access patterns in .NET it is easy to unit test your query logic against in memory collections. This speeds the execution of your tests because you don't need to access the database, set up data in the database or even spin up a full blown data context.[EDIT]This isn't as true as I thought it was as unit testing in memory can present difficult challenges to overcome. But I still find these integration tests easier to write than in previous years.[/EDIT]
This is definitely more relevant today than a few years ago when the question was asked, but that may only be the case for Visual Studio and Entity Framework where my experience lies. Plugin your own environment if possible.
.net tiers using code smith templates
http://nettiers.com/default.aspx?AspxAutoDetectCookieSupport=1
Why code something that can be generated just as well.
convince them how much time / money you will save when changes come in and you don't have to rewrite your SQL since the ORM tool will do that for you
I think one cons is that ORM will need some updation in your POJO. mainly related to schema, relation and query. so scenario where you are not suppose to make changes in model objects, might be because it is shared among more that on project or b/w client and server. so in such cases you will need to split it in two levels, which will require additional efforts .
i am an android developer and as you know mobile apps are usually not huge in size, so this additional effort to segregate pure-model and orm-affected-model does not seems worth full.
i understand that question is generic one. but mobile apps are also come inside generic umbrella.

To what extent should a developer learn specifics about database systems? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 5 years ago.
Improve this question
Modern database systems today come with loads of features. And you would agree with me that to learn one database you must unlearn the concepts you learned in another database. For example, each database would implement locking differently than others. So to carry the concepts of one database to another would be a recipe for failure. And there could be other examples where two databases would perform very very differently.
So while developing the database driven systems should the programmers need to know the database in detail so that they code for performance? I don't think it would be appropriate to have the DBA called for performance later as his job is to only maintain the database and help out the developer in case of an emergency but not on a regular basis.
What do you think should be the extent the developer needs to gain an insight into the database?
I think these are the most important things (from most important to least, IMO):
SQL (obviously) - It helps to know how to at least do basic queries, aggregates (sum(), etc), and inner joins
Normalization - DB design skills are an major requirement
Locking Model/MVCC - Its nice to have at least a basic grasp of how your databases manage row locking (or use MVCC to accomplish similar goals with optimistic locking)
ACID compliance, Txns - Please know how these work and interact
Indexing - While I don't think that you need to be an expert in tablespaces, placing data on separate drives for optimal performance, and other minutiae, it does help to have a high level knowledge of how index scans work vs. tablescans. It also helps to be able to read a query plan and understand why it might be choosing one over the other.
Basic Tools - You'll probably find yourself wanting to copy production data to a test environment at some point, so knowing the basics of how to restore/backup your database will be important.
Fortunately, there are some great FOSS and free commercial databases out there today that can be used to learn quite a bit about db fundamentals.
I think a developer should have a fairly good grasp of how their database system works, not matter which one it is. When making design and architecture decisions, they need to understand the possible implications when it comes to the database.
Personally, I think you should know how databases work as well as the relational model and the rhetoric behind it, including all forms of normalization (even though I rarely see a need to go beyond third normal form). The core concepts of the relational model do not change from relational database to relational database - implementation may, but so what?
Developers that don't understand the rationale behind database normalization, indexes, etc. are going to suffer if they ever work on a non-trivial project.
I think it really depends on your job. If you are a developer in a large company with dedicated DBAs then maybe you don't need to know much, but if you are in a small company then it may be really helpful knowing more about databases. In small companies you may wear more than one hat.
It cannot hurt to know more in any situation.
It certainly can't hurt to be familiar with relational database theory, and have a good working knowledge of the standard SQL syntax, as well as knowing what stored procedures, triggers, views, and indexes are. Obviously it's not terribly important to learn the database-specific extensions to SQL (T-SQL, PL/SQL, etc) until you start working with that database.
I think it's important to have a basic understand of databses when developing database applications just like it's important to have an understanding of the hardware your your software runs on. You don't have to be an expert, but you shouldn't be totally ignorant of anything your software interacts with.
That said, you probably shouldn't need to do much SQL as an application developer. Most of the interaction with the database should be done through stored procedures developed by the DBA, I'm not a big fan of including SQL code in your application code. If your queries are in stored procedures, then the DBA can change the implementation of the stored procedure, or even the database schema, and so long as the result is the same it doesn't require any changes to your application code.
If you are uncertain about how to best access the database you should be using tried and tested solutions like the application blocks from Microsoft - http://msdn.microsoft.com/en-us/library/cc309504.aspx. They can also prove helpful to you by examining how that code is implemented.
Basic things about Sql queries are must. then you can develop simple system. but when you are going to implement Complex systems you should know Normalization, Procedures, Functions, etc.

Resources